Understanding Your Algorithm Project
Project Directory
After creating the project from Cookiecutter (with your_first_algorithm_plugin
as an example), the project directory will look something like this:
├── AUTHORS.rst
├── CHANGELOG.md
├── README.md
├── run_tests.sh
├── syntax_checker.py
├── pyproject.toml
├── my_algorithm
│   ├── __main__.py
│   ├── __init__.py
│   ├── algo.meta.json
│   ├── algo.py
│   ├── algo_init.py
│   ├── plugin_init.py
│   ├── input.schema.json
│   └── output.schema.json
└── tests
    ├── __init__.py
    ├── e2e
    │   └── test_e2e.py
    └── unit_tests
        └── test_algo.py
The Key Files in the Project
AUTHORS.rst
The name or organisation name of the algorithm developer.
CHANGELOG.md
A log of all notable changes made to this project.
README.md
A default page which is shown on the code repository. It contains the description, license, plugin URL and developers.
__main__.py
The entry point for calling the algorithm from the command line.
algo.meta.json
The metadata of the algorithm, which also serves as a configuration file to manage the files to include for deployment. It contains the cid, name, model type, version, description, tags, whether or not it requires ground truth and the required files for deployment.
algo.py
The file with all the logic of the algorithm. Most, if not all, of the code should reside in this file.
input.schema.json
The input schema of the algorithm. It is used to validate the user's input when running the algorithm.
output.schema.json
The output schema of the algorithm. It is used to validate the algorithm's generated result.
pyproject.toml
The configuration file used by the packaging tool.
syntax_checker.py
A Python script which checks for syntax errors in the main file algo.py.
tests/
The test folder containing the e2e and unit tests. The tests should be run using pytest.
Understanding the Files You Need To Modify
While there are many files included in this project, you only need to modify a few of them. Each of these files contains TODO comments to guide you on what to change (please remove the TODO comments once you have made the required changes). Here are the files:
algo.py
This file is the heart of the algorithm plugin, where the magic happens. Most, if not all, of the code will be in this file.
Plugin Description
The following points should be considered when writing the plugin description:
- Document the purpose of this plugin.
- What does this plugin do in general?
- Are there any limitations for this plugin?
- Is there anything else that future developers should note or understand?
Example:
class Plugin(IAlgorithm):
    """
    # TODO: Update the plugin description below
    The Plugin(My Algorithm) class specifies methods in generating results for algorithm
    """

    # Some information on plugin
    _name: str = "My Algorithm"
    _description: str = "This algorithm returns the value of the feature name selected by the user."
    _version: str = "0.1.0"
    _metadata: PluginMetadata = PluginMetadata(_name, _description, _version)
    _plugin_type: PluginType = PluginType.ALGORITHM
    _requires_ground_truth: bool = True
    _supported_algorithm_model_type: List = [ModelType.CLASSIFICATION]
Main Codes of the Algorithm
The generate() method is where your code will be inserted. When the main file __main__.py is run, it creates an instance of PluginTest() and calls its run() method, which in turn calls generate().
def generate(self) -> None:
    """
    A method to generate the algorithm results with the provided data, model, ground truth information.
    """
    # Retrieve data information
    self._data = self._data_instance.get_data()

    # TODO: Insert algorithm logic for this plug-in.
    # Retrieve the input arguments
    my_user_defined_feature_name = self._input_arguments['feature_name']

    # Get the values of the feature name and convert to a list.
    self._results = {
        "my_expected_results": list(self._data[my_user_defined_feature_name].values)
    }

    # Update progress (For 100% completion)
    self._progress_inst.update(1)
Note
The final output of the algorithm must be assigned to self._results. The output will be validated against the schema defined in output.schema.json.
Note
Use self._progress_inst to update the progress of the test if the progress data is available.
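The flow of generate() can be tried out in isolation with a small stand-in class. The sketch below is illustrative only: StubProgress and SketchPlugin are hypothetical names, a plain dictionary stands in for the object returned by get_data(), and the missing-feature check is an addition that is not part of the generated template.

```python
from typing import Any, Dict, List


class StubProgress:
    """Hypothetical stand-in for the engine's progress object."""

    def __init__(self) -> None:
        self.completed = 0

    def update(self, steps: int) -> None:
        self.completed += steps


class SketchPlugin:
    """Illustrative only; the real plugin class derives from IAlgorithm."""

    def __init__(self, data: Dict[str, List[float]], input_arguments: Dict[str, Any]) -> None:
        # A plain dict of columns stands in for the real data object (assumption).
        self._data = data
        self._input_arguments = input_arguments
        self._progress_inst = StubProgress()
        self._results: Dict[str, Any] = {}

    def generate(self) -> None:
        feature_name = self._input_arguments["feature_name"]
        # Fail early with a clear message if the requested feature is absent.
        if feature_name not in self._data:
            raise KeyError(f"Unknown feature name: {feature_name!r}")
        # The final output must be assigned to self._results.
        self._results = {"my_expected_results": list(self._data[feature_name])}
        # Report completion of the single unit of work.
        self._progress_inst.update(1)


plugin = SketchPlugin({"age": [21.0, 35.0, 48.0]}, {"feature_name": "age"})
plugin.generate()
print(plugin._results)
```

Keeping the result keys identical to those declared in output.schema.json makes it easier to spot a mismatch before the engine validates the output.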
input.schema.json
Specifies the schema for the input. This is used to validate the schema of the user's input.
Example:
{
    "title": "Algorithm Plugin Input Arguments",
    "description": "A schema for algorithm plugin input arguments",
    "type": "object",
    "required": [
        "feature_name"
    ],
    "properties": {
        "feature_name": {
            "title": "Feature Name",
            "description": "Indicate the feature name to be extracted from the data file",
            "type": "string"
        }
    }
}
- title: The title of this input schema file
- description: The description of this input schema file
- type: The type of the input arguments. It should be object by default
- required: Field(s) which must be present. Add the name of each required field to the list (e.g. required: [required_feature_one, ... ,required_feature_n])
- properties: Contains the details of the required field(s). Every required field must be included and contain the following details:
  - title: Name of the required field
  - description: A brief description of the field with some sample values
  - type: The type of the required field. It can be array, string, number, etc.
    - If the type is array, it must also contain a nested object named items, which specifies the type of the elements in the array. To allow more than one element type, give the type as a list (e.g. "items": {"type": ["number", "string"]})
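To make the meaning of required, type and items concrete, here is a deliberately simplified check written with the standard library only. It is a sketch, not the validator the engine uses, and it covers only a small subset of JSON Schema. The function names and the thresholds field are hypothetical and exist only for this illustration. Note that JSON Schema expresses several allowed types as a list under a single type key.

```python
from typing import Any, Dict, List

# Maps JSON Schema type names to Python types (a subset, for illustration).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "array": list,
    "object": dict,
    "boolean": bool,
}


def matches_type(value: Any, type_spec: Any) -> bool:
    """Return True if value matches a type name or any name in a list of names."""
    names = type_spec if isinstance(type_spec, list) else [type_spec]
    for name in names:
        # bool is a subclass of int in Python but is not a JSON number.
        if name in ("number", "integer") and isinstance(value, bool):
            continue
        if isinstance(value, JSON_TYPES[name]):
            return True
    return False


def check_arguments(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Collect readable problems; an empty list means the check passed."""
    problems = []
    for field in schema.get("required", []):
        if field not in arguments:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field not in arguments:
            continue
        value = arguments[field]
        if "type" in spec and not matches_type(value, spec["type"]):
            problems.append(f"{field}: expected {spec['type']}")
            continue
        # For arrays, check every element against the type given in "items".
        item_type = spec.get("items", {}).get("type")
        if isinstance(value, list) and item_type is not None:
            for index, item in enumerate(value):
                if not matches_type(item, item_type):
                    problems.append(f"{field}[{index}]: expected {item_type}")
    return problems


schema = {
    "type": "object",
    "required": ["feature_name"],
    "properties": {
        "feature_name": {"type": "string"},
        # Hypothetical array field whose items may be numbers or strings.
        "thresholds": {"type": "array", "items": {"type": ["number", "string"]}},
    },
}

print(check_arguments({"feature_name": "age", "thresholds": [0.5, "high"]}, schema))
print(check_arguments({"thresholds": [0.5, None]}, schema))
```

The first call reports no problems. The second reports the missing feature_name and the invalid second element of thresholds.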
Note
The input.schema.json defines the algorithm-specific input arguments. Besides the input arguments, all algorithms require the data_path, model_path, model_type and optionally ground_truth_path and ground_truth to be provided as inputs when executing the algorithm.
output.schema.json
Specifies the schema for the output. This is used to validate the schema of the algorithm's output.
Example:
{
    "title": "Algorithm Plugin Output Arguments",
    "description": "A schema for algorithm plugin output arguments",
    "type": "object",
    "required": ["my_expected_results"],
    "minProperties": 1,
    "properties": {
        "my_expected_results": {
            "description": "Algorithm Output",
            "type": "array",
            "minItems": 10,
            "items": {"type": "number"}
        }
    }
}
- title: The title of this output schema file
- description: The description of this output schema file
- type: The type of the output. It should be object by default
- required: Field(s) which must be present. Add the name of each required field to the list (e.g. required: [required_feature_one, ... ,required_feature_n])
- properties: Contains the details of the required field(s). Every required field must be included and contain the following details:
  - description: A brief description of the field with some sample values
  - type: The type of the required field. It can be array, string, number, etc.
    - If the type is array, it must also contain a nested object named items, which specifies the type of the elements in the array (refer to my_expected_results in the example). To allow more than one element type, give the type as a list (e.g. "items": {"type": ["number", "string"]})
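The example output schema also uses minProperties and minItems, which are easy to overlook: with the sample generate() logic, a data file with fewer than 10 rows would produce a result that fails validation. The sketch below checks just those constraints with the standard library. The function name check_results is hypothetical, and this is a simplified stand-in rather than the engine's own validation.

```python
from typing import Any, Dict, List


def check_results(results: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Check minProperties, required and minItems only (a small subset)."""
    problems = []
    if len(results) < schema.get("minProperties", 0):
        problems.append("too few properties")
    for field in schema.get("required", []):
        if field not in results:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        value = results.get(field)
        # minItems applies only when the field is present and is a list.
        if isinstance(value, list) and len(value) < spec.get("minItems", 0):
            problems.append(
                f"{field}: needs at least {spec['minItems']} items, got {len(value)}"
            )
    return problems


# The constraints from the example output schema above.
output_schema = {
    "type": "object",
    "required": ["my_expected_results"],
    "minProperties": 1,
    "properties": {
        "my_expected_results": {
            "type": "array",
            "minItems": 10,
            "items": {"type": "number"},
        }
    },
}

print(check_results({"my_expected_results": [1.0, 2.0, 3.0]}, output_schema))
print(check_results({"my_expected_results": list(range(10))}, output_schema))
```

If your algorithm can legitimately return short lists, lowering or removing minItems in output.schema.json is the matching change on the schema side.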
algo.meta.json
The metadata of the algorithm plugin. This file is autogenerated by Cookiecutter according to your input during the creation phase.
Example:
{
    "cid": "my_algorithm",
    "gid": "my_plugin",
    "name": "My Algorithm",
    "modelType": [
        "classification"
    ],
    "version": "0.1.0",
    "author": "Example Author",
    "description": "This algorithm returns the value of the feature name selected by the user.",
    "tags": [
        "My Algorithm",
        "classification"
    ],
    "requireGroundTruth": true,
    "requiredFiles": [
        "AUTHORS.rst",
        "CHANGELOG.md",
        "pyproject.toml",
        "LICENSE",
        "my_algorithm",
        "README.md",
        "requirements.txt",
        "syntax_checker.py"
    ]
}
- cid: The component ID of the algorithm
- gid: The plugin GID
- name: The name of this algorithm plugin
- modelType: The type(s) of the algorithm model. It can be classification, regression or both
- version: The version of this algorithm. It defaults to 0.1.0. If this algorithm is an improvement of a previous algorithm, you should increase the version accordingly. Refer to Understanding Versioning for more information
- author: The name of the developer or the developer's organisation
- description: A short description of what the algorithm does
- tags: A list of searchable tag(s) for the algorithm (e.g. you can add classification to this list if the algorithm supports it)
- requireGroundTruth: A boolean value that indicates whether this algorithm requires ground truth data
- requiredFiles: A list of files and directories required for the algorithm to run. If you have other required files or directories, add their names to this list
Note
Do not remove or edit the required files already in the list
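A quick sanity check of algo.meta.json can catch a missing key or a listed file that does not exist before deployment. The sketch below uses only the standard library. The function name check_meta is hypothetical, the key list is taken from the example above, and it assumes that the entries in requiredFiles are paths relative to the project root. The demo builds a throwaway project in a temporary directory so that it runs anywhere.

```python
import json
import tempfile
from pathlib import Path
from typing import List

# Keys taken from the example algo.meta.json above.
EXPECTED_KEYS = [
    "cid", "gid", "name", "modelType", "version", "author",
    "description", "tags", "requireGroundTruth", "requiredFiles",
]


def check_meta(project_dir: Path, package_name: str) -> List[str]:
    """Report missing metadata keys and required files that do not exist."""
    meta_path = project_dir / package_name / "algo.meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in meta]
    for name in meta.get("requiredFiles", []):
        # Assumption: entries are relative to the project root.
        if not (project_dir / name).exists():
            problems.append(f"required file not found: {name}")
    return problems


# Demo on a throwaway project with an intentionally incomplete metadata file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my_algorithm").mkdir()
    (root / "README.md").write_text("demo", encoding="utf-8")
    meta = {"cid": "my_algorithm", "requiredFiles": ["README.md", "LICENSE", "my_algorithm"]}
    (root / "my_algorithm" / "algo.meta.json").write_text(json.dumps(meta), encoding="utf-8")
    demo_problems = check_meta(root, "my_algorithm")

for problem in demo_problems:
    print(problem)
```

In a real project you would call check_meta with the project root and the name of your algorithm package instead of the temporary directory.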
pyproject.toml
pyproject.toml is a configuration file used in Python projects to specify project metadata, dependencies, build system requirements, and other settings in a standardized way. It was introduced by PEP 518 and is supported by modern Python packaging tools such as pip, setuptools, and Poetry.
Note
By default, only aiverify-test-engine[all] is listed as a dependency of the algorithm project. You should update this file to add any additional dependencies you require.