Understanding Your Algorithm Project
Project Directory
After creating the project from Cookiecutter (with `your_first_algorithm_plugin`
as an example), the project directory will look something like this:
```
├── AUTHORS.rst
├── CHANGELOG.md
├── LICENSE
├── README.md
├── __main__.py
├── input.schema.json
├── output.schema.json
├── plugin.meta.json
├── requirements.txt
├── syntax_checker.py
├── tests
│   ├── plugin_test.py
│   └── user_defined_files
│       ├── data
│       │   ├── pickle_pandas_mock_binary_classification_credit_risk_testing.sav
│       │   ├── pickle_pandas_mock_binary_classification_pipeline_credit_risk_testing.sav
│       │   ├── pickle_pandas_mock_binary_classification_pipeline_credit_risk_ytest.sav
│       │   ├── pickle_pandas_mock_multiclass_classification_pipeline_toxic_classification_testing.sav
│       │   ├── pickle_pandas_mock_multiclass_classification_pipeline_toxic_classification_ytest.sav
│       │   ├── pickle_pandas_mock_multiclass_classification_toxic_classification_testing.sav
│       │   ├── pickle_pandas_mock_regression_donation_testing.sav
│       │   ├── pickle_pandas_mock_regression_pipeline_testing.sav
│       │   ├── pickle_pandas_mock_regression_pipeline_ytest.sav
│       │   └── raw_fashion_image_10
│       │       ├── 0.png
│       │       ├── 1.png
│       │       ├── 2.png
│       │       ├── 3.png
│       │       ├── 4.png
│       │       ├── 5.png
│       │       ├── 6.png
│       │       ├── 7.png
│       │       ├── 8.png
│       │       └── 9.png
│       ├── model
│       │   ├── binary_classification_mock_credit_risk_sklearn.linear_model._logistic.LogisticRegression.sav
│       │   ├── multiclass_classification_mock_toxic_classification_sklearn.linear_model._logistic.LogisticRegression.sav
│       │   └── regression_mock_donation_sklearn.linear_model._base.LinearRegression.sav
│       └── pipeline
│           ├── binary_classification_tabular_credit_loan
│           │   ├── binary_classification_pipeline_credit_risk_sklearn.pipeline.Pipeline.sav
│           │   └── creditCustomClass.py
│           ├── multiclass_classification_image_mnist_fashion
│           │   ├── fashionCustomClass.py
│           │   └── fashion_mnist_lr_pipeline.sav
│           ├── multiclass_classification_tabular_toxic_classification
│           │   ├── multiclass_classification_pipeline_toxic_classification_sklearn.pipeline.Pipeline.sav
│           │   └── toxicCustomClass.py
│           └── regression_tabular_donation
│               ├── regressionCustomClass.py
│               └── regression_pipeline_donation_sklearn.pipeline.Pipeline.sav
├── your_first_algorithm_plugin.meta.json
└── your_first_algorithm_plugin.py
```
Files in the Project
- `AUTHORS.rst`: The name or organisation name of the algorithm developer.
- `CHANGELOG.md`: A log of all notable changes made to this project.
- `LICENSE`: The license of this algorithm.
- `README.md`: A default page shown on the code repository. It contains the description, license, plugin URL and developers.
- `__main__.py`: The file with a main function which serves as the entry point for testing.
- `input.schema.json`: The input schema of the algorithm. It is used to validate the user's input when running the algorithm.
- `output.schema.json`: The output schema of the algorithm. It is used to validate the algorithm's generated result.
- `plugin.meta.json`: The metadata of the algorithm. It contains the gid, version, name, author, description and project URL.
- `requirements.txt`: A list of Python packages required by this plugin.
- `syntax_checker.py`: A Python script which checks for syntax errors in the main file `your_first_algorithm_plugin.py`.
- `tests/plugin_test.py`: The file with all the testing logic of the algorithm plugin. It is called by `__main__.py`.
- `tests/user_defined_files`: A directory for the user to place all the test files required by the algorithm, such as sample data and models read in by the algorithm.
- `your_first_algorithm_plugin.meta.json`: The metadata of the algorithm, which also serves as a configuration file to manage the files to include for deployment. It contains the cid, name, model type, version, description, tags, whether or not it requires ground truth, and the required files for deployment.
- `your_first_algorithm_plugin.py`: The file with all the logic of the algorithm. Most, if not all, of the code should reside in this file.
Understanding the Files You Need To Modify
While there are many files included in this project, you will only need to modify a few of them. There are `TODO` comments in each of these files to guide you through the required changes (please remove the `TODO` comments once you have modified the relevant parts). Here are the files:
__main__.py
The entry point when testing your algorithm. When you run `python .`, this file runs and calls the test file. You will need to update the paths to the data and the input arguments in this file.
Example:
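The Cookiecutter-generated file is not reproduced here, so the following is a minimal sketch of what `__main__.py` typically contains. The `PluginTest` import path, its constructor signature, and the sample paths and argument values are assumptions; adapt them to your project:

```python
from tests.plugin_test import PluginTest  # assumption: test logic lives in tests/plugin_test.py

if __name__ == "__main__":
    # TODO: Update these paths and arguments for your own algorithm.
    # An empty string defaults to ../../../../test-engine-core-modules
    core_modules_path = ""
    data_path = (
        "tests/user_defined_files/data/"
        "pickle_pandas_mock_binary_classification_credit_risk_testing.sav"
    )
    model_path = (
        "tests/user_defined_files/model/"
        "binary_classification_mock_credit_risk_sklearn.linear_model._logistic.LogisticRegression.sav"
    )
    ground_truth_path = (
        "tests/user_defined_files/data/"
        "pickle_pandas_mock_binary_classification_pipeline_credit_risk_ytest.sav"
    )
    ground_truth = "default"  # hypothetical ground truth field name

    # Input arguments; they must match input.schema.json
    plugin_argument_values = {
        "sensitive_feature": ["gender"],
    }

    # Create and run the plugin test (constructor signature is an assumption)
    plugin_test = PluginTest(
        core_modules_path,
        data_path,
        model_path,
        ground_truth_path,
        ground_truth,
        plugin_argument_values,
    )
    plugin_test.run()
```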
- `core_modules_path`: The absolute or relative path (from `__main__.py`) of the test-engine-core-modules directory. This can be left empty and it will default to `../../../../test-engine-core-modules`
- `data_path`: The absolute or relative path (from `__main__.py`) of the test data file
- `model_path`: The absolute or relative path (from `__main__.py`) of the test model file
- `ground_truth_path` (optional): The absolute or relative path (from `__main__.py`) of the ground truth data file
- `ground_truth` (optional): The field name (`string`) of the ground truth

Note

Ground truth is optional, so if your algorithm does not require ground truth, `ground_truth_path` and `ground_truth` can be left as an empty string `""`.

- `plugin_argument_values`: A dictionary of input arguments. In the example above, the input argument `sensitive_feature` is an `array` of `string`. The input argument(s) and their type(s) must match the schema in `input.schema.json`.
your_first_algorithm_plugin.py
Note
This section uses `your_first_algorithm_plugin` as a sample project. If, for instance, your project is named `your_second_algorithm_plugin`, this file would be named `your_second_algorithm_plugin.py` instead.
This file is the heart of the algorithm plugin, where the magic happens. Most, if not all, of the code will be in this file.
Plugin Description
The following points should be considered when writing the plugin description:
- Document the purpose of this plugin.
- What does this plugin do in general?
- Are there any limitations for this plugin?
- Is there anything else that future developers should note or understand?

Example:

```python
"""
# TODO: Update the plugin description below
The Plugin({{cookiecutter.plugin_name}}) class specifies methods in generating
results for algorithm
"""

# Some information on plugin
_name: str = "Partial Dependence Plot"
_description: str = (
    "A Partial Dependence Plot (PDP) explains how each feature and its feature value "
    "contribute to the predictions."
)
_version: str = "0.1.0"
_metadata: PluginMetadata = PluginMetadata(_name, _description, _version)
_plugin_type: PluginType = PluginType.ALGORITHM
_requires_ground_truth: bool = False
```
Input Schema
There is no need to update anything in this section; it is a reminder to update `input.schema.json`.
Output Schema
There is no need to update anything in this section; it is a reminder to update `output.schema.json`.
Main Codes of the Algorithm
The `generate()` method is where your code will be inserted. When the main file `__main__.py` is run, it will create an instance of `PluginTest()` and call its `run()` method, which in turn calls `generate()`. As such, your code will be either in `generate()` or in another method that `generate()` calls, like `_explain_pdp()` in the example below:
```python
def generate(self) -> None:
    """
    A method to generate the algorithm results with the provided data, model, ground truth information.
    """
    # Retrieve data information
    self._data = self._data_instance.get_data()

    # Perform pdp explanation
    self._explain_pdp()

    # Update progress (For 100% completion)
    self._progress_inst.update(1)

def _explain_pdp(self) -> None:
    # main codes
    ...
    self._results = your_algo_output_results
```
Note
Regardless of where your algorithm code is placed, the final output of the algorithm must be assigned to `self._results`. The final output will be validated against the schema defined in `output.schema.json`.
input.schema.json
Specifies the schema for the input. It is used to validate the user's input against the expected arguments.
Example:
```json
{
    "title": "Algorithm Plugin Input Arguments",
    "description": "A schema for algorithm plugin input arguments",
    "type": "object",
    "required": [
        "sensitive_feature"
    ],
    "properties": {
        "sensitive_feature": {
            "title": "Sensitive Feature Names",
            "description": "Array of Sensitive Feature Names (e.g. Gender)",
            "type": "array",
            "items": {
                "type": "string"
            },
            "minItems": 1
        }
    }
}
```
- `title`: The title of this input schema file
- `description`: The description of this input schema file
- `type`: The type of the input argument. It should be `object` by default
- `required`: Field(s) which must be present. Add the name of the required field(s) into the list (i.e. `"required": ["required_feature_one", ..., "required_feature_n"]`)
- `properties`: Contains the details of the `required` field(s). Every `required` field must be included and contain the following details:
    - `title`: Name of the required field
    - `description`: A brief description of the field with some sample
    - `type`: The type of the required field. It can be `array`, `string`, `number`, etc.
        - If the `type` is `array`, it must also contain a nested `items` keyword, which specifies the `type` of the elements in the `array` (refer to `sensitive_feature` in the example). You can allow multiple types for the `items` by listing them (i.e. `"items": {"type": ["number", "string"]}`)
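Although the test engine performs this validation for you, you can sanity-check the schema against sample inputs with the third-party `jsonschema` package; its use here is an assumption for illustration, not part of the generated project:

```python
import json

from jsonschema import ValidationError, validate  # assumption: pip install jsonschema

# Load the input schema from the project root
with open("input.schema.json") as schema_file:
    input_schema = json.load(schema_file)

# Valid: "sensitive_feature" is an array of strings with at least one item
validate({"sensitive_feature": ["gender"]}, input_schema)

# Invalid: a number instead of a string raises a ValidationError
try:
    validate({"sensitive_feature": [42]}, input_schema)
except ValidationError as error:
    print(error.message)
```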
output.schema.json
Specifies the schema for the output. It is used to validate the algorithm's output against the expected structure.
Example:
```json
{
    "title": "Algorithm Plugin Output Arguments",
    "description": "A schema for algorithm plugin output arguments",
    "type": "object",
    "required": [
        "feature_names",
        "results"
    ],
    "properties": {
        "feature_names": {
            "type": "array",
            "description": "Array of feature names",
            "minItems": 1,
            "items": {
                "type": "string"
            }
        },
        "output_classes": {
            "description": "Array of output classes",
            "type": "array",
            "minItems": 1,
            "items": {
                "type": [
                    "string",
                    "number",
                    "integer",
                    "boolean"
                ]
            }
        },
        "results": {
            "description": "Matrix of feature values (# feature names)",
            "type": "array",
            "minItems": 1,
            "items": {
                "description": "Matrix of PDP plot data (# output classes)",
                "type": "array",
                "minItems": 1,
                "items": {
                    "type": "array",
                    "description": "Array of PDP values for each feature value (# feature values)",
                    "minItems": 1,
                    "items": {
                        "type": "object",
                        "description": "Array of feature and PDP value",
                        "required": [
                            "feature_value",
                            "pdp_value"
                        ],
                        "properties": {
                            "feature_value": {
                                "type": "number"
                            },
                            "pdp_value": {
                                "type": "number"
                            }
                        }
                    }
                }
            }
        }
    }
}
```
- `title`: The title of this output schema file
- `description`: The description of this output schema file
- `type`: The type of the output argument. It should be `object` by default
- `required`: Field(s) which must be present. Add the name of the required field(s) into the list (i.e. `"required": ["required_feature_one", ..., "required_feature_n"]`)
- `properties`: Contains the details of the `required` field(s). Every `required` field must be included and contain the following details:
    - `description`: A brief description of the field with some sample
    - `type`: The type of the required field. It can be `array`, `string`, `number`, etc.
        - If the `type` is `array`, it must also contain a nested `items` keyword, which specifies the `type` of the elements in the `array` (refer to `output_classes` in the example). You can allow multiple types for the `items` by listing them (i.e. `"items": {"type": ["number", "string"]}`)
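As a sanity check, here is a hypothetical PDP-style result shaped to satisfy the schema above, again validated with the third-party `jsonschema` package (an assumption; the feature names and values are illustrative only):

```python
import json

from jsonschema import validate  # assumption: pip install jsonschema

with open("output.schema.json") as schema_file:
    output_schema = json.load(schema_file)

# One feature ("age"), two output classes, two feature values per class
results = {
    "feature_names": ["age"],
    "output_classes": [0, 1],
    "results": [
        [   # feature "age"
            [   # output class 0
                {"feature_value": 25.0, "pdp_value": 0.31},
                {"feature_value": 40.0, "pdp_value": 0.27},
            ],
            [   # output class 1
                {"feature_value": 25.0, "pdp_value": 0.69},
                {"feature_value": 40.0, "pdp_value": 0.73},
            ],
        ],
    ],
}

# Raises a ValidationError if the shape does not match output.schema.json
validate(results, output_schema)
```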
your_first_algorithm_plugin.meta.json
The metadata of the algorithm plugin. This file should be autogenerated by Cookiecutter according to your input during the creation phase.
Example:
```json
{
    "cid": "partial_dependence_plot",
    "name": "Partial Dependence Plot",
    "modelType": [
        "classification",
        "regression"
    ],
    "version": "0.9.0",
    "author": "AI Verify",
    "description": "A Partial Dependence Plot (PDP) explains how each feature and its feature value contribute to the predictions.",
    "tags": [
        "Partial Dependence Plot",
        "classification",
        "regression"
    ],
    "requireGroundTruth": false,
    "requiredFiles": [
        "AUTHORS.rst",
        "CHANGELOG.md",
        "input.schema.json",
        "LICENSE",
        "output.schema.json",
        "partial_dependence_plot.meta.json",
        "partial_dependence_plot.py",
        "README.md",
        "requirements.txt",
        "syntax_checker.py",
        "my_additional_python_files_dir",
        "my_custom_python_file.py"
    ]
}
```
- `cid`: The component name of the algorithm
- `name`: The name of this algorithm plugin
- `modelType`: The type(s) of the algorithm model. It can be `classification`, `regression` or both
- `version`: The version of this algorithm. It defaults to `0.1.0`. If this algorithm is an improvement of a previous algorithm, you should increase the version accordingly. Refer to Understanding Versioning for more information
- `author`: The name of the developer or the developer's organisation
- `description`: A short description of what the algorithm does
- `tags`: A list of searchable tag(s) for the algorithm (i.e. you can add `classification` to this list if the algorithm supports it)
- `requireGroundTruth`: A boolean value that determines whether this algorithm requires ground truth data
- `requiredFiles`: A list of files required for the algorithm to run. If you have other required file(s) (currently only `.py` files are allowed), add the file name(s) to this list
    - If the `.py` file(s) are in a directory, you can add the directory to the list. The directory will be traversed recursively and all discovered `.py` files will be added, with the directory hierarchy preserved
    - For example, `my_additional_python_files_dir` and `my_custom_python_file.py` are an additional directory and Python file added by the user

Note

Do not remove or edit the required files already in the list.
requirements.txt
The Python requirements file keeps track of the Python packages used by this algorithm plugin. It simplifies the installation of all required packages and makes it easy to share your project with others.

Example:

```
numpy==1.24.2 ; python_version >= "3.10" and python_version < "3.12"
scipy==1.10.1 ; python_version >= "3.10" and python_version < "3.12"
```

Examples of how to generate `requirements.txt`:

1. Using pip or pip3:

```bash
pip freeze > requirements.txt
```

2. Using Poetry:

```bash
poetry export --without-hashes --format=requirements.txt > requirements.txt
```