AIVT 2.0 Features
Process Checklist
The AI Verify Toolkit offers multiple starting points based on your needs:
- Report-focused approach: Begin with your reporting goals in mind, using the existing templates or the customizable report canvas to design pages with widgets.
- Process-oriented approach: Complete the necessary process checklists.
- Technical assessment: Run tests via the Command Line Interface (CLI) or the User Interface (Portal). The generated test results can be reused across reports.
To help companies align their reports with the AI Verify framework, the toolkit also comes with a set of reporting templates that pre-define the report layout and the technical tests and process checks required.
The AI Verify Toolkit supports the AI Verify Testing Framework by providing an integrated interface that helps you track completion progress across the 85 testable criteria in the 11 Process Checklists and generates a summary of how the AI system aligns with the AI Verify Testing Framework. Refer to the detailed guides on the AI Verify Process Checklists or the Veritas Process Checklists for additional information.
Technical Test
The AI Verify Toolkit conducts black-box testing on AI models (tabular and image models) by ingesting the AI model to be tested in the form of a serialized model file or folder. Depending on the test to be run, various dataset files and test arguments are needed. The AI Verify report templates contain technical tests covering three principles:
| | Fairness | Explainability | Robustness |
|---|---|---|---|
| Algorithms | - Fairness for Regression<br>- Fairness for Classification<br>- Veritas fairness and transparency assessment | - Accumulated Local Effect<br>- Partial Dependence Plot<br>- SHAP Toolbox | - Robustness Toolbox<br>- Image Corruption Toolbox |
| Metrics & Methods used for testing | Metrics: False Negative Rate Parity, False Positive Rate Parity, False Discovery Rate Parity, False Omission Rate Parity, True Positive Rate Parity, True Negative Rate Parity, Positive Predictive Value Parity, Negative Predictive Value Parity<br>Methods:<br>- [Tabular] Performance vs fairness trade-off by category<br>- [Tabular] Measure prediction among sensitive features [1] | Metrics:<br>- Accumulated differences in predictions (ALE)<br>- Average predicted values (PDP)<br>- Cooperative game theory (Shapley values)<br>Methods:<br>- [Tabular] Accumulates local changes in predictions over small intervals (ALE)<br>- [Tabular] Averages predictions over the marginal distribution of other features (PDP)<br>- [Tabular] How features affect overall predictions using Shapley values | Metrics:<br>- Model accuracy on the original dataset<br>- Model accuracy on the perturbed dataset<br>Methods:<br>- [Tabular] Generate a perturbed dataset using the boundary attack algorithm on the test dataset<br>- [Image] Apply corruption functions (e.g. blur) and compare model robustness |
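Before any of these tests can run, the model and datasets must exist as files the toolkit can ingest. The sketch below shows one common way to produce such artifacts for a tabular model, assuming scikit-learn and joblib; the file names are illustrative, and the serialization formats the toolkit actually accepts are covered in the detailed guides.

```python
# Minimal sketch of preparing the inputs a black-box technical test needs:
# a serialized model file and a test dataset file. scikit-learn and joblib
# are used here purely as an example; check the detailed guides for the
# model formats and dataset layouts the toolkit actually supports.
import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small tabular classifier on synthetic data
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the model so it can be ingested as a black box
joblib.dump(model, "model.joblib")

# Save the test dataset (features plus the ground-truth label column)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
df["label"] = y
df.to_csv("test_dataset.csv", index=False)
```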
Refer to the detailed guides on running fairness tests, explainability tests, and robustness tests if you are looking to implement them for your use case.
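As an illustration of the parity metrics listed in the fairness column above, the sketch below computes a False Positive Rate Parity gap between two groups of a sensitive feature. It is a minimal, self-contained example with hypothetical helper names, not the toolkit's own implementation; the fairness algorithms in the toolkit report these metrics for you.

```python
import numpy as np

def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN) for binary labels."""
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return fp / (fp + tn) if (fp + tn) > 0 else 0.0

def fpr_parity_gap(y_true, y_pred, sensitive):
    """Largest difference in FPR across the groups of a sensitive feature."""
    rates = [
        false_positive_rate(y_true[sensitive == g], y_pred[sensitive == g])
        for g in np.unique(sensitive)
    ]
    return max(rates) - min(rates)

# Toy data: ground truth, model predictions and a binary sensitive attribute
y_true = np.array([0, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 0, 1, 1, 1])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print(f"False Positive Rate Parity gap: {fpr_parity_gap(y_true, y_pred, group):.2f}")
```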