(CLI) How to Run Benchmark Tests
In this tutorial, you will learn how to run a benchmark in Moonshot. Benchmarks are sets of "exam questions" that help evaluate and assess the capabilities and safety of an AI system.
Change directory to the root directory of Moonshot.
Enter the following command to enter the CLI interactive mode:
python -m moonshot cli interactive
Choose a benchmark type to run and view help:
Warning
Important information before running your benchmark:
Certain benchmarks may require metrics that connect to a particular model (e.g., MLCommons cookbooks and recipes such as mlc-cae use the llamaguardannotator metric, which requires the API token of the together-llama-guard-7b-assistant endpoint).
Refer to this list for the requirements.
Recipe:
To find out more about the required fields for running a recipe:
run_recipe -h
To run the help example, enter:
run_recipe "my new recipe runner" "['bbq','mmlu']" "['openai-gpt35-turbo']" -n 1 -r 1 -s "You are an intelligent AI"
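Note how the recipe and endpoint lists are passed: each is a quoted, Python-style list literal. As an illustration only (the `fmt_list` helper below is hypothetical, not part of Moonshot), this sketch shows how such a command string is composed:

```python
# Sketch: composing the run_recipe command string shown above.
# fmt_list is an illustrative helper, not a Moonshot API.

def fmt_list(items):
    """Render a list as the quoted literal the CLI expects, e.g. "['bbq','mmlu']"."""
    return '"[' + ",".join(f"'{i}'" for i in items) + ']"'

recipes = ["bbq", "mmlu"]
endpoints = ["openai-gpt35-turbo"]

cmd = (
    'run_recipe "my new recipe runner" '
    f"{fmt_list(recipes)} {fmt_list(endpoints)} "
    '-n 1 -r 1 -s "You are an intelligent AI"'
)
print(cmd)
```

The quoting matters: the outer double quotes keep each list literal as a single CLI argument, while the inner single quotes delimit the item names.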
Cookbook:
To find out more about the required fields for running a cookbook:
run_cookbook -h
To run the help example, enter:
run_cookbook "my new cookbook runner" "['chinese-safety-cookbook']" "['openai-gpt35-turbo']" -n 1 -r 1 -s "You are an intelligent AI"
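The run_cookbook invocation has the same shape as run_recipe, with a list of cookbooks in place of recipes. A minimal sketch (again using a hypothetical `fmt_list` helper for the quoted list literals):

```python
# Sketch: the run_cookbook command mirrors run_recipe.
# fmt_list is an illustrative helper, not a Moonshot API.

def fmt_list(items):
    """Render a list as the quoted literal the CLI expects."""
    return '"[' + ",".join(f"'{i}'" for i in items) + ']"'

cmd = (
    'run_cookbook "my new cookbook runner" '
    f'{fmt_list(["chinese-safety-cookbook"])} '
    f'{fmt_list(["openai-gpt35-turbo"])} '
    '-n 1 -r 1 -s "You are an intelligent AI"'
)
print(cmd)
```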
View the results:
When the run completes, the results are displayed in the CLI for the recipe or cookbook run respectively.
You can view more information on running benchmarks here.