Running Benchmarks
This page describes the steps for starting a benchmark run.
To begin, click the "Run Benchmark" button.
Select the endpoints for this benchmark run. You can also configure each endpoint at this point by clicking its "edit" button. Click "NEXT" to proceed.
Select the cookbooks for this benchmark run. Cookbooks cannot be configured at this point.
However, you can click "About" on a cookbook to see more detailed information about it.
Before you can start the run, you must provide the following information, which will be included in the report generated at the end of the run.
- Name (Required): A unique name to identify this benchmark run by. Example: GPT4 vs Claude on safety benchmarks
- Description: The purpose and scope of this benchmark run. Example: Comparing GPT4 and Claude to determine which model is safer as a chatbot
- Run a smaller set: The number of prompts to run from each dataset specified in the recipe. Entering 0 runs the full set. Example: 5
Note: Before running the full recommended set, you may want to run a smaller number of prompts from each recipe as a sanity check.
When ready, click "Run" to start running the benchmarks.
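To make the "Run a smaller set" option concrete, here is a minimal sketch of how a per-dataset prompt limit could behave, where 0 means the full set. It is only an illustration under stated assumptions: the function name `select_prompts`, the dictionary-of-lists representation of a recipe's datasets, and the choice to take the first N prompts (rather than a random sample) are all hypothetical and are not the tool's actual API or selection method.

```python
# Hypothetical sketch of the "Run a smaller set" semantics.
# Assumptions (not the tool's real API): a recipe's datasets are modelled as a
# dict mapping dataset name -> list of prompts, and the first N prompts are taken.


def select_prompts(recipe_datasets: dict[str, list[str]], num_prompts: int) -> dict[str, list[str]]:
    """Return the prompts to run from each dataset.

    num_prompts is the per-dataset limit entered in "Run a smaller set";
    0 means run every prompt in every dataset (the full set).
    """
    if num_prompts < 0:
        raise ValueError("num_prompts must be 0 or a positive integer")
    selected = {}
    for name, prompts in recipe_datasets.items():
        # 0 selects the whole dataset; otherwise keep at most num_prompts prompts.
        selected[name] = list(prompts) if num_prompts == 0 else prompts[:num_prompts]
    return selected


# Usage: a recipe with two datasets of 20 and 3 prompts.
datasets = {
    "dataset-a": [f"prompt {i}" for i in range(1, 21)],
    "dataset-b": [f"question {i}" for i in range(1, 4)],
}
small = select_prompts(datasets, 5)
print({name: len(p) for name, p in small.items()})  # {'dataset-a': 5, 'dataset-b': 3}
```

Because the limit applies per dataset, entering 5 for a recipe with two datasets runs at most 10 prompts, and a dataset with fewer than 5 prompts is run in full.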
While a benchmark run is in progress, you can click "See Details" to review what is currently being run. You can also cancel the run.
A report will be generated once the run is completed. Meanwhile, you can:
- Start Red Teaming to discover new vulnerabilities
- Create a custom cookbook by curating your own set of recipes
- Return to the Main Page
To view the progress of the run, click the bell icon and select the benchmark run.
Once a benchmark run is complete, click "View Report".
One report is generated for each tested endpoint. Use the dropdown to switch between the reports. You can also download the HTML report, as well as the detailed results as a JSON file.
View Run History
You can view the details of previous runs in two ways:
- Click the "benchmarking" icon on the Sidebar, then click the "View Past Runs" button
- Click the "history" icon on the Sidebar, then click the "View Past Runs" button
The window that opens lists information about previous runs. For more detailed information about a run, click its "View Results" button.