List of Cookbooks

Name Description Recipes
Cookbook for AISI Joint Testing - Cantonese language
AISI-JT-ca.json
This cookbook includes all the Cantonese language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-ca, mlc-ipv-ca
Cookbook for AISI Joint Testing - Chinese language
AISI-JT-cn.json
This cookbook includes all the Chinese language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-cn, cyberseceval-cn, mlc-ipv-cn, mlc-prv-cn, mlc-ncr-cn, mlc-vcr-cn
Cookbook for AISI Joint Testing - English language
AISI-JT-en.json
This cookbook includes all the English language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-en, cyberseceval-en, mlc-ipv-en, mlc-prv-en, mlc-ncr-en, mlc-vcr-en
Cookbook for AISI Joint Testing - Farsi language
AISI-JT-fa.json
This cookbook includes all the Farsi language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-fa, mlc-ipv-fa, mlc-ncr-fa
Cookbook for AISI Joint Testing - French language
AISI-JT-fr.json
This cookbook includes all the French language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-fr, cyberseceval-fr, mlc-ipv-fr, mlc-prv-fr, mlc-ncr-fr, mlc-vcr-fr
Cookbook for AISI Joint Testing - Japanese language
AISI-JT-jp.json
This cookbook includes all the Japanese language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-jp, cyberseceval-jp, mlc-ipv-jp, mlc-prv-jp, mlc-ncr-jp, mlc-vcr-jp
Cookbook for AISI Joint Testing - Korean language
AISI-JT-kr.json
This cookbook includes all the Korean language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-kr, cyberseceval-kr, mlc-ipv-kr, mlc-prv-kr, mlc-ncr-kr, mlc-vcr-kr
Cookbook for AISI Joint Testing - Kiswahili language
AISI-JT-kw.json
This cookbook includes all the Kiswahili language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-kw, cyberseceval-kw, mlc-ipv-kw, mlc-prv-kw, mlc-ncr-kw, mlc-vcr-kw
Cookbook for AISI Joint Testing - Malay language
AISI-JT-my.json
This cookbook includes all the Malay language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-my, cyberseceval-my, mlc-ipv-my, mlc-prv-my, mlc-ncr-my, mlc-vcr-my
Cookbook for AISI Joint Testing - Telugu language
AISI-JT-tl.json
This cookbook includes all the Telugu language data of the safety-focused testing datasets 'AnswerCarefully', 'MlCommons', 'CyberSecEval'. The subset focuses on Information Hazards. answercarefully-tl, mlc-ipv-tl, mlc-prv-tl, mlc-ncr-tl, mlc-vcr-tl
AnswerCarefully Information cookbook for all languages
answercarefully-cookbook-all-languages.json
This cookbook includes all the data of the safety-focused testing dataset 'AnswerCarefully'. The subset focuses on Information Hazards. answercarefully-cn, answercarefully-fa, answercarefully-fr, answercarefully-en, answercarefully-kw, answercarefully-ca, answercarefully-jp, answercarefully-kr, answercarefully-my, answercarefully-tl
AI Safety in Chinese Language
chinese-safety-cookbook.json
This cookbook measures the system's ability to answer trust and safety questions asked in Chinese. cvalues, cbbq-lite
Chinese Linguistics & Cognition Challenge (Beijing Academy of AI)
clcc-cookbook.json*
Created by the Beijing Academy of AI, this benchmark is designed to test various aspects of natural language understanding and cognitive processing in the context of the Chinese language. It may involve evaluating AI models' abilities to interpret, analyze, and generate Chinese text, and to understand its semantics and syntax. clcc
Easy test sets for Common Risks
common-risk-easy.json
This cookbook consists of easy test sets for common risks. These test sets are adapted from various research and will be expanded in the future. uciadult, bbq, winobias, challenging-toxicity-prompts-completion, realtime-qa, commonsense-morality-easy, jailbreak-dan, advglue
Hard test sets for Common Risks
common-risk-hard.json
This cookbook consists of hard test sets for common risks. These test sets are adapted from various research and will be expanded in the future. uciadult, bbq, winobias, challenging-toxicity-prompts-completion, realtime-qa, commonsense-morality-hard, jailbreak-dan, advglue
Cybersecurity Evaluation - Prompt Injection for all languages
cyberseceval-cookbook-all-languages.json
A cookbook of CyberSecEval security-related questions and answers. cyberseceval-jp, cyberseceval-kr, cyberseceval-en, cyberseceval-kw
Leaderboard Cookbook
leaderboard-cookbook.json
This cookbook, drawing inspiration from leaderboards like HF's OpenLLM and HELM, features popular benchmarks for testing model capabilities, with results that may vary from actual leaderboard standings. mmlu, truthfulqa-mcq, winogrande, hellaswag, arc, gsm8k
Legal Summarisation
legal-summarisation.json
This cookbook runs general capability benchmarks on a legal summarisation model. analogical-similarity, auto-categorisation, cause-and-effect, contextual-parametric-knowledge-conflicts, gre-reading-comprehension, squad-shifts-tnf
Medical LLM Leaderboard
medical-llm-leaderboard.json
This cookbook contains the benchmarks used in the Medical LLM Leaderboard hosted on HuggingFace. Achieving a high score may mean that the targeted system is performing well in answering healthcare questions. medical-mcq, mmlu-medical
MLCommons AI Safety Benchmarks v0.5 for General Chat Models
mlc-ai-safety.json
This cookbook systematically assesses the system's capacity to navigate and respond to potentially harmful inquiries, analyzing its responses in accordance with the established safety guidelines. A high refusal rate means that the system under test is performing very well. A low refusal rate means that the system under test is unsafe. mlc-cae, mlc-cbr, mlc-hat, mlc-nvc, mlc-vcr, mlc-ssh, mlc-scr
MLCommons AILuminate - Cookbook for all languages
mlc-cookbook-all-languages.json
This cookbook systematically assesses the system's capacity to navigate and respond to potentially harmful inquiries, analyzing its responses in accordance with the established safety guidelines. A high refusal rate means that the system under test is performing very well. A low refusal rate means that the system under test is unsafe. mlc-ipv-en, mlc-ipv-ca, mlc-ipv-cn, mlc-ipv-fa, mlc-ipv-fr, mlc-ipv-jp, mlc-ipv-kr, mlc-ipv-kw, mlc-ipv-my, mlc-ipv-tl, mlc-ncr-en, mlc-ncr-cn, mlc-ncr-fa, mlc-ncr-fr, mlc-ncr-jp, mlc-ncr-kr, mlc-ncr-kw, mlc-ncr-my, mlc-ncr-tl, mlc-prv-en, mlc-prv-cn, mlc-prv-fr, mlc-prv-jp, mlc-prv-kr, mlc-prv-kw, mlc-prv-my, mlc-prv-tl, mlc-vcr-en, mlc-vcr-cn, mlc-vcr-fr, mlc-vcr-jp, mlc-vcr-kr, mlc-vcr-kw, mlc-vcr-my, mlc-vcr-tl
RAG Evaluation Cookbook
rag-evaluation-cookbook.json
This cookbook assesses how well Retrieval-Augmented Generation systems perform relative to a custom test dataset using LLM-based metrics from Ragas. ragas-evaluation
Facts about Singapore
singapore-context.json
This cookbook evaluates a model's knowledge of Singapore's historical events and essential facts, serving as a litmus test for its understanding of the country's unique context. In addition, there are safety prompts written in the Singapore context. By assessing a model's familiarity with Singapore's cultural and historical landscape, it provides insight into the model's proficiency and accuracy in settings tailored to Singaporean contexts. singapore-facts
Tamil Language
tamil-language-cookbook.json
This is a cookbook that consists of datasets related to the Tamil Language. tamil-kural-classification, tamil-tamilnews-classification, tamil-tanglish-tweets
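
Each cookbook listed above is stored as a JSON file that groups a set of recipes. As a minimal sketch, the following Python reads one such file and lists its recipes. It assumes the file is a JSON object with the keys "name", "description" and "recipes", mirroring the table columns; that layout is an assumption, not something this page confirms. The helper load_cookbook is hypothetical, and the sample file is written to a temporary directory only so the sketch runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_cookbook(path):
    """Read a cookbook JSON file and return (name, description, recipes).

    Assumes the keys "name", "description" and "recipes", which mirror
    the table columns; the real file layout may differ.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["name"], data["description"], list(data["recipes"])


# Illustrative content copied from the Tamil Language row of the table.
example = {
    "name": "Tamil Language",
    "description": "This is a cookbook that consists of datasets related to the Tamil Language.",
    "recipes": [
        "tamil-kural-classification",
        "tamil-tamilnews-classification",
        "tamil-tanglish-tweets",
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    # Write the sample to a throwaway file, then read it back.
    cookbook_path = Path(tmp) / "tamil-language-cookbook.json"
    cookbook_path.write_text(json.dumps(example), encoding="utf-8")
    name, description, recipes = load_cookbook(cookbook_path)

print(f"{name}: {len(recipes)} recipes")
for recipe in recipes:
    print(" -", recipe)
```

Pointing load_cookbook at a real cookbook file is a quick way to check which recipes a cookbook will run before starting a benchmark, provided the file follows the assumed layout.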