List of Cookbooks

Name Description Recipes
Easy test sets for Common Risks This cookbook consists of easy test sets for common risks. These test sets are adapted from various research sources and will be expanded in the future. 1. uciadult
2. bbq
3. winobias
4. challenging-toxicity-prompts-completion
5. realtime-qa
6. commonsense-morality-easy
7. jailbreak-dan
8. advglue
Hard test sets for Common Risks This cookbook consists of hard test sets for common risks. These test sets are adapted from various research sources and will be expanded in the future. 1. uciadult
2. bbq
3. winobias
4. challenging-toxicity-prompts-completion
5. realtime-qa
6. commonsense-morality-hard
7. jailbreak-dan
8. advglue
Medical LLM Leaderboard This cookbook contains the benchmarks used in the Medical LLM Leaderboard hosted on HuggingFace. A high score may indicate that the target system performs well at answering healthcare questions. 1. medical-mcq
2. mmlu-medical
MLCommons AI Safety Benchmarks v0.5 for General Chat Models This cookbook systematically assesses the system's capacity to navigate and respond to potentially harmful inquiries, analyzing its responses in accordance with the established safety guidelines. A high refusal rate means that the system under test is performing very well. A low refusal rate means that the system under test is unsafe. 1. mlc-cae
2. mlc-cbr
3. mlc-hat
4. mlc-nvc
5. mlc-vcr
6. mlc-ssh
7. mlc-scr
Leaderboard Cookbook This cookbook, drawing inspiration from leaderboards like HF's OpenLLM and HELM, features popular benchmarks for testing model capabilities, with results that may vary from actual leaderboard standings. 1. mmlu
2. truthfulqa-mcq
3. winogrande
4. hellaswag
5. arc
6. gsm8k
Facts about Singapore This cookbook evaluates a model's knowledge of Singapore's historical events and essential facts, serving as a litmus test for its understanding of the country's unique context. It also includes safety prompts written in the Singapore context. Assessing a model's familiarity with Singapore's cultural and historical landscape gives insight into how accurate and proficient it is likely to be in systems tailored to Singaporean contexts. 1. singapore-facts
Tamil Language This is a cookbook that consists of datasets related to the Tamil Language. 1. tamil-kural-classification
2. tamil-tamilnews-classification
3. tamil-tanglish-tweets
AI Safety in Chinese Language This cookbook measures the system's ability to answer trust and safety questions asked in Chinese. 1. cvalues
2. cbbq-lite
Legal Summarisation This cookbook runs general capability benchmarks on a legal summarisation model. 1. analogical-similarity
2. auto-categorisation
3. cause-and-effect
4. contextual-parametric-knowledge-conflicts
5. gre-reading-comprehension
6. squad-shifts-tnf
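
The MLCommons AI Safety Benchmarks entry above is interpreted through a refusal rate: the higher the rate, the safer the system under test. As a rough illustration of that metric only, and not the benchmark's actual scoring code, here is a minimal sketch. It assumes each response has already been labelled as a refusal or not; the function name `refusal_rate` and the input shape are hypothetical.

```python
from typing import Sequence


def refusal_rate(refused: Sequence[bool]) -> float:
    """Return the percentage of responses labelled as refusals.

    `refused` holds one boolean per harmful prompt: True if the system
    declined to answer, False otherwise. How a response is judged to be
    a refusal is outside the scope of this sketch.
    """
    if not refused:
        raise ValueError("at least one labelled response is required")
    return 100.0 * sum(refused) / len(refused)


# Hypothetical labels for four harmful prompts: three refused, one answered.
labels = [True, True, False, True]
print(f"Refusal rate: {refusal_rate(labels):.1f}%")
```

Under the interpretation given in the cookbook description, a value near 100 suggests the system declines most harmful requests, while a low value suggests it is unsafe.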