List of Cookbooks
Name | Description | Recipes |
---|---|---|
Easy test sets for Common Risks | This cookbook consists of (easy) test sets for common risks. These test sets are adapted from various research and will be expanded in the future. | 1. uciadult 2. bbq 3. winobias 4. challenging-toxicity-prompts-completion 5. realtime-qa 6. commonsense-morality-easy 7. jailbreak-dan 8. advglue |
Hard test sets for Common Risks | This cookbook consists of (hard) test sets for common risks. These test sets are adapted from various research and will be expanded in the future. | 1. uciadult 2. bbq 3. winobias 4. challenging-toxicity-prompts-completion 5. realtime-qa 6. commonsense-morality-hard 7. jailbreak-dan 8. advglue |
Medical LLM Leaderboard | This cookbook contains the benchmarks used in the Medical LLM Leaderboard hosted on Hugging Face. A high score may mean that the target system performs well at answering healthcare questions. | 1. medical-mcq 2. mmlu-medical |
MLCommons AI Safety Benchmarks v0.5 for General Chat Models | This cookbook systematically assesses the system's capacity to navigate and respond to potentially harmful inquiries, analyzing its responses in accordance with the established safety guidelines. A high refusal rate means that the system under test is performing very well. A low refusal rate means that the system under test is unsafe. | 1. mlc-cae 2. mlc-cbr 3. mlc-hat 4. mlc-nvc 5. mlc-vcr 6. mlc-ssh 7. mlc-scr |
Leaderboard Cookbook | This cookbook, drawing inspiration from leaderboards like HF's OpenLLM and HELM, features popular benchmarks for testing model capabilities, with results that may vary from actual leaderboard standings. | 1. mmlu 2. truthfulqa-mcq 3. winogrande 4. hellaswag 5. arc 6. gsm8k |
Facts about Singapore | This cookbook evaluates a model's knowledge of Singapore's historical events and essential facts, serving as a litmus test for its understanding of the country's unique context. It also includes safety prompts written in a Singapore context. Assessing a model's familiarity with Singapore's cultural and historical landscape provides insight into its proficiency and accuracy in Singaporean contexts. | 1. singapore-facts |
Tamil Language | This is a cookbook that consists of datasets related to the Tamil Language. | 1. tamil-kural-classification 2. tamil-tamilnews-classification 3. tamil-tanglish-tweets |
AI Safety in Chinese Language | This cookbook measures the system's ability to answer trust and safety questions asked in Chinese. | 1. cvalues 2. cbbq-lite |
Legal Summarisation | This cookbook runs general capability benchmarks on a legal summarisation model. | 1. analogical-similarity 2. auto-categorisation 3. cause-and-effect 4. contextual-parametric-knowledge-conflicts 5. gre-reading-comprehension 6. squad-shifts-tnf |
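
Each row in the table pairs a cookbook name with a description and an ordered list of recipes. As a rough illustration of that relationship, the sketch below models two of the rows in plain Python and looks up which cookbooks include a given recipe. The `Cookbook` class, the `COOKBOOKS` list, and the `cookbooks_containing` helper are hypothetical names chosen for this example; they are not the tool's actual schema or API, and the descriptions are shortened.

```python
from dataclasses import dataclass, field


@dataclass
class Cookbook:
    """Illustrative container only; an assumption, not the tool's real schema."""
    name: str
    description: str
    recipes: list[str] = field(default_factory=list)


# Two rows taken from the table; descriptions are abbreviated.
COOKBOOKS = [
    Cookbook(
        name="Medical LLM Leaderboard",
        description="Benchmarks used in the Medical LLM Leaderboard.",
        recipes=["medical-mcq", "mmlu-medical"],
    ),
    Cookbook(
        name="AI Safety in Chinese Language",
        description="Trust and safety questions asked in Chinese.",
        recipes=["cvalues", "cbbq-lite"],
    ),
]


def cookbooks_containing(recipe: str, cookbooks: list[Cookbook]) -> list[str]:
    """Return the names of all cookbooks whose recipe list includes `recipe`."""
    return [cb.name for cb in cookbooks if recipe in cb.recipes]


print(cookbooks_containing("cvalues", COOKBOOKS))
```

A lookup like this is mainly useful for recipes that appear in more than one cookbook, such as `uciadult` or `bbq`, which are listed under both the easy and hard Common Risks cookbooks.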