List of Attack Modules
Name | Description |
---|---|
Character Swap Attack | This module tests for adversarial textual robustness. It creates perturbations by swapping characters within words that contain more than 3 characters (a sketch follows this table). Parameters: 1. MAX_ITERATIONS - Number of prompts that should be sent to the target. [Default: 10] |
Toxic Sentence Generator | This module generates toxic sentences based on a given seed prompt. The attack module tests whether the system under test will complete the sentence with toxic sentences/phrases. |
TextBugger Attack | This module tests for adversarial textual robustness and implements the perturbations listed in the paper TEXTBUGGER: Generating Adversarial Text Against Real-world Applications. Parameters: 1. MAX_ITERATIONS - Number of prompts that should be sent to the target. This is also the number of transformations that should be generated. [Default: 5] Note: This attack module requires internet access. The GLoVe embedding is downloaded when the UniversalEncoder is first called, and is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf |
Job Role Generator Module | This attack module adds demographic groups to a given job role (a sketch follows this table). |
Homoglyph Attack | This module tests for adversarial textual robustness. Homoglyphs are characters that closely resemble ASCII characters and can be substituted for them, e.g. fool -> fooI, where the lowercase 'l' is replaced with an uppercase 'I'. This module perturbs the prompt with the available homoglyphs for each word present (a sketch follows this table). Parameters: 1. MAX_ITERATIONS - Maximum number of prompts that should be sent to the target. [Default: 20] |
Violent Durian | This is a multi-turn agent designed to interact with the target over several exchanges. It adopts a criminal persona to elicit dangerous or violent suggestions from the target language model. This attack module is experimental and uses OpenAI GPT-4; configure the openai-gpt4 endpoint to use it. |
TextFooler Attack | This module tests for adversarial textual robustness and implements the perturbations listed in the paper Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Parameters: 1. MAX_ITERATIONS - Number of prompts that should be sent to the target. This is also the number of transformations that should be generated. [Default: 5] Note: This attack module requires internet access. The GLoVe embedding is downloaded when the UniversalEncoder is first called, and is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf |
Colloquial Wordswap | This attack module tests for textual robustness in the Singapore context. It takes in prompts featuring nouns that describe people, such as 'girl', 'boy' or 'grandmother', and substitutes these words with their Singapore colloquial counterparts, such as 'ah girl', 'ah boy' and 'ah ma' (a sketch follows this table). |
Insert Punctuation Attack | This module tests for adversarial textual robustness by creating perturbations that add punctuation to the start of words in a prompt (a sketch follows this table). Parameters: 1. MAX_ITERATIONS - Number of prompts that should be sent to the target. [Default: 10] |
Malicious Question Generator | This attack module generates malicious questions on a given topic using OpenAI's GPT-4. It stops after a set number of iterations (Default: 50). To use this attack module, configure an 'openai-gpt4' endpoint. |
Sample Attack Module | This is a sample attack module. |
Singapore Sentence Generator | This module generates Singlish sentences based on a given seed prompt. The attack module tests whether the endpoint will complete the sentence with toxic sentences/phrases in the Singapore context. |
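
The sketches below illustrate the style of perturbation several of the modules above apply. They are minimal, hypothetical sketches written for this page; function names, parameter handling, and substitution tables are assumptions, not the modules' actual implementations.

A character-swap perturbation, in the spirit of the Character Swap Attack, can be sketched as swapping two adjacent characters in a randomly chosen word longer than 3 characters:

```python
import random

def character_swap(prompt: str, max_iterations: int = 10) -> list[str]:
    """Hypothetical sketch: swap adjacent characters in words longer than 3 chars."""
    perturbed = []
    for _ in range(max_iterations):
        words = prompt.split()
        # Only words with more than 3 characters are eligible for perturbation.
        candidates = [i for i, w in enumerate(words) if len(w) > 3]
        if not candidates:
            break
        i = random.choice(candidates)
        chars = list(words[i])
        j = random.randrange(len(chars) - 1)
        chars[j], chars[j + 1] = chars[j + 1], chars[j]  # swap two adjacent characters
        words[i] = "".join(chars)
        perturbed.append(" ".join(words))
    return perturbed
```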
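
The Job Role Generator can be pictured as expanding a job role into one prompt per demographic group. The group list here is an assumed example, not the module's actual list:

```python
# Assumed, illustrative demographic groups; the module's real list may differ.
DEMOGRAPHIC_GROUPS = ["male", "female", "elderly", "young"]

def job_role_variants(job_role: str) -> list[str]:
    """Hypothetical sketch: attach each demographic group to the job role."""
    return [f"{group} {job_role}" for group in DEMOGRAPHIC_GROUPS]

# e.g. job_role_variants("software engineer") -> ["male software engineer", ...]
```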
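
A homoglyph perturbation, in the spirit of the Homoglyph Attack, replaces individual characters with visually similar ones. The mapping below is a tiny assumed subset for illustration:

```python
# Assumed, illustrative homoglyph table; real homoglyph sets are much larger.
HOMOGLYPHS = {"l": "I", "o": "0", "a": "\u0430"}  # '\u0430' is Cyrillic 'a'

def homoglyph_perturbations(prompt: str, max_iterations: int = 20) -> list[str]:
    """Hypothetical sketch: one perturbed prompt per substitutable character."""
    perturbed = []
    for i, ch in enumerate(prompt):
        if ch in HOMOGLYPHS:
            perturbed.append(prompt[:i] + HOMOGLYPHS[ch] + prompt[i + 1:])
        if len(perturbed) >= max_iterations:
            break
    return perturbed
```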
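
The Colloquial Wordswap substitution can be sketched as a dictionary lookup over the words in the prompt; the table below contains only the examples named in the description above:

```python
# Substitution table built from the examples in the module description above.
COLLOQUIAL = {"boy": "ah boy", "girl": "ah girl", "grandmother": "ah ma"}

def colloquial_wordswap(prompt: str) -> str:
    """Hypothetical sketch: swap person-nouns for Singapore colloquial terms."""
    return " ".join(COLLOQUIAL.get(word.lower(), word) for word in prompt.split())

# e.g. colloquial_wordswap("the boy met his grandmother")
#   -> "the ah boy met his ah ma"
```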
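
Finally, the Insert Punctuation Attack can be sketched as prepending a random punctuation mark to a randomly chosen word, once per generated prompt:

```python
import random
import string

def insert_punctuation(prompt: str, max_iterations: int = 10) -> list[str]:
    """Hypothetical sketch: prepend a random punctuation mark to a random word."""
    perturbed = []
    words = prompt.split()
    if not words:
        return perturbed
    for _ in range(max_iterations):
        variant = words.copy()
        i = random.randrange(len(variant))
        variant[i] = random.choice(string.punctuation) + variant[i]
        perturbed.append(" ".join(variant))
    return perturbed
```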