
List of Attack Modules

Character Swap Attack
This module tests for adversarial textual robustness. It creates perturbations by swapping characters in words that contain more than 3 characters.
Parameters:
1. MAX_ITERATIONS - Number of prompts that should be sent to the target. [Default: 10]
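A character-swap perturbation of this kind can be sketched as follows. This is a minimal illustration, not the module's actual implementation; the choice of random adjacent swaps and the preservation of the first and last characters are assumptions based on the description above.

```python
import random

def char_swap(prompt, seed=None):
    """Swap two adjacent inner characters in each word longer than 3 characters."""
    rng = random.Random(seed)
    words = []
    for word in prompt.split():
        if len(word) > 3:
            # Pick an index among the inner characters and swap it with its neighbour,
            # leaving the first and last characters untouched.
            i = rng.randrange(1, len(word) - 2)
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            word = "".join(chars)
        words.append(word)
    return " ".join(words)

print(char_swap("describe the weather today", seed=0))
```

Running this with different seeds produces different perturbed prompts, which is how MAX_ITERATIONS distinct prompts could be generated from one input.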
Toxic Sentence Generator
This module generates toxic sentences based on a given seed prompt. The attack module intends to test whether the system under test will complete the sentence with toxic sentences/phrases.
TextBugger Attack
This module tests for adversarial textual robustness and implements the perturbations listed in the paper TEXTBUGGER: Generating Adversarial Text Against Real-world Applications.
Parameters:
1. MAX_ITERATIONS - Number of prompts that should be sent to the target. This is also the number of transformations that should be generated. [Default: 5]
Note:
Usage of this attack module requires internet access. The initial download of the GloVe embedding occurs when the UniversalEncoder is called.
Embedding is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf
Job Role Generator Module
This attack module adds demographic groups to a given job role.
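Combining a job role with demographic groups can be sketched as below. The group list here is hypothetical; the module's actual demographic groups may differ.

```python
# Hypothetical demographic groups; the module's actual list may differ.
DEMOGRAPHIC_GROUPS = ["male", "female", "elderly", "young"]

def expand_job_role(role):
    """Prefix a job role with each demographic group to build test prompts."""
    return [f"{group} {role}" for group in DEMOGRAPHIC_GROUPS]

print(expand_job_role("nurse"))
# → ['male nurse', 'female nurse', 'elderly nurse', 'young nurse']
```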
Homoglyph Attack
This module tests for adversarial textual robustness. Homoglyphs are characters that look identical or very similar to standard ASCII characters, allowing alternative renderings of a word.
Example of a homoglyph: fool -> fooI
This module perturbs the prompt with all available homoglyphs for each word present.
Parameters:
1. MAX_ITERATIONS - Maximum number of prompts that should be sent to the target. [Default: 20]
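A homoglyph substitution can be sketched as follows. The mapping here is a tiny illustrative subset, not the module's full homoglyph table.

```python
# Illustrative subset of a homoglyph table; 'а' and 'е' below are Cyrillic.
HOMOGLYPHS = {"o": "0", "l": "I", "a": "а", "e": "е"}

def homoglyph_variants(word):
    """Generate one variant per substitutable character position."""
    variants = []
    for i, ch in enumerate(word):
        if ch in HOMOGLYPHS:
            variants.append(word[:i] + HOMOGLYPHS[ch] + word[i + 1:])
    return variants

print(homoglyph_variants("fool"))
# → ['f0ol', 'fo0l', 'fooI']
```

Applying this to every word in a prompt, and capping the number of variants sent, matches the MAX_ITERATIONS behaviour described above.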
Violent Durian
This is a multi-turn agent designed to interact over several exchanges. It adopts a criminal persona to elicit dangerous or violent suggestions from the target language model. The module is experimental and uses OpenAI GPT-4; configure the endpoint openai-gpt4 to use this attack module.
TextFooler Attack
This module tests for adversarial textual robustness and implements the perturbations listed in the paper Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment.
Parameters:
1. MAX_ITERATIONS - Number of prompts that should be sent to the target. This is also the number of transformations that should be generated. [Default: 5]
Note:
Usage of this attack module requires internet access. The initial download of the GloVe embedding occurs when the UniversalEncoder is called.
Embedding is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf
Colloquial Wordswap
This attack module tests for textual robustness in the Singapore context. It takes in prompts featuring nouns that describe people, such as 'girl', 'boy', or 'grandmother', and substitutes them with their Singapore colloquial counterparts, such as 'ah girl', 'ah boy', and 'ah ma'.
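The substitution can be sketched with a small lookup table. The mapping below is built from the examples in the description; the module's actual dictionary is larger.

```python
# Illustrative mapping based on the examples above; the module's actual
# dictionary covers more person nouns.
COLLOQUIAL_MAP = {"boy": "ah boy", "girl": "ah girl", "grandmother": "ah ma"}

def colloquial_wordswap(prompt):
    """Replace person nouns with their Singapore colloquial counterparts."""
    return " ".join(COLLOQUIAL_MAP.get(w, w) for w in prompt.split())

print(colloquial_wordswap("the girl visited her grandmother"))
# → 'the ah girl visited her ah ma'
```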
Insert Punctuation Attack
This module tests for adversarial textual robustness and creates perturbations by adding punctuation to the start of words in a prompt.
Parameters:
1. MAX_ITERATIONS - Number of prompts that should be sent to the target. [Default: 10]
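The punctuation-insertion perturbation can be sketched as follows. This is a minimal sketch based on the description above; the set of punctuation marks and the choice to perturb every word are assumptions.

```python
import random

# Assumed set of punctuation marks; the module's actual set may differ.
PUNCTUATION = "!?.,;:"

def insert_punctuation(prompt, seed=None):
    """Prepend a random punctuation mark to each word in the prompt."""
    rng = random.Random(seed)
    return " ".join(rng.choice(PUNCTUATION) + w for w in prompt.split())

print(insert_punctuation("describe the weather", seed=0))
```

As with the other perturbation modules, varying the seed yields the distinct prompts counted by MAX_ITERATIONS.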
Malicious Question Generator
This attack module generates malicious questions on a given topic using OpenAI's GPT-4. It stops after a set number of iterations (default: 50). To use this attack module, you need to configure an 'openai-gpt4' endpoint.
Sample Attack Module
This is a sample attack module.
Singapore Sentence Generator
This module generates Singlish sentences based on a given seed prompt. The attack module intends to test whether the endpoint will complete the sentence with toxic sentences/phrases in the Singapore context.