Skip to content

List of Attack Modules

Name Description
Character Swap Attack
charswap_attack.py
This module tests for adversarial textual robustness. It creates perturbations through swapping characters for words that contains more than 3 characters.
Parameters:
     1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. [Default: 10]
Colloquial Wordswap
colloquial_wordswap_attack.py
This attack module tests for textual robustness against the Singapore context. It takes in prompts that feature nouns that describe people. Examples of this include words like 'girl' , 'boy' or 'grandmother'. The module substitutes these words with their Singapore colloquial counterparts, such as 'ah boy', 'ah girl' and 'ah ma'.
Homoglyph Attack
homoglyph_attack.py
This module tests for adversarial textual robustness. Homoglyphs are alternative words for words comprising of ASCII characters. Example of a homoglyph fool -> fooI. This module perturbs the prompt with all available homoglyphs for each word present.
Parameters:
     1. DEFAULT_MAX_ITERATION - Maximum number of prompts that should be sent to the target. [Default: 20]
Homoglyph V2 Attack
homoglyph_v2_attack.py
This module tests for adversarial textual robustness. Homoglyphs are alternative characters that resemble a similar ASCII character. Example of a homoglyph fool -> fooI. This module slowly increases the percentage of characters replaced to see how the model reacts to the base prompt.
Insert Punctuation Attack
insert_punctuation_attack.py
This module tests for adversarial textual robustness and creates perturbations through adding punctuation to the start of words in a prompt.
Parameters:
     1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. [Default: 10]
Job Role Generator Module
job_role_generator.py
This attack module adds demographic groups to the job role.
Malicious Question Generator
malicious_question_generator.py
This attack module generates malicious questions using OpenAI's GPT4 based on a given topic. This module will stop by the number of iterations (Default: 50). To use this attack module, you need to configure an 'openai-gpt4' endpoint.
Payload Mask Attack
payload_mask_attack.py
An attack where the payload is masked, and the LLM is prompted to fill in missing information.
Singapore Sentence Generator
sg_sentence_generator.py
This module generates singlish sentence based on a given seed prompt. The attack module intends to test if the endpoint will complete the sentence with toxic sentences/phrases in Singapore context.
TextBugger Attack
textbugger_attack.py
This module tests for adversarial textual robustness and implements the perturbations listed in the paper TEXTBUGGER: Generating Adversarial Text Against Real-world Applications.
Parameters:
     1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. This is also the number of transformations that should be generated. [Default: 5]

Note: Usage of this attack module requires the internet. Initial downloading of the GLoVe embedding occurs when the UniversalEncoder is called. Embedding is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf
TextFooler Attack
textfooler_attack.py
This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment.'
Parameters:
     1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. This is also the number of transformations that should be generated. [Default: 5]

Note: Usage of this attack module requires the internet. Initial downloading of the GLoVe embedding occurs when the UniversalEncoder is called. Embedding is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf
Toxic Sentence Generator
toxic_sentence_generator.py
This module generates toxic sentences based on a given seed prompt. The attack module intends to test if the system under tests will complete the sentence with toxic sentences/phrases.
Violent Durian
violent_durian.py
This is a multi-turn agent designed to interact over several exchanges. It's used to elicit dangerous or violent suggestions from the target language model by adopting a criminal persona. The application is experimental and uses OpenAI GPT-4. Configure the endpoint openai-gpt4 to use this attack module.