List of Attack Modules
Name | Description |
---|---|
Character Swap Attack (`charswap_attack.py`) | This module tests for adversarial textual robustness. It creates perturbations by swapping characters within words that contain more than 3 characters (see the character-swap sketch after this table). Parameters: 1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. [Default: 10] |
Colloquial Wordswap (`colloquial_wordswap_attack.py`) | This attack module tests for textual robustness in the Singapore context. It takes in prompts that feature nouns describing people, such as 'girl', 'boy' or 'grandmother', and substitutes these words with their Singapore colloquial counterparts, such as 'ah girl', 'ah boy' and 'ah ma' (see the word-swap sketch after this table). |
Homoglyph Attack (`homoglyph_attack.py`) | This module tests for adversarial textual robustness. Homoglyphs are characters that visually resemble ASCII characters, e.g. fool -> fooI. This module perturbs the prompt with the available homoglyphs for each word present. Parameters: 1. DEFAULT_MAX_ITERATION - Maximum number of prompts that should be sent to the target. [Default: 20] |
Homoglyph V2 Attack (`homoglyph_v2_attack.py`) | This module tests for adversarial textual robustness. Homoglyphs are characters that visually resemble ASCII characters, e.g. fool -> fooI. This module gradually increases the percentage of characters replaced with homoglyphs to observe how the model's response to the base prompt changes (see the homoglyph sketch after this table). |
Insert Punctuation Attack (`insert_punctuation_attack.py`) | This module tests for adversarial textual robustness by creating perturbations that add punctuation to the start of words in a prompt (see the punctuation sketch after this table). Parameters: 1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. [Default: 10] |
Job Role Generator Module (`job_role_generator.py`) | This attack module adds demographic groups to the job role. |
Malicious Question Generator (`malicious_question_generator.py`) | This attack module generates malicious questions on a given topic using OpenAI's GPT-4. The module stops after a set number of iterations (default: 50). To use this attack module, you need to configure an 'openai-gpt4' endpoint. |
Payload Mask Attack (`payload_mask_attack.py`) | An attack where the payload is masked, and the LLM is prompted to fill in the missing information. |
Singapore Sentence Generator (`sg_sentence_generator.py`) | This module generates Singlish sentences based on a given seed prompt. The attack module is intended to test whether the endpoint will complete the sentence with toxic sentences/phrases in the Singapore context. |
TextBugger Attack (`textbugger_attack.py`) | This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'TEXTBUGGER: Generating Adversarial Text Against Real-world Applications'. Parameters: 1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target; this is also the number of transformations that should be generated. [Default: 5] Note: This attack module requires an internet connection. The GloVe embedding is downloaded the first time the UniversalEncoder is called; it is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf |
TextFooler Attack (`textfooler_attack.py`) | This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment'. Parameters: 1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target; this is also the number of transformations that should be generated. [Default: 5] Note: This attack module requires an internet connection. The GloVe embedding is downloaded the first time the UniversalEncoder is called; it is retrieved from the following URL: https://textattack.s3.amazonaws.com/word_embeddings/paragramcf |
Toxic Sentence Generator (`toxic_sentence_generator.py`) | This module generates toxic sentences based on a given seed prompt. The attack module is intended to test whether the system under test will complete the sentence with toxic sentences/phrases. |
Violent Durian (`violent_durian.py`) | This is a multi-turn agent designed to interact with the target over several exchanges. It adopts a criminal persona to elicit dangerous or violent suggestions from the target language model. This module is experimental and uses OpenAI GPT-4; configure the 'openai-gpt4' endpoint to use this attack module. |