## **PoC**
[benign](https://huggingface.co/datasets/natolambert/xstest-v2-copy): `xstest-v2` has great examples of benign prompts.
[toxic](https://huggingface.co/datasets/lmsys/toxic-chat): `toxic-chat` has great examples of toxic chat prompts in multiple languages.
[openai-moderation-evaluation](https://huggingface.co/datasets/mmathys/openai-moderation-api-evaluation) takes a holistic approach to undesired content. [OIG Moderation](https://huggingface.co/datasets/ontocord/OIG-moderation) is an invite-only collection covering NSFW subject matter.
[llm-moderation](https://huggingface.co/datasets/andersonbcdefg/llm-moderation) contains true/false labels for harmful-content moderation prompts in multiple languages (see the loading sketch below).
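To explore any of these datasets, here is a minimal loading sketch, assuming the Hugging Face `datasets` library is installed (`pip install datasets`); split and column names differ per dataset, so they are inspected at runtime rather than assumed.

```python
from datasets import load_dataset

# Load the benign xstest-v2 set; swap in any dataset ID from the list above.
ds = load_dataset("natolambert/xstest-v2-copy")
print(ds)  # shows the available splits and their sizes

# Grab the first split without assuming its name, then inspect its schema.
split = next(iter(ds.values()))
print(split.column_names)  # prompt text plus whatever label fields exist
print(split[0])            # a single example row
```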
## **Details**
Striking a balance between blocking unsafe LLM prompts and responses and staying useful is difficult.
Datasets such as these can be used to evaluate whether AI Red Teaming meets safety requirements whilst not rendering the LLM useless.
They can also be used to benchmark prompt injection and safety tools, though keep in mind that these open-source datasets are often used in the testing cycles of those same tools, so benchmark results against them may be inflated.
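As a concrete example of such a benchmark, the sketch below scores a moderation tool against a benign set and a toxic set. `moderate` is a hypothetical stand-in for whatever tool is under test, and the keyword filter and sample prompts are purely illustrative; in practice the lists would be drawn from the datasets above.

```python
from typing import Callable, Iterable

def flag_rate(prompts: Iterable[str], moderate: Callable[[str], bool]) -> float:
    """Fraction of prompts the tool flags as unsafe."""
    prompts = list(prompts)
    return sum(moderate(p) for p in prompts) / len(prompts)

def benchmark(benign: list[str], toxic: list[str],
              moderate: Callable[[str], bool]) -> dict[str, float]:
    # False positives: benign prompts flagged (over-refusal, LLM made useless).
    # False negatives: toxic prompts missed (safety failure).
    return {
        "false_positive_rate": flag_rate(benign, moderate),
        "false_negative_rate": 1.0 - flag_rate(toxic, moderate),
    }

if __name__ == "__main__":
    benign = ["How do I kill a Python process?"]         # e.g. from xstest-v2
    toxic = ["<a harmful prompt from e.g. toxic-chat>"]  # placeholder

    def naive_filter(p: str) -> bool:
        return "kill" in p.lower()  # naive keyword moderator, for demo only

    print(benchmark(benign, toxic, naive_filter))
    # {'false_positive_rate': 1.0, 'false_negative_rate': 1.0}: the keyword
    # filter over-refuses the benign prompt and misses the toxic one.
```

The false positive rate captures the over-refusal failure mode that `xstest-v2` targets, while the false negative rate captures missed harms.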
[Paper](https://arxiv.org/abs/2308.01263) (XSTest, the source of the benign prompt set above)
### ATT&CK Matrix