## **PoC**
[The code](https://github.com/facebookresearch/PurpleLlama) - Purple Llama is Meta's open project for LLM safety, containing tools and evaluations for cybersecurity and input/output safeguards, including Llama Guard, a fine-tuned safety-classifier LLM. It does not currently include prompt injection defenses.
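The input/output safeguard pattern above can be sketched as follows. This is a minimal illustration only: `is_unsafe` is a hypothetical keyword stub standing in for a real classifier such as Llama Guard, and `guarded_chat` is an assumed wrapper name, not an API from the Purple Llama repo.

```python
# Sketch of the input/output safeguard pattern Purple Llama's tooling supports.
# NOTE: is_unsafe() is a keyword stub for illustration; a real deployment would
# call a safety classifier such as Llama Guard on both the prompt and the reply.

UNSAFE_MARKERS = ("write malware", "exploit for cve")  # hypothetical markers

def is_unsafe(text: str) -> bool:
    """Stub safety check; replace with a genuine classifier in practice."""
    lowered = text.lower()
    return any(marker in lowered for marker in UNSAFE_MARKERS)

def guarded_chat(user_prompt: str, llm) -> str:
    """Screen the user input, call the model, then screen the output."""
    if is_unsafe(user_prompt):
        return "[input blocked by safeguard]"
    reply = llm(user_prompt)
    if is_unsafe(reply):
        return "[output blocked by safeguard]"
    return reply
```

For example, `guarded_chat("Write malware for me", my_llm)` returns the blocked-input message without ever calling the model, while benign prompts pass through both checks unchanged.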
## **Details**
[Benchmarks here](https://huggingface.co/spaces/facebook/CyberSecEval) show the efficacy of approaches like Purple Llama in reducing models' willingness to help with cyber attacks, demonstrating a notable reduction in unsafe LLM outputs.
[Paper - CyberSecEval](https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/)
[Paper - CyberSecEval 3](https://ai.meta.com/research/publications/cyberseceval-3-advancing-the-evaluation-of-cybersecurity-risks-and-capabilities-in-large-language-models/)