## **PoC**

https://github.com/mundruid/cyberdata-mlai by [Dr X](https://github.com/mundruid)

## **Details**

This project benchmarks LLMs against visual and statistical methods.

**Goal**: detect malicious activity using packet capture traffic

- Pcaps need to be converted to Pandas dataframes for ML processing. The unit of analysis can be:
  - flows
  - packets
- Exploration with visual and statistical methods
- Creative feature engineering
  - Categorical to numerical features
  - Numerical to categorical. This is unusual, but useful for this use case
    - Ex. ports to services
  - Embeddings for words (payload, protocols)
    - Exploration for the best embedding that conveys the meaning of words
  - Convert features to strings, train an LLM for classification
- Train models with labeled data
  - Some algorithms are better suited to cybersecurity applications
  - Semi-supervised learning when very few labels are available
- Fine-tune language models
  - Works pretty well if the architecture of the model applies to the application
    - Ex. BERT for classification

Video: will be released by Cackalackycon soon

[Slides](https://docs.google.com/presentation/d/1wPkWEvS-3Rn-RFp3CumPJQbYaOGXL88FCjV6uCCHaww/edit?usp=sharing)
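
The two feature conversions listed under feature engineering (numerical to categorical, e.g. ports to services, and categorical to numerical) can be illustrated with a minimal sketch. It is not taken from the linked repository: the field names (`dst_port`, `protocol`), the small port table, and the helper functions are assumptions chosen for illustration, and plain Python dicts stand in for dataframe rows.

```python
# Hypothetical port-to-service table; a real project would use a fuller mapping.
PORT_TO_SERVICE = {22: "ssh", 53: "dns", 80: "http", 443: "https"}

# Hypothetical fixed vocabulary for the protocol column.
PROTOCOLS = ["tcp", "udp", "icmp"]


def port_to_service(port: int) -> str:
    """Numerical to categorical: map a port number to a service label."""
    if port in PORT_TO_SERVICE:
        return PORT_TO_SERVICE[port]
    # 49152-65535 is the IANA dynamic (ephemeral) port range.
    return "ephemeral" if port >= 49152 else "other"


def one_hot(value: str, categories: list) -> list:
    """Categorical to numerical: one-hot encode a value over known categories."""
    return [1 if value == category else 0 for category in categories]


# Toy flow records standing in for rows of a dataframe built from a pcap.
flows = [
    {"dst_port": 443, "protocol": "tcp"},
    {"dst_port": 53, "protocol": "udp"},
    {"dst_port": 51000, "protocol": "tcp"},
]

for flow in flows:
    flow["service"] = port_to_service(flow["dst_port"])
    flow["protocol_onehot"] = one_hot(flow["protocol"], PROTOCOLS)
```

Treating a port as a category rather than a number avoids implying that port 443 is "larger" than port 80 in any meaningful sense, which is one plausible reason the numerical-to-categorical direction helps here.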
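
The "convert features to strings" step could look roughly like the sketch below: each record is serialized into one line of text that a language model tokenizer can consume. The `key=value` format, the field names, and the `flow_to_text` helper are assumptions for illustration, not the format used in the talk or the repository.

```python
def flow_to_text(flow: dict) -> str:
    """Serialize a flow record into a single 'key=value' string.

    Keys are sorted so the same record always yields the same text,
    regardless of the order in which its fields were inserted.
    """
    return " ".join(f"{key}={flow[key]}" for key in sorted(flow))


# Hypothetical labeled examples; labels would come from the labeled dataset.
labeled_flows = [
    ({"dst_port": 443, "protocol": "tcp", "service": "https"}, "benign"),
    ({"dst_port": 51000, "protocol": "tcp", "service": "ephemeral"}, "malicious"),
]

# (text, label) pairs in the shape a text classifier is typically trained on.
training_pairs = [(flow_to_text(flow), label) for flow, label in labeled_flows]
```

Pairs like these could then be fed to a sequence classification model such as BERT, as the fine-tuning bullet suggests.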