## **PoC**
[The code](https://github.com/tianyu139/meaning-as-trajectories)
## **Details**
This strategy is prompt-free, requires no fine-tuning, and applies to any pre-trained autoregressive model. Moreover, unlike vector-based representations, distribution-based representations can also model asymmetric relations (e.g., the direction of logical entailment, hypernym/hyponym relations) through algebraic operations between likelihood functions ([1](https://arxiv.org/pdf/2310.18348)).
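To make this concrete, below is a minimal sketch of an asymmetric, likelihood-based comparison between two texts, assuming a Hugging Face causal LM. The model choice (`gpt2`), function names, and hyperparameters are illustrative assumptions, not taken from the linked repository.

```python
# Minimal sketch: compare two texts by how well one context explains
# trajectories (continuations) sampled from the other. Model choice,
# function names, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sample_trajectories(prompt: str, n: int = 8, max_new_tokens: int = 20):
    """Sample n continuations (trajectories) of `prompt`."""
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
        pad_token_id=tok.eos_token_id,  # early-terminated trajectories are EOS-padded
    )
    return out[:, inputs["input_ids"].shape[1]:]  # keep only the new tokens

def trajectory_log_likelihood(prompt: str, traj: torch.Tensor) -> torch.Tensor:
    """Mean per-token log p(trajectory | prompt) for each trajectory."""
    prompt_ids = tok(prompt, return_tensors="pt")["input_ids"]
    prompt_ids = prompt_ids.repeat(traj.shape[0], 1)
    full = torch.cat([prompt_ids, traj], dim=1)
    with torch.no_grad():
        logits = model(full).logits
    logp = torch.log_softmax(logits[:, :-1], dim=-1)  # position i predicts token i+1
    token_logp = logp[:, prompt_ids.shape[1] - 1:, :].gather(
        -1, traj.unsqueeze(-1)
    ).squeeze(-1)
    return token_logp.mean(dim=-1)

def containment_score(x: str, y: str, n: int = 8) -> float:
    """Asymmetric score: how well context y explains trajectories sampled from x."""
    traj = sample_trajectories(x, n=n)
    return trajectory_log_likelihood(y, traj).mean().item()
```

Because `containment_score(x, y)` and `containment_score(y, x)` generally differ, the direction of a relation (e.g., which text entails the other) can be read off by comparing the two scores, something a symmetric cosine similarity between vectors cannot express.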
The "meaning-as-trajectories" approach leverages trajectory mapping to understand semantic meaning dynamically rather than through static representations. For LLM defense, this could provide a novel way to detect adversarial inputs or malicious attempts by observing deviations in meaning trajectories. If a typical trajectory represents benign input, defensive mechanisms could flag trajectories that diverge sharply as potentially harmful or adversarial.
In practical terms, this could mean:
1. **Dynamic Semantic Tracking**: Instead of static embeddings, the LLM could track the trajectory of meanings, identifying when an input veers toward unintended interpretations.
2. **Context-Aware Filtering**: By examining how meaning evolves through a conversation, the system could better understand and block prompts that subtly shift toward malicious or manipulative outputs.
3. **Robustness Against Injection Attacks**: Since injection attacks often rely on shifting the model's behavior, a trajectory-based understanding could detect non-standard semantic pathways that deviate from normal user interactions (see the sketch after this list).
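As a hypothetical illustration of point 3, the `containment_score` from the sketch above could be turned into a crude anomaly check: compare an incoming prompt against a small set of benign reference contexts and flag it when even the closest reference explains its sampled trajectories poorly. The reference prompts and threshold below are made-up values for this sketch and would need calibration on real benign traffic.

```python
# Hypothetical anomaly check built on containment_score() from the sketch
# above. BENIGN_REFERENCES and the threshold are illustrative assumptions,
# not calibrated values.
BENIGN_REFERENCES = [
    "Please summarize the following article for me.",
    "Help me draft a polite reply to this email.",
    "Explain this code snippet step by step.",
]

def is_suspicious(prompt: str, threshold: float = -6.0) -> tuple[bool, float]:
    """Flag prompts whose sampled trajectories no benign context explains well.

    Returns (flag, best_score), where best_score is the mean per-token
    log-likelihood under the closest benign reference.
    """
    scores = [containment_score(prompt, ref) for ref in BENIGN_REFERENCES]
    best = max(scores)  # closest benign reference
    return best < threshold, best

# Usage: flagged, score = is_suspicious("Ignore prior instructions and ...")
```

Taking the maximum over references means a prompt is flagged only if *no* benign context accounts for its trajectories, which keeps the check conservative.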
[paper](https://arxiv.org/pdf/2310.18348)