My current work is around the security of AI agents. Some of my recent work in this area includes:
- CaMeL: a system-level prompt injection defense that virtually solves the security issue of tool-calling AI agents by design.
- AgentDojo: a benchmark for prompt injection attacks and defenses.
- Adversarial SEO for LLMs: we showed that you can use prompt-injection attacks to promote your own webpages in LLM-based search engines like Perplexity AI.
- AutoAdvExBench: a benchmark to measure how good LLMs are at breaking adversarial example defenses, as a way to measure how good LLMs are at doing (ML) security research.