Edoardo Debenedetti

My current work is around the security of AI agents. Some of my recent work in this area includes:

CaMeL: a system-level prompt injection defense that virtually solves the security issue of tool-calling AI agents by design.
AgentDojo: a benchmark for prompt injection attacks and defenses.
Adversarial SEO for LLMs: we showed that you can use prompt-injection attacks to promote your own webpages in LLM-based search engines like Perplexity AI.
AutoAdvExBench: a benchmark to measure how good LLMs are at breaking adversarial example defenses, as a way to measure how good LLMs are at doing (ML) security research.