Edoardo Debenedetti

AI Security Researcher

I am a Research Scientist at the AI Security Company. I am also wrapping up my Computer Science PhD at ETH Zurich in the SPY Lab, advised by Florian Tramèr.

My research focuses on prompt injection attacks and the security of AI agents. My PhD is on the security and privacy of machine learning systems.

Selected Publications

2026

Defeating Prompt Injections by Design
Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr
4th IEEE Conference on Secure and Trustworthy Machine Learning

TLDR: CaMeL provides security guarantees against prompt injection by tracking data flows through LLM agent executions.

@inproceedings{debenedetti2026defeating,
  author = {Debenedetti, Edoardo and Shumailov, Ilia and Fan, Tianqi and Hayes, Jamie and Carlini, Nicholas and Fabian, Daniel and Kern, Christoph and Shi, Chongyang and Terzis, Andreas and Tram\`er, Florian},
  booktitle = {4th IEEE Conference on Secure and Trustworthy Machine Learning},
  title = {Defeating Prompt Injections by Design},
  url = {https://arxiv.org/abs/2503.18813},
  year = {2026}
}

2025

Adversarial Search Engine Optimization for Large Language Models
Fredrik Nestaas, Edoardo Debenedetti, Florian Tramèr
Thirteenth International Conference on Learning Representations

TLDR: Attackers can manipulate LLMs through crafted content to promote products and demote competitors in search results.

@inproceedings{nestaas2025adversarial,
  author = {Nestaas, Fredrik and Debenedetti, Edoardo and Tram\`er, Florian},
  booktitle = {Thirteenth International Conference on Learning Representations},
  title = {Adversarial Search Engine Optimization for Large Language Models},
  url = {https://arxiv.org/abs/2406.18382},
  year = {2025}
}

Design Patterns for Securing LLM Agents against Prompt Injections
Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezi Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn (αβ order)
arXiv ePrint 2506.08837

TLDR: We propose design patterns for securing LLM agents against prompt injection attacks with provable resistance guarantees.

@misc{beurerkellner2025design,
  addendum = {($\alpha\beta$ order)},
  author = {Beurer-Kellner, Luca and Buesser, Beat and Cre\c{t}u, Ana-Maria and Debenedetti, Edoardo and Dobos, Daniel and Fabian, Daniel and Fischer, Marc and Froelicher, David and Grosse, Kathrin and Naeff, Daniel and Ozoani, Ezi and Paverd, Andrew and Tram\`er, Florian and Volhejn, V\'{a}clav},
  note = {arXiv ePrint 2506.08837},
  title = {Design Patterns for Securing LLM Agents against Prompt Injections},
  url = {https://arxiv.org/abs/2506.08837},
  year = {2025}
}

LLMs unlock new paths to monetizing exploits
Nicholas Carlini, Milad Nasr, Edoardo Debenedetti, Barry Wang, Christopher Choquette-Choo, Daphne Ippolito, Florian Tramèr, Matthew Jagielski
arXiv ePrint 2505.11449

TLDR: LLMs enable economically viable personalized cyber attacks by automating discovery of user-specific exploits at scale.

@misc{carlini2025llms,
  author = {Carlini, Nicholas and Nasr, Milad and Debenedetti, Edoardo and Wang, Barry and Choquette-Choo, Christopher and Ippolito, Daphne and Tram\`er, Florian and Jagielski, Matthew},
  note = {arXiv ePrint 2505.11449},
  title = {LLMs unlock new paths to monetizing exploits},
  url = {https://arxiv.org/abs/2505.11449},
  year = {2025}
}

2024

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr
Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track
SafeBench First Prize

TLDR: An evaluation framework for assessing AI agents' robustness against prompt injection attacks on untrusted tool outputs.

@inproceedings{debenedetti2024agentdojo,
  author = {Debenedetti, Edoardo and Zhang, Jie and Balunovi\'c, Mislav and Beurer-Kellner, Luca and Fischer, Marc and Tram\`er, Florian},
  booktitle = {Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  note = {\textbf{SafeBench First Prize}},
  title = {AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents},
  url = {https://arxiv.org/abs/2406.13352},
  year = {2024}
}

Evading Black-box Classifiers Without Breaking Eggs
Edoardo Debenedetti, Nicholas Carlini, Florian Tramèr
2nd IEEE Conference on Secure and Trustworthy Machine Learning
Distinguished Paper Award Runner-up

TLDR: We show how to evade black-box classifiers while minimizing the number of detection-triggering queries.

@inproceedings{debenedetti2024evading,
  author = {Debenedetti, Edoardo and Carlini, Nicholas and Tram\`er, Florian},
  booktitle = {2nd IEEE Conference on Secure and Trustworthy Machine Learning},
  note = {\textbf{Distinguished Paper Award Runner-up}},
  title = {Evading Black-box Classifiers Without Breaking Eggs},
  url = {https://arxiv.org/abs/2306.02895},
  year = {2024}
}

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models
Patrick Chao*, Edoardo Debenedetti*, Alexander Robey*, Maksym Andriushchenko*, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George Pappas, Florian Tramèr, Hamed Hassani, Eric Wong (*joint first authors)
Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

TLDR: A standardized open benchmark for evaluating jailbreak attacks on large language models with reproducible artifacts and metrics.

@inproceedings{chao2024jailbreakbench,
  addendum = {(*joint first authors)},
  author = {Chao*, Patrick and Debenedetti*, Edoardo and Robey*, Alexander and Andriushchenko*, Maksym and Croce, Francesco and Sehwag, Vikash and Dobriban, Edgar and Flammarion, Nicolas and Pappas, George and Tram\`er, Florian and Hassani, Hamed and Wong, Eric},
  booktitle = {Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  title = {JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models},
  url = {https://arxiv.org/abs/2404.01318},
  year = {2024}
}

Privacy Side Channels in Machine Learning Systems
Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr
33rd USENIX Security Symposium

TLDR: Privacy side channels in ML systems can completely invalidate differential privacy guarantees and enable novel data extraction attacks.

@inproceedings{debenedetti2024privacy,
  author = {Debenedetti, Edoardo and Severi, Giorgio and Carlini, Nicholas and Choquette-Choo, Christopher A. and Jagielski, Matthew and Nasr, Milad and Wallace, Eric and Tram\`er, Florian},
  booktitle = {33rd USENIX Security Symposium},
  title = {Privacy Side Channels in Machine Learning Systems},
  url = {https://arxiv.org/abs/2309.05610},
  year = {2024}
}

2021

RobustBench: a standardized adversarial robustness benchmark
Francesco Croce*, Maksym Andriushchenko*, Vikash Sehwag*, Edoardo Debenedetti*, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein (*joint first authors)
Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

TLDR: A standardized benchmark and leaderboard for evaluating adversarial robustness of machine learning models.

@inproceedings{croce2021robustbench,
  addendum = {(*joint first authors)},
  author = {Croce*, Francesco and Andriushchenko*, Maksym and Sehwag*, Vikash and Debenedetti*, Edoardo and Flammarion, Nicolas and Chiang, Mung and Mittal, Prateek and Hein, Matthias},
  booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  title = {RobustBench: a standardized adversarial robustness benchmark},
  url = {https://arxiv.org/abs/2010.09670},
  year = {2021}
}

Full list on Google Scholar →

Press and Blog Coverage