Publications

(2024). AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. NeurIPS D&B 2024.

(2024). Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition. NeurIPS D&B 2024 (Spotlight).

(2024). JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models. NeurIPS D&B 2024.

(2024). Adversarial Search Engine Optimization for Large Language Models. arXiv.

(2024). AI Risk Management Should Incorporate Both Safety and Security. arXiv.

(2024). Scaling Compute Is Not All You Need for Adversarial Robustness. ICLR 2024 R2FM Workshop.

(2024). Evading Black-box Classifiers Without Breaking Eggs. IEEE SaTML 2024 (Distinguished Paper Runner-up).

(2023). Privacy Side Channels in Machine Learning Systems. USENIX Security 2024.

(2022). A Light Recipe to Train Robust Vision Transformers. IEEE SaTML 2023.

(2022). Adversarially Robust Vision Transformers. EPFL.

(2021). RobustBench: A standardized benchmark for adversarial robustness. NeurIPS D&B 2021.