JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating …
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, Eric Wong
Privacy Side Channels in Machine Learning Systems
Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML …
Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr