Sicheng Zhu
✉ sczhu@umd.edu
Google Scholar
Twitter
I am a Member of Technical Staff at OpenAI, on the Adversarial Robustness Research team.
Research Interests
I focus on making AI models robust in out-of-distribution and adversarial settings. My ultimate goal is to achieve this by baking symmetries like equivariance into model architectures, i.e., building robustness by design.
Bio
I received my PhD in CS from the University of Maryland, College Park, where I was advised by Prof. Furong Huang. I’ve interned at Meta GenAI and FAIR, Adobe Research, and Bosch AI. Before my PhD, I was a visiting scholar at the University of Virginia, working with Prof. David Evans. I received my M.E. from the Institute of Electronics, Chinese Academy of Sciences, and my B.S. from the University of Electronic Science and Technology of China.
Publications - LLM safety and controllable generation
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian, Chuan Guo, Ivan Evtimov
arXiv:2412.10321
[arXiv]
[Code]
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh
ICLR 2025
[Proceedings]
[arXiv]
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An*, Sicheng Zhu*, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
COLM 2024
[Proceedings]
[arXiv]
[Website]
[Code]
[Dataset]
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
COLM 2024
[Proceedings]
[arXiv]
[Unofficial Code]
[Media Coverage]
Publications - Copyright
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang
NAACL 2025
[OpenReview]
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang
NeurIPS 2024 AdvML-Frontiers Workshop [Best Paper Award]
AAAI 2025
[arXiv]
Benchmarking the Robustness of Image Watermarks
Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang
ICML 2024
[arXiv]
[Website]
[Code]
On the Possibilities of AI-Generated Text Detection
Souradip Chakraborty*, Amrit Singh Bedi*, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang
ICML 2024
[arXiv]
Publications - Generalization
More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes
Bang An*, Sicheng Zhu*, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang
ICLR 2024
[arXiv]
[Code]
Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator
Sicheng Zhu, Bang An, Furong Huang, Sanghyun Hong
ICML 2023
[Proceedings]
[Code]
Understanding the Generalization Benefit of Model Invariance from a Data Perspective
Sicheng Zhu*, Bang An*, Furong Huang
NeurIPS 2021
[Proceedings]
[arXiv]
[Code]
Publications - Adversarial robustness
Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization
Sicheng Zhu*, Xiao Zhang*, David Evans
ICML 2020
[Proceedings]
[arXiv]
[Code]