Sicheng Zhu

sczhu@umd.edu · Google Scholar · Twitter


I am a Member of Technical Staff at OpenAI, on the Adversarial Robustness Research team.

Research Interests

I focus on making AI models robust in out-of-distribution and adversarial settings. My ultimate goal is to achieve this by baking symmetries like equivariance into model architectures, i.e., building robustness by design.

Bio

I received my PhD in CS from the University of Maryland, College Park, where I was advised by Prof. Furong Huang. I’ve interned at Meta GenAI and FAIR, Adobe Research, and Bosch AI.

Before my PhD, I was a visiting scholar at the University of Virginia, working with Prof. David Evans. I received my M.E. from the Institute of Electronics, Chinese Academy of Sciences, and my B.S. from the University of Electronic Science and Technology of China.

Publications - LLM safety and controllable generation
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian, Chuan Guo, Ivan Evtimov
arXiv:2412.10321 [arXiv] [Code]
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh
ICLR 2025 [Proceedings] [arXiv]
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An*, Sicheng Zhu*, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
COLM 2024 [Proceedings] [arXiv] [Website] [Code] [Dataset]
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
COLM 2024 [Proceedings] [arXiv] [Unofficial Code] [Media Coverage]
Publications - Copyright
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang
NAACL 2025 [OpenReview]
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang
NeurIPS 2024 AdvML-Frontiers Workshop [Best Paper Award]
AAAI 2025 [arXiv]
Benchmarking the Robustness of Image Watermarks
Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang
ICML 2024 [arXiv] [Website] [Code]
On the Possibilities of AI-Generated Text Detection
Souradip Chakraborty*, Amrit Singh Bedi*, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang
ICML 2024 [arXiv]
Publications - Generalization
More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes
Bang An*, Sicheng Zhu*, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang
ICLR 2024 [arXiv] [Code]
Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator
Sicheng Zhu, Bang An, Furong Huang, Sanghyun Hong
ICML 2023 [Proceedings] [Code]
Understanding the Generalization Benefit of Model Invariance from a Data Perspective
Sicheng Zhu*, Bang An*, Furong Huang
NeurIPS 2021 [Proceedings] [arXiv] [Code]
Publications - Adversarial robustness
Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization
Sicheng Zhu*, Xiao Zhang*, David Evans
ICML 2020 [Proceedings] [arXiv] [Code]
