Sicheng Zhu

sczhu@umd.edu Google Scholar Twitter

I am a fifth-year Ph.D. candidate in the Computer Science Department at the University of Maryland, College Park, advised by Prof. Furong Huang. I expect to graduate in 2025!

Research Interests

Safety Alignment of LLMs: automatic jailbreaks, automatic false refusal evaluation for alignment, detectability of AI-generated content, test-time alignment, watermarking and copyright.
Controllable Generation of LLMs: fine-grained controllable text generation, discrete prompt optimization, LLM-assisted optimization, LLM agent with white-box tools, diffusion LM, efficient inference.
Vision Language Models: watermarking, visual recognition with reasoning, alignment.
Trustworthy Machine Learning: natural variation robustness, adversarial robustness.
Geometric Deep Learning: model invariance and equivariance for out-of-distribution generalization, building symmetries into representation learning, modeling structural priors in biological systems, self-supervised learning.

Bio

Previously, I was a visiting scholar at the University of Virginia, where I was fortunate to be advised by Prof. David Evans. I received my M.E. from the Institute of Electronics, Chinese Academy of Sciences, and my B.S. from the University of Electronic Science and Technology of China.

Preprints
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian, Chuan Guo, Ivan Evtimov
arXiv 2412.10321 [arXiv]
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh
arXiv 2410.08193 [arXiv]
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang
NeurIPS Safe Generative AI Workshop 2024 [OpenReview]
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang
arXiv 2407.17417 [arXiv]
Publications
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An*, Sicheng Zhu*, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
COLM 2024 [Website] [Code] [Dataset]
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
COLM 2024 [arXiv] [Unofficial Code] [Media Coverage]
Benchmarking the Robustness of Image Watermarks
Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang
ICML 2024 [arXiv]
On the Possibilities of AI-Generated Text Detection
Souradip Chakraborty*, Amrit Singh Bedi*, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang
ICML 2024 [arXiv]
More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes
Bang An*, Sicheng Zhu*, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang
ICLR 2024 [arXiv] [Code]
Like Oil and Water: Group Robustness Methods and Poisoning Defenses Don't Mix
Michael-Andrei Panaitescu-Liess, Yigitcan Kaya, Sicheng Zhu, Furong Huang, Tudor Dumitras
ICLR 2024 [Link]
Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator
Sicheng Zhu, Bang An, Furong Huang, Sanghyun Hong
ICML 2023 [Link] [Code]
Understanding the Generalization Benefit of Model Invariance from a Data Perspective
Sicheng Zhu*, Bang An*, Furong Huang
NeurIPS 2021 [Link] [arXiv] [Code]
Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization
Sicheng Zhu*, Xiao Zhang*, David Evans
ICML 2020 [Link] [arXiv] [Code]
