Sicheng Zhu
✉ sczhu@umd.edu
Google Scholar
Twitter
I am a fifth-year Ph.D. candidate in the Computer Science Department at the
University of Maryland, College Park, advised by Prof.
Furong Huang. I expect to graduate in 2025!
Research Interests
Safety Alignment of LLMs: automatic jailbreaks, automatic false refusal evaluation for alignment, detectability of AI-generated content, test-time alignment, watermarking and copyright.
Controllable Generation of LLMs: fine-grained controllable text generation, discrete prompt optimization, LLM-assisted optimization, LLM agents with white-box tools, diffusion LMs, efficient inference.
Vision Language Models: watermarking, visual recognition with reasoning, alignment.
Trustworthy Machine Learning: natural variation robustness, adversarial robustness.
Geometric Deep Learning: model invariance and equivariance for out-of-distribution generalization, building symmetries into representation learning, modeling structural priors in biological systems, self-supervised learning.
Bio
Previously, I was a visiting scholar at the University of Virginia, where I was fortunate to be advised by Prof. David Evans. I received my M.E. from the Institute of Electronics, Chinese Academy of Sciences, and my B.S. from the University of Electronic Science and Technology of China.
Preprints
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh
arXiv:2410.08193
[arXiv]
|
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang
NeurIPS Safe Generative AI Workshop 2024
[OpenReview]
|
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang
arXiv:2407.17417
[arXiv]
|
Publications
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An*, Sicheng Zhu*, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
COLM 2024
[Website]
[Code]
[Dataset]
|
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
COLM 2024
[arXiv]
[Unofficial Code]
[Media Coverage]
|
Benchmarking the Robustness of Image Watermarks
Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang
ICML 2024
[arXiv]
|
On the Possibilities of AI-Generated Text Detection
Souradip Chakraborty*, Amrit Singh Bedi*, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang
ICML 2024
[arXiv]
|
More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes
Bang An*, Sicheng Zhu*, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang
ICLR 2024
[arXiv]
[Code]
|
Like Oil and Water: Group Robustness Methods and Poisoning Defenses Don't Mix
Michael-Andrei Panaitescu-Liess, Yigitcan Kaya, Sicheng Zhu, Furong Huang, Tudor Dumitras
ICLR 2024
[Link]
|
Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator
Sicheng Zhu, Bang An, Furong Huang, Sanghyun Hong
ICML 2023
[Link]
[Code]
|
Understanding the Generalization Benefit of Model Invariance from a Data Perspective
Sicheng Zhu*, Bang An*, Furong Huang
NeurIPS 2021
[Link]
[arXiv]
[Code]
|
Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization
Sicheng Zhu*, Xiao Zhang*, David Evans
ICML 2020
[Link]
[arXiv]
[Code]
|