Publications
First authors marked with * contributed equally.
2024
- Preprint: Preference Poisoning Attacks on Reward Model Learning. arXiv preprint arXiv:2402.01920, 2024.
- Preprint: Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment. arXiv preprint arXiv:2402.14968, 2024.
- ICLR 2024: Conversational Drug Editing Using Retrieval and Domain Feedback. In The Twelfth International Conference on Learning Representations, 2024.
2023
- Preprint: On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models. arXiv preprint arXiv:2311.09641, 2023.
- Preprint: Test-Time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations. arXiv preprint arXiv:2311.09763, 2023.
- Preprint: Adversarial Demonstration Attacks on Large Language Models. arXiv preprint arXiv:2305.14950, 2023.
- NeurIPS 2023: On the Exploitability of Instruction Tuning. In Advances in Neural Information Processing Systems, 2023.
- ICML 2023: A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification. In Proceedings of the 40th International Conference on Machine Learning, 2023.
- ICLR 2023: DensePure: Understanding Diffusion Models for Adversarial Robustness. In The Eleventh International Conference on Learning Representations, 2023.
- ICLR 2023: Defending against Adversarial Audio via Diffusion Model. In The Eleventh International Conference on Learning Representations, 2023.

2022
- ICML 2022: Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack. In International Conference on Machine Learning, 2022.