Authors marked with * contributed equally.

2024
  1. Preprint
    Preference Poisoning Attacks on Reward Model Learning
    Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, and Yevgeniy Vorobeychik
    arXiv preprint arXiv:2402.01920, 2024
  2. Preprint
    Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment
    Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Muhao Chen, Junjie Hu, Yixuan Li, Bo Li, and Chaowei Xiao
    arXiv preprint arXiv:2402.14968, 2024
  3. ICLR 2024
    Conversational Drug Editing Using Retrieval and Domain Feedback
    Shengchao Liu, Jiongxiao Wang, Yijin Yang, Chengpeng Wang, Ling Liu, Hongyu Guo, and Chaowei Xiao
    In The Twelfth International Conference on Learning Representations, 2024

2023
  1. Preprint
    On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models
    Jiongxiao Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, and Chaowei Xiao
    arXiv preprint arXiv:2311.09641, 2023
  2. Preprint
    Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
    Wenjie Mo, Jiashu Xu, Qin Liu, Jiongxiao Wang, Jun Yan, Chaowei Xiao, and Muhao Chen
    arXiv preprint arXiv:2311.09763, 2023
  3. Preprint
    Adversarial Demonstration Attacks on Large Language Models
    Jiongxiao Wang*, Zichen Liu*, Keun Hee Park, Muhao Chen, and Chaowei Xiao
    arXiv preprint arXiv:2305.14950, 2023
  4. NeurIPS 2023
    On the Exploitability of Instruction Tuning
    Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, and Tom Goldstein
    Advances in Neural Information Processing Systems, 2023
  5. ICML 2023
    A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification
    Jiachen Sun, Jiongxiao Wang, Weili Nie, Zhiding Yu, Zhuoqing Mao, and Chaowei Xiao
    In Proceedings of the 40th International Conference on Machine Learning, 2023

2022
  1. ICLR 2022
    DensePure: Understanding Diffusion Models for Adversarial Robustness
    Chaowei Xiao*, Zhongzhu Chen*, Kun Jin*, Jiongxiao Wang*, Weili Nie, Mingyan Liu, Anima Anandkumar, Bo Li, and Dawn Song
    In The Eleventh International Conference on Learning Representations, 2022
  2. ICLR 2022
    Defending against Adversarial Audio via Diffusion Model
    Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, and Chaowei Xiao
    In The Eleventh International Conference on Learning Representations, 2022
  3. ICML 2022
    Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack
    Ruize Gao, Jiongxiao Wang, Kaiwen Zhou, Feng Liu, Binghui Xie, Gang Niu, Bo Han, and James Cheng
    In Proceedings of the 39th International Conference on Machine Learning, 2022