Publications

225 results for Adversarial Robustness and Privacy

Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
- - Ivoline Ngong
  - Swanand Ravindra Kadhe
  - et al.
- 2025
- ACL 2025
A Unified Framework for Generative AI Safety
- - Pin-Yu Chen
- 2025
- ICML 2025
MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming
- - Stefan Schoepf
  - Muhammad Zaid Hameed
  - et al.
- 2025
- ICML 2025
In-Context Bias Propagation in LLM-Based Tabular Data Generation
- - Pol Garcia Recasens
  - Alberto Gutierrez-torre
  - et al.
- 2025
- ICML 2025
PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection
- - Wei Li
  - Pin-Yu Chen
  - et al.
- 2025
- CVPR 2025
Large Language Models can Become Strong Self-Detoxifiers
- - Irene Ko
  - Pin-Yu Chen
  - et al.
- 2025
- ICLR 2025
VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data Synthesis
- - Chia-yi Hsu
  - Jia You Chen
  - et al.
- 2025
- ICASSP 2025
Retention Score: Quantifying Jailbreak Risks for Vision Language Models
- - Zhaitang Li
  - Pin-Yu Chen
  - et al.
- 2025
- AAAI 2025
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
- - Xiaomeng Xu
  - Pin-Yu Chen
  - et al.
- 2025
- AAAI 2025
Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation
- - Maya Anderson
  - Guy Amit
  - et al.
- 2025
- ICISSP 2025