Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes. Xiaomeng Xu, Pin-Yu Chen, et al. 2024. NeurIPS 2024.
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models. Chia-yi Hsu, Yu-Lin Tsai, et al. 2024. NeurIPS 2024.
Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models. Shengyun Peng, Pin-Yu Chen, et al. 2024. NeurIPS 2024.
Neural Network Reparametrization for Accelerated Optimization in Molecular Simulations. Nima Dehmamy, Csaba Both, et al. 2024. NeurIPS 2024.
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning. Runqian Wang, Soumya Ghosh, et al. 2024. NeurIPS 2024.
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia. Yufang Hou, Alessandra Pascale, et al. 2024. NeurIPS 2024.
Dense Associative Memory Through the Lens of Random Features. Benjamin Hoover, Duen Horng Chau, et al. 2024. NeurIPS 2024.
Abstracted Shapes as Tokens - A Generalizable and Interpretable Model for Time-series Classification. Yunshi Wen, Tengfei Ma, et al. 2024. NeurIPS 2024.