PaTH Attention: Position Encoding via Accumulating Householder Transformations. Songlin Yang, Yikang Shen, et al. 2025. NeurIPS 2025.
STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds. Yinfang Chen, Jiaqi Pan, et al. 2025. NeurIPS 2025.
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes. Christodoulos Constantinides, Dhaval Patel, et al. 2025. NeurIPS 2025.
Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance. Aladin Djuhera, Swanand Ravindra Kadhe, et al. 2025. NeurIPS 2025.
Conceptual Diagnostics for Knowledge Graphs and Large Language Models. Rosario Uceda-Sosa, Maria Chang, et al. 2025. ACL 2025.
NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning. Zheyuan Zhang, Yiyang Li, et al. 2025. ACL 2025.
Multi-Level Explanations for Generative Language Models. Lucas Monteiro Paes, Dennis Wei, et al. 2025. ACL 2025.
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies. Massimiliano Pronesti, Joao Bettencourt-Silva, et al. 2025. ACL 2025.
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models. George Kour, Itay Nakash, et al. 2025. ACL 2025.