Tutorial

Learning to Steer Large Language Models

Abstract

Current algorithms for steering LLM behavior are often implemented for specific use cases and tasks. To offer a more general-purpose approach to steering model behavior, IBM has recently developed two toolkits: AI Steerability 360 (AISteer360) and In-Context Explainability 360 (ICX360). This hands-on lab will provide a comprehensive walkthrough of both toolkits.

Participants will first be guided through a conceptual overview of steering model behavior across four control surfaces: input, structural, state, and output. Through a series of interactive coding sessions, attendees will implement steering methods on a running example: steering a model to produce less toxic outputs. The lab will demonstrate how to construct steering controls for fine-grained model intervention, use cases that apply controls to specific model tasks, and benchmarks for comparing steering methods on a given use case.
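To make the control-surface taxonomy concrete, the sketch below implements one example of a state-steering control, activation addition, using plain PyTorch hooks on a small GPT-2 model. It is illustrative only and does not use the AISteer360 API, whose actual interfaces may differ; the layer index, steering strength, and contrastive prompts are all hypothetical choices.

```python
# Minimal sketch of a state-steering control (activation addition),
# written with plain PyTorch/transformers hooks. NOT the AISteer360 API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # hypothetical choice of intervention layer

def build_steering_vector(pos_text, neg_text):
    """Difference of mean hidden states at LAYER for two contrastive prompts."""
    vecs = []
    for text in (pos_text, neg_text):
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        vecs.append(out.hidden_states[LAYER].mean(dim=1))  # (1, hidden_size)
    return vecs[0] - vecs[1]

# Contrastive pair aimed at the lab's running example: reducing toxicity.
steer = build_steering_vector("That was a kind, polite reply.",
                              "That was a rude, hostile reply.")

def add_vector(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are the first element.
    hidden = output[0] + 4.0 * steer  # 4.0 is a hypothetical strength
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_vector)
ids = tok("My honest opinion of your idea is", return_tensors="pt")
gen = model.generate(**ids, max_new_tokens=30, do_sample=False)
handle.remove()
print(tok.decode(gen[0], skip_special_tokens=True))
```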

Closing the loop, participants will learn how the ICX360 toolkit can be used to understand why a given (steered) model produced a particular output, and how to use these insights to refine the steering controls. The session will build progressively from concept to implementation, ensuring participants understand how to create an end-to-end steering workflow.
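As a flavor of this kind of analysis, the sketch below scores prompt words by leave-one-out perturbation: how much removing each word lowers the model's log-probability of a fixed response. This is illustrative code in the spirit of perturbation-based input attribution, not the ICX360 API; the model, prompt, and response are hypothetical placeholders.

```python
# Sketch of perturbation-based input attribution. NOT the ICX360 API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def response_log_prob(prompt, response):
    """Log-probability the model assigns to `response` given `prompt`."""
    p_ids = tok(prompt, return_tensors="pt").input_ids
    r_ids = tok(response, return_tensors="pt").input_ids
    ids = torch.cat([p_ids, r_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    # Positions from the last prompt token onward predict the response tokens.
    logp = torch.log_softmax(logits[0, p_ids.shape[1] - 1:-1], dim=-1)
    return logp.gather(1, r_ids[0].unsqueeze(1)).sum().item()

prompt_words = ["You", "are", "a", "polite", "assistant", "."]
response = " I would be happy to help."
base = response_log_prob(" ".join(prompt_words), response)

# Leave-one-out: drop each word and measure how the response score falls.
for i, word in enumerate(prompt_words):
    ablated = " ".join(prompt_words[:i] + prompt_words[i + 1:])
    delta = base - response_log_prob(ablated, response)
    print(f"{word:>10}: {delta:+.3f}")
```

Words with large positive deltas are the ones the (steered) model leaned on most when producing the response, which is exactly the signal used to guide refined controls.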