Tutorial

Learning to Steer Large Language Models

Abstract

Current algorithms for steering LLM behavior are often implemented for specific use cases and tasks. To offer a more general-purpose approach to steering model behavior, IBM has recently developed two toolkits: AI Steerability 360 (AISteer360) and In-Context Explainability 360 (ICX360). This hands-on lab will provide a comprehensive walkthrough of both toolkits.

Participants will first be guided through a conceptual overview of steering model behavior across four control surfaces: input, structural, state, and output. Through a series of interactive coding sessions, attendees will implement steering methods on a running example: steering a model to produce less toxic outputs. The lab will demonstrate how to construct steering controls for fine-grained model intervention, use cases that apply controls to specific model tasks, and benchmarks for comparing steering methods on a given use case.
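To make the control-surface taxonomy concrete, the sketch below implements one example of a state-steering control, activation addition, using plain PyTorch hooks on a small GPT-2 model. It is illustrative only and does not use the AISteer360 API, whose actual interfaces may differ; the layer index, steering strength, and contrastive prompts are all hypothetical choices.

```python
# Minimal sketch of a state-steering control (activation addition),
# written with plain PyTorch/transformers hooks. NOT the AISteer360 API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # hypothetical choice of intervention layer

def build_steering_vector(pos_text, neg_text):
    """Difference of mean hidden states at LAYER for two contrastive prompts."""
    vecs = []
    for text in (pos_text, neg_text):
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        vecs.append(out.hidden_states[LAYER].mean(dim=1))  # (1, hidden_size)
    return vecs[0] - vecs[1]

# Contrastive pair aimed at the lab's running example: reducing toxicity.
steer = build_steering_vector("That was a kind, polite reply.",
                              "That was a rude, hostile reply.")

def add_vector(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are the first element.
    hidden = output[0] + 4.0 * steer  # 4.0 is a hypothetical strength
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_vector)
ids = tok("My honest opinion of your idea is", return_tensors="pt")
gen = model.generate(**ids, max_new_tokens=30, do_sample=False)
handle.remove()
print(tok.decode(gen[0], skip_special_tokens=True))
```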

Closing the loop, participants will learn how the ICX360 toolkit can be used to understand why a given (steered) model produced a particular output, and how to use these insights to refine the steering controls. The session will build progressively from concept to implementation, ensuring participants understand how to create an end-to-end steering workflow.
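As a flavor of this kind of analysis, the sketch below scores prompt words by leave-one-out perturbation: how much removing each word lowers the model's log-probability of a fixed response. This is illustrative code in the spirit of perturbation-based input attribution, not the ICX360 API; the model, prompt, and response are hypothetical placeholders.

```python
# Sketch of perturbation-based input attribution. NOT the ICX360 API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def response_log_prob(prompt, response):
    """Log-probability the model assigns to `response` given `prompt`."""
    p_ids = tok(prompt, return_tensors="pt").input_ids
    r_ids = tok(response, return_tensors="pt").input_ids
    ids = torch.cat([p_ids, r_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    # Positions from the last prompt token onward predict the response tokens.
    logp = torch.log_softmax(logits[0, p_ids.shape[1] - 1:-1], dim=-1)
    return logp.gather(1, r_ids[0].unsqueeze(1)).sum().item()

prompt_words = ["You", "are", "a", "polite", "assistant", "."]
response = " I would be happy to help."
base = response_log_prob(" ".join(prompt_words), response)

# Leave-one-out: drop each word and measure how the response score falls.
for i, word in enumerate(prompt_words):
    ablated = " ".join(prompt_words[:i] + prompt_words[i + 1:])
    delta = base - response_log_prob(ablated, response)
    print(f"{word:>10}: {delta:+.3f}")
```

Words with large positive deltas are the ones the (steered) model leaned on most when producing the response, which is exactly the signal used to guide refined controls.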