Power-Limited Inference Performance Optimization Using a Software-Assisted Peak Current Regulation Scheme in a 5-nm AI SoC

Monodeep Kar; Joel Silberman; Swagath Venkataramani; Viji Srinivasan; Bruce Fleischer; Joshua Rubin; JohnDavid Lancaster; Saekyu Lee; Matthew Cohen; Matthew Ziegler; Nianzheng Cao; Sandra Woodward; Ankur Agrawal; Ching Zhou; Prasanth Chatarasi; Thomas Gooding; Michael Guillorn; Bahman Hekmatshoartabari; Philip Jacob; Radhika Jain; Shubham Jain; Jinwook Jung; Kyu-Hyoun Kim; Siyu Koswatta; Martin Lutz; Alberto Mannari; Abey K. Mathew; Indira Nair; Ashish Ranjan; Zhibin Ren; Scot Rider; Thomas Rower; David Satterfield; Marcel Schaal; Sanchari Sen; Gustavo Tellez; Hung Tran; Wei Wang; Vidhi Zalani; Jintao Zhang; Xin Zhang; Vinay Shah; Robert Senger; Arvind Kumar; Pong-Fei Lu; Leland Chang

doi:10.1109/JSSC.2024.3472023

IEEE Journal of Solid-State Circuits

Paper

01 Jan 2024

Abstract

Discrete AI inference cards operate under form-factor and system-defined peak power constraints, yet must serve diverse inference requests with widely varying power consumption. A peak current-limiting scheme is proposed to maximize inference performance across practical use cases. The peak current management block consists of a card-level current sensing circuit combined with an AI inference-aware feed-forward and feedback control mechanism. Card-level sensing improves performance by eliminating the need for additional margins to cover power consumed by off-chip components. Compiler-assisted feed-forward control exploits the predictability of AI inference workloads and proactively manages peak currents without a static reduction in operating frequency. Measurements from an AI system on chip (SoC) fabricated in 5-nm technology show up to a 41% improvement in BERT-Large inference throughput when peak current control is engaged.
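To illustrate how feed-forward and feedback control can cooperate under a card-level current limit, here is a minimal toy sketch. It is not the authors' circuit or firmware. Several pieces are hypothetical assumptions introduced only for illustration: the frequency levels `FREQ_LEVELS`, the per-phase compiler annotation `amps_per_ghz`, the linear current-versus-frequency model, and all function names. The feed-forward step uses the compiler's current estimate to choose a frequency before a phase runs. The feedback step then lowers the frequency if the sensed card current, which includes an off-chip term, still exceeds the limit.

```python
from dataclasses import dataclass

# Hypothetical discrete frequency levels in GHz (ascending); not taken from the paper.
FREQ_LEVELS = [1.0, 1.2, 1.4, 1.6]


@dataclass
class Phase:
    name: str
    amps_per_ghz: float  # hypothetical compiler estimate of chip current per GHz (assumed linear)


def feed_forward_freq(phase: Phase, limit_a: float, offchip_a: float) -> float:
    """Pick the highest level whose *predicted* card current (chip + off-chip) fits the limit.

    Falls back to the lowest level if even that is predicted to exceed the limit.
    """
    best = FREQ_LEVELS[0]
    for f in FREQ_LEVELS:
        if phase.amps_per_ghz * f + offchip_a <= limit_a:
            best = f
    return best


def feedback_adjust(freq: float, measured_a: float, limit_a: float) -> float:
    """Step down one level if the *measured* card current exceeds the limit."""
    idx = FREQ_LEVELS.index(freq)
    if measured_a > limit_a and idx > 0:
        return FREQ_LEVELS[idx - 1]
    return freq


def run(phases, true_amps_per_ghz, limit_a, offchip_a):
    """Simulate each phase: feed-forward choice, then feedback until under the limit."""
    schedule = []
    for phase, true_k in zip(phases, true_amps_per_ghz):
        f = feed_forward_freq(phase, limit_a, offchip_a)
        while True:
            # Simulated card-level sensor reading: actual chip current plus off-chip current.
            measured = true_k * f + offchip_a
            new_f = feedback_adjust(f, measured, limit_a)
            if new_f == f:
                break
            f = new_f
        schedule.append((phase.name, f))
    return schedule


if __name__ == "__main__":
    phases = [Phase("attention", 40.0), Phase("ffn", 60.0)]
    # The "ffn" estimate is deliberately low (60 vs. 70 A/GHz), so feedback must correct it.
    print(run(phases, [40.0, 70.0], limit_a=100.0, offchip_a=10.0))
```

In this example the "attention" phase keeps the top level (1.6 GHz), because its predicted and actual currents both fit. For "ffn", the feed-forward step chooses 1.4 GHz from the underestimated annotation. The sensed current then exceeds the limit, so feedback steps down to 1.2 GHz. Because the off-chip current is included in the sensed value, the limit needs no separate guard band for it, which mirrors the card-level sensing argument in the abstract.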

Conference paper