Efficacy of Pruning in Ultra-Low Precision DNNs
Sanchari Sen, Swagath Venkataramani, et al.
ISLPED 2021
Discrete AI inference cards, operating under form-factor and system-defined peak power constraints, must serve diverse inference requests with widely varying power consumption. A peak current-limiting scheme is proposed to maximize inference performance across practical use cases. The peak current management block consists of a card-level current sensing circuit with an AI inference-aware feed-forward and feedback control mechanism. The card-level sensing improves performance by eliminating the need for additional margins for power consumed by off-chip components. Compiler-assisted feed-forward control exploits the predictability of AI inferences and proactively manages peak currents without a static reduction in operating frequency. Measurements from an AI system-on-chip (SoC), fabricated in 5-nm technology, show up to 41% improvement in BERT-Large inference throughput by engaging the peak current control.