Area efficient phase calibration of a 1.6 GHz multiphase DLL
Ankur Agrawal, Pavan Kumar Hanumolu, et al.
CICC 2011
Reduced-precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions - FP16, Hybrid-FP8 (HFP8), INT4, and INT2 - to meet diverse application demands for training and inference. The chip leverages recent algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without model accuracy degradation. A new HFP8 format, combined with separation of the floating- and fixed-point pipelines and aggressive circuit/architecture optimization, enables performance improvements while maintaining high compute utilization. A high-bandwidth ring protocol enables efficient data communication, while power management using workload-aware clock throttling maximizes performance within a given power budget. The AI chip demonstrates 3.58-TFLOPS/W peak energy efficiency and 26.2-TFLOPS peak performance for HFP8 iso-accuracy training, and 16.9-TOPS/W peak energy efficiency and 104.9-TOPS peak performance for INT4 iso-accuracy inference.