Saurabh Paul, Christos Boutsidis, et al.
JMLR
The NorthPole Architecture achieves high performance and high efficiency by placing memory locally within a parallel, distributed core array. Networks-on-chip link the cores to ensure data availability, and prescheduled, distributed local control orchestrates execution. A 12 nm NorthPole Inference Chip (22B transistors, 795 mm²) includes a 256-Core Array with 192 MB of distributed SRAM. At a nominal frequency of 400 MHz, it exceeds 200 TOPS at 8-bit, 400 TOPS at 4-bit, and 800 TOPS at 2-bit precision, with very high utilization.
C.A. Micchelli, W.L. Miranker
Journal of the ACM
Joxan Jaffar
Journal of the ACM
Kenneth L. Clarkson, Elad Hazan, et al.
Journal of the ACM