Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025
We demonstrate teraflop performance of CPMD, a fully featured ab initio molecular dynamics code, on an IBM pSeries 690 cluster. To map the algorithms optimally onto the available hardware, we combine coarse-grained distributed-memory parallelism using the MPI library with fine-grained shared-memory parallelism using OpenMP directives. The best performance achieved is ≈20% of peak, with an estimated parallel efficiency of ≈45% on 1024 processors for a system of 1000 atoms. Interconnect latency was found to be the main factor limiting parallel efficiency. © 2005 Elsevier B.V. All rights reserved.
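The abstract attributes most of the efficiency loss to interconnect latency. As a rough illustration only, the sketch below uses a toy strong-scaling model. This model is an assumption and not the analysis used in the paper. Compute divides perfectly over p processors, while communication costs a fixed latency alpha per message for each level of a ceil(log2 p)-deep collective tree. Parallel efficiency is taken as E(p) = T(1) / (p · T(p)). The function names, parameters, and numeric values are hypothetical.

```python
def parallel_efficiency(t1: float, tp: float, p: int) -> float:
    """Strong-scaling efficiency E(p) = T(1) / (p * T(p))."""
    return t1 / (p * tp)


def model_time(t_serial: float, p: int, alpha: float, msgs_per_level: int) -> float:
    """Toy wall-clock model for a fixed-size problem on p processors (hypothetical).

    - Compute is assumed to divide perfectly: t_serial / p.
    - Communication is assumed to be purely latency-bound: msgs_per_level
      messages, each costing alpha seconds, per level of a collective tree
      that is ceil(log2 p) levels deep. No bandwidth term is modeled.
    """
    levels = (p - 1).bit_length()  # equals ceil(log2 p) for integer p >= 1
    return t_serial / p + msgs_per_level * alpha * levels


if __name__ == "__main__":
    # Illustrative parameters only; these are not measurements from the CPMD runs.
    t_serial, alpha, msgs = 1000.0, 20e-6, 2000
    t1 = model_time(t_serial, 1, alpha, msgs)
    for p in (1, 64, 256, 1024):
        tp = model_time(t_serial, p, alpha, msgs)
        print(f"p={p:5d}  E={parallel_efficiency(t1, tp, p):.2f}")
```

In this model the compute term shrinks as 1/p while the latency term grows with log2 p. Efficiency therefore falls at large processor counts even though bandwidth is ignored, which is qualitatively consistent with a latency-limited regime.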
Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025
Jehanzeb Mirza, Leonid Karlinsky, et al.
NeurIPS 2023
George Saon
SLT 2014
Ismail Akhalwaya, Shashanka Ubaru, et al.
ICLR 2024