Tian Gao, Amit Dhurandhar, et al.
NeurIPS 2025
We present the design and implementation of a new lifetime-aware tensor offloading framework for GPU memory expansion using low-cost PCIe-based solid-state drives (SSDs). Our framework, TeraX, is developed specifically for large language model (LLM) training with multiple GPUs and multiple SSDs. Its design is driven by our observation that only a small fraction (<5%) of tensors in LLMs are active in each training iteration, and that many inactive tensors are large and will not be used for a long period of time, creating ample opportunities to offload them to, and prefetch them from, slow SSDs without stalling the GPU training process. TeraX accurately estimates the lifetime of each tensor from the execution graph generated by PyTorch and, based on these estimates, produces an optimized tensor offloading plan. A runtime tensor migration engine fulfills the plan via GPUDirect Storage, which allows direct data transfer between GPUs and SSDs. Compared with state-of-the-art systems such as ZeRO-Offload and ZeRO-Infinity, TeraX improves the training performance of various LLMs and achieves near-ideal performance, i.e., close to that of a setup with unlimited GPU memory.
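To illustrate the lifetime-aware planning idea described in the abstract, the following is a minimal, hypothetical sketch (not TeraX's actual algorithm or API): given a topologically ordered execution graph, it computes each tensor's idle intervals and marks as offload/prefetch candidates those tensors whose idle time would hide an SSD round trip. All names, the per-op time model, and the bandwidth figure are illustrative assumptions.

```python
# Hypothetical sketch of lifetime-aware offload planning; names, the crude
# per-op time model, and the bandwidth/threshold values are assumptions,
# not taken from the TeraX paper.
from dataclasses import dataclass, field

@dataclass
class Op:
    index: int                                  # position in topological order
    reads: list = field(default_factory=list)   # tensor names read by this op
    writes: list = field(default_factory=list)  # tensor names produced by this op

def plan_offloads(ops, tensor_bytes, pcie_gbps=12.0, step_ms=5.0):
    """Flag tensors whose idle interval between two uses is long enough
    to hide an SSD write + read-back over PCIe."""
    uses = {}  # tensor name -> op indices that touch it, in execution order
    for op in ops:
        for name in op.reads + op.writes:
            uses.setdefault(name, []).append(op.index)

    plan = []
    for name, idxs in uses.items():
        size = tensor_bytes[name]
        # Estimated time (ms) to write the tensor to SSD and read it back.
        round_trip_ms = 2 * size / (pcie_gbps * 1e6)
        for prev, nxt in zip(idxs, idxs[1:]):
            idle_ms = (nxt - prev - 1) * step_ms  # assumed constant per-op time
            if idle_ms > round_trip_ms:
                plan.append({"tensor": name,
                             "offload_after_op": prev,
                             "prefetch_before_op": nxt})
    return plan

# Toy usage: act0 is untouched between op 1 and op 50, so it is a candidate.
ops = [Op(0, writes=["act0"]),
       Op(1, reads=["act0"], writes=["act1"]),
       Op(50, reads=["act0", "act1"])]
tensor_bytes = {"act0": 512 * 2**20, "act1": 256 * 2**20}
print(plan_offloads(ops, tensor_bytes))
```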
Jose Manuel Bernabé Murcia, Eduardo Canovas Martinez, et al.
MobiSec 2024
Vidushi Sharma, Andy Tek, et al.
NeurIPS 2025
Weiqin Chen, Nhan Pham, et al.
NeurIPS 2025