<p>➀ NVIDIA launched the Rubin CPX GPU, a specialized accelerator for massive-context AI models, delivering 30 petaflops of NVFP4 compute and 128 GB of GDDR7 memory on a monolithic die;</p><p>➁ The GPU is built for disaggregated inference, which separates the compute-bound context (prefill) phase from the memory-bandwidth-bound generation (decode) phase to raise throughput, cut latency, and improve resource utilization;</p><p>➂ Integrated with NVIDIA Vera CPUs and Rubin GPUs in the Vera Rubin NVL144 CPX platform, it delivers 8 exaflops of AI compute, 7.5x that of NVIDIA GB300 NVL72 systems, and scales to 100 TB of fast memory and 1.7 PB/s of memory bandwidth per rack.</p>
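<p>To make the split described in ➁ concrete, the sketch below shows the general shape of disaggregated inference in Python: a compute-bound prefill (context) worker builds the KV cache, which is then handed off to a bandwidth-bound decode (generation) worker. All names here (<code>PrefillWorker</code>, <code>DecodeWorker</code>, <code>serve</code>) and the stubbed tensors are illustrative assumptions, not an NVIDIA or serving-framework API.</p>
<pre><code class="language-python">
# Minimal conceptual sketch of disaggregated inference (hypothetical names):
# the compute-bound context (prefill) phase and the bandwidth-bound
# generation (decode) phase run on separate workers, linked by a KV-cache handoff.
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Per-request key/value cache produced by prefill, consumed by decode."""
    tokens: list[int]
    layers: dict = field(default_factory=dict)  # layer index to (keys, values) stub


class PrefillWorker:
    """Stands in for a compute-optimized accelerator (context phase)."""

    def prefill(self, prompt_tokens: list[int]) -> KVCache:
        # Dense matrix math over the whole prompt at once: compute-bound.
        cache = KVCache(tokens=list(prompt_tokens))
        for layer in range(4):  # toy "model" with 4 layers
            cache.layers[layer] = ("K", "V")  # placeholder for real tensors
        return cache


class DecodeWorker:
    """Stands in for a bandwidth-optimized accelerator (generation phase)."""

    def decode(self, cache: KVCache, max_new_tokens: int) -> list[int]:
        # One token per step, re-reading the growing KV cache: bandwidth-bound.
        out = []
        for step in range(max_new_tokens):
            next_token = (sum(cache.tokens) + step) % 50_000  # dummy sampler
            cache.tokens.append(next_token)
            out.append(next_token)
        return out


def serve(prompt_tokens: list[int], max_new_tokens: int = 8) -> list[int]:
    # Router: send each phase to its own worker pool instead of one monolithic GPU.
    cache = PrefillWorker().prefill(prompt_tokens)        # compute-bound phase
    return DecodeWorker().decode(cache, max_new_tokens)   # bandwidth-bound phase


if __name__ == "__main__":
    print(serve([101, 2023, 2003, 1037, 2742, 102]))
</code></pre>
<p>In a real serving stack the two phases would run on different accelerator pools (context-optimized Rubin CPX alongside bandwidth-optimized Rubin GPUs, as in ➂) and the KV-cache handoff would travel over a high-bandwidth interconnect rather than being passed as an in-process Python object.</p>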