NVIDIA Vera Rubin Ships This Fall: 8 Cloud Partners, 10x Lower Token …

Production shipments of NVIDIA's Vera Rubin AI platform are scheduled to begin this fall at all eight confirmed cloud partners: AWS, Google Cloud, Microsoft Azure, Oracle Cloud, CoreWeave, Lambda, Nebius, and Nscale. NVIDIA confirmed at its GTC Taipei keynote on June 1, 2026, that the platform had entered full production. On June 22, at ISC High Performance 2026 in Hamburg, the company announced that Vera Rubin will also power next-generation supercomputers at Leibniz Supercomputing Centre, the U.S. Department of Energy's National Energy Research Scientific Computing Center, and Los Alamos National Laboratory.

The platform's 10x reduction in inference token cost comes from two architectural changes: HBM4 memory, which nearly triples per-GPU memory bandwidth to 22 terabytes per second, and NVLink 6, which doubles the rack-scale interconnect to 260 terabytes per second. The previous Blackwell platform used HBM3e memory at 8 terabytes per second per GPU. Each Rubin GPU carries 288 gigabytes of HBM4. The 260 terabytes per second figure is the total all-to-all fabric bandwidth across all 72 GPUs in a single NVL72 rack.

Mixture-of-experts models can be trained on the Vera Rubin NVL72 using one-quarter the number of GPUs required on an equivalent Blackwell system. For inference, NVIDIA reports 10x higher throughput per watt and a 10x reduction in cost per million tokens compared to the Blackwell generation.
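
The bandwidth and capacity figures quoted above can be cross-checked with some simple arithmetic. The short Python sketch below uses only the numbers given in this article; the constant names and the `rack_summary` helper are illustrative, and the derived values (per-rack totals and the per-GPU share of NVLink bandwidth) are back-of-the-envelope calculations, not figures published by NVIDIA.

```python
# Figures as quoted in the article (not independently measured).
RUBIN_HBM4_TBPS = 22.0        # per-GPU memory bandwidth, Rubin (HBM4)
BLACKWELL_HBM3E_TBPS = 8.0    # per-GPU memory bandwidth, Blackwell (HBM3e)
RUBIN_HBM_GB = 288            # HBM4 capacity per Rubin GPU
NVL72_GPUS = 72               # GPUs in one NVL72 rack
NVLINK6_RACK_TBPS = 260.0     # total NVLink 6 fabric bandwidth per rack


def rack_summary() -> dict[str, float]:
    """Derive rack-level numbers from the quoted per-GPU and per-rack figures."""
    return {
        # How much faster Rubin's memory is than Blackwell's, per GPU.
        "hbm_bandwidth_ratio": RUBIN_HBM4_TBPS / BLACKWELL_HBM3E_TBPS,
        # Total HBM4 capacity in one rack, in decimal terabytes.
        "rack_hbm_capacity_tb": RUBIN_HBM_GB * NVL72_GPUS / 1000,
        # Sum of per-GPU memory bandwidth across the rack.
        "rack_hbm_bandwidth_tbps": RUBIN_HBM4_TBPS * NVL72_GPUS,
        # Even split of the rack fabric bandwidth across the GPUs.
        "nvlink_share_per_gpu_tbps": NVLINK6_RACK_TBPS / NVL72_GPUS,
    }


if __name__ == "__main__":
    for name, value in rack_summary().items():
        print(f"{name}: {value:.2f}")
```

On these inputs the memory bandwidth ratio works out to 2.75x, which is why "nearly triple" is the more precise description, and an even split of the 260 terabytes per second fabric gives each GPU roughly 3.6 terabytes per second of NVLink bandwidth.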