Huawei Validates AI Inference Acceleration Solution on Commercial Net…

Huawei said it has successfully validated its AI Inference Acceleration Solution on a commercial telecommunications network, claiming the technology increased token throughput for long-context AI inference by as much as 372%. The validation marks what the company described as the first commercial network deployment of its kind by a Chinese telecommunications operator.

Huawei unveiled the results jointly with China Mobile Hubei during MWC Shanghai 2026, held in Shanghai from June 24 to June 26.

The solution combines Huawei's OceanStor A800 storage system, Ascend A3 SuperPoD computing platform and Unified Cache Manager (UCM) technology. The validation deployed the vLLM-Ascend framework on China Mobile Hubei's commercial network and simulated long-context inference workloads ranging from 8,000 to 190,000 tokens using major AI models including MiniMax M2.5 and GLM-5.1.

According to Huawei, applying UCM to MiniMax M2.5 reduced Time To First Token (TTFT) by 26% to 62% while significantly improving Tokens Per Second (TPS) per neural processing unit (NPU). TPS increased 58% for 64K-token sequences and 78% for 128K-token long-context workloads.

For GLM-5.1, Huawei said TTFT improved by 51% to 93%, while TPS increased by 56% to 372%. Throughput rose 313% for 64