Dnotitia Unveils STAR-KV, Achieving Up to 20x KV Cache Compression, S…

Dnotitia Inc. has released the paper and source code for "STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control," developed jointly by researchers at UC San Diego's VVIP Lab and Dnotitia. The paper was selected as a Spotlight paper at ICML 2026, a distinction that covers about 2.2% of reviewed submissions and about 8.4% of accepted papers. In the reported experiments, low-rank compression alone reduced the size of the KV cache by up to 75%, and combined with the mixed-precision quantization method, STAR-KV compressed the full KV cache by up to 20x. The technology also speeds up attention computation by up to 6.9x and raises overall generation throughput by up to 3.1x. According to the STAR-KV paper, when a LLaMA-3.1-8B model processes a 128K-token context at a batch size of 4, the KV cache accounts for about 81% of total GPU memory.
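For readers who want to see where a figure like 81% can come from, the back-of-the-envelope estimate below sizes the KV cache for the scenario described above. It is a rough sketch and not the paper's accounting: the model shape (32 layers, 8 key/value heads, head dimension 128) is taken from the publicly released LLaMA-3.1-8B configuration, while 16-bit storage, a parameter count of roughly 8.03 billion, "128K" meaning 131,072 tokens, and counting only weights plus KV cache as GPU memory are all assumptions made here for illustration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Uncompressed KV cache size in bytes (the leading 2 covers keys and values)."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


GIB = 1024 ** 3

# Assumed LLaMA-3.1-8B shape (public config); 128K is taken as 128 * 1024 tokens.
kv = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                    seq_len=128 * 1024, batch=4)

# Assumption: ~8.03B parameters stored in 16-bit; activations are ignored.
weights = 8.03e9 * 2

kv_share = kv / (kv + weights)

# Apply the headline figures from the announcement to the uncompressed size.
low_rank_only = kv * (1 - 0.75)  # "up to 75%" reduction from low-rank alone
full = kv / 20                   # "up to 20x" with quantization added

print(f"Uncompressed KV cache: {kv / GIB:.1f} GiB")
print(f"KV share of weights + cache: {kv_share:.0%}")
print(f"After low-rank compression: {low_rank_only / GIB:.1f} GiB")
print(f"After full STAR-KV compression: {full / GIB:.1f} GiB")
```

Under these assumptions the uncompressed cache comes to 64 GiB against roughly 15 GiB of weights, a share of about 81%, which is in line with the figure quoted from the paper. Applying the best-case ratios would bring the cache down to about 16 GiB with low-rank compression alone and about 3.2 GiB with the full method; because both ratios are stated as "up to," these are lower bounds on the remaining size, not typical results.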