Distillation is a major issue for AI labs. Is it one for anyone else?

Anthropic claims that Alibaba has carried out the largest “adversarial distillation attack” ever, allegedly using 25,000 fraudulent accounts to siphon Claude’s outputs across 28.8 million conversations. Distillation, also known as knowledge compression, has been an established concept for twenty years: in 2006, ML expert Rich Caruana and his team showed that a small classification model could learn from dataset labels produced by a larger model. In 2015, Geoffrey Hinton introduced the idea of a “teacher” and a “student” model. In 2019, DistilBERT, built on Google’s BERT, was 40 percent smaller and 60 percent faster while retaining 97 percent of BERT’s performance. In October 2024, OpenAI encouraged developers to distill its leading LLMs via its own API. In January 2025, DeepSeek-R1 exhibited behavior closely resembling that of OpenAI’s o1 model, raising fears among investors that frontier labs would lose out to companies that train on their technology. Tiny LLMs with as few as 1.5 billion parameters could now perform well with minimal computational power, whereas GPT-2, released in 2019 with the same 1.5 billion parameters, had been nothing more than a fun experiment. Distillation has thus become an economic threat, because AI labs like Anthropic and OpenAI need a “moat” to make their massive AI investments profitable.