The Legal Controversy Around AI Knowledge Distillation: A Balanced Perspective

By Yongkang Chen


I. Introduction: The DeepSeek Phenomenon and Knowledge Distillation


In the rapidly evolving landscape of artificial intelligence, one development has caught the attention of both industry insiders and legal experts. The DeepSeek model has achieved something remarkable: it delivers performance comparable to OpenAI's GPT-4o at roughly one-twentieth of the training cost. This achievement raises important questions about innovation, competition, and legal boundaries in AI development.

At the heart of DeepSeek's approach lies knowledge distillation technology, a fascinating method that essentially extracts and compresses knowledge from complex "teacher models" and transfers it to smaller, more efficient "student models." Despite their reduced size, these student models can achieve surprisingly similar performance to their larger counterparts.

It’s important to clarify that distillation primarily transfers knowledge rather than model parameters. The student model approximates the teacher’s knowledge capabilities, but their underlying parameters typically remain distinct. This article focuses specifically on the legality of typical knowledge distillation—using a teacher model’s outputs to train a student model—rather than the illegal copying of model parameters.
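The output-based training described above can be sketched in a few lines of Python. This is a minimal illustration of the classic soft-target recipe (the knowledge distillation loss introduced by Hinton et al., 2015), not DeepSeek's actual training pipeline; the logits and the temperature value are hypothetical, chosen only to show the mechanics.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature yields a "softer"
    # distribution that exposes the teacher's relative confidence across classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's and student's softened output
    # distributions; training minimizes this so the student mimics the
    # teacher's outputs without ever seeing the teacher's parameters.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]   # hypothetical teacher logits for one input
student = [3.0, 1.5, 0.5]   # hypothetical student logits for the same input
loss = distillation_loss(teacher, student)
```

Note that the loss depends only on the teacher's *outputs*, which is exactly why distillation is a question of output use (contract, copyright, competition law) rather than of parameter copying.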

Knowledge distillation offers tremendous potential for newcomers in the AI industry to catch up with established leaders, particularly large corporations, while avoiding the inefficiency of "reinventing the wheel." However, as DeepSeek has gained prominence, it has faced significant legal challenges regarding the legitimacy of its distillation practices. This blog explores these challenges and argues that, when viewed through the lens of balanced interests and industrial development, distillation practices deserve legal recognition.

 

II. The Legal Minefield: Challenges to Distillation Practices

Despite its technological promise, knowledge distillation sits at the intersection of several complex legal areas. Let’s examine the key legal risks that companies engaging in distillation practices might face.

 

1. Contract Violations: The First Line of Defense

The first and perhaps most straightforward legal challenge comes from user agreements. AI model providers typically include explicit prohibitions against distillation in their terms of service. OpenAI, for instance, clearly states in its user agreement that "using AI output content to develop models that compete with OpenAI is prohibited."

When developers of student models proceed with distillation despite these restrictions, they expose themselves to breach of contract claims. Recent headlines illustrate this risk—the Financial Times and Bloomberg reported that OpenAI and Microsoft were investigating DeepSeek for potential contract violations. Although OpenAI later announced it wouldn’t pursue litigation against DeepSeek, this doesn’t guarantee immunity for other companies or future cases. The contractual risk remains a significant concern for anyone contemplating distillation techniques.

 

2. Unfair Competition: The Broader Market Perspective

Beyond contractual issues, distillation practices may trigger unfair competition claims. While anti-unfair competition laws don’t specifically address distillation, teacher model developers could invoke general provisions—such as Article 2 of China’s Anti-Unfair Competition Law—to argue that distillation disrupts market competition and harms legitimate business interests.

Their arguments would likely highlight several points: the violation of user agreements, the « free-riding » nature of using teacher model outputs to reduce training costs, the direct competitive relationship between teacher and student models, and the diminished competitive advantage for companies that invested heavily in original model development.

 

3. Trade Secrets: A Complex Question

Some experts suggest that trade secret infringement, rather than copyright violations, represents the more significant risk for distillation practices. However, this argument faces substantial challenges. The knowledge allegedly distilled by DeepSeek from OpenAI's models would struggle to qualify as trade secrets, as it may not satisfy the requirements of being "not known to the public" and protected by "reasonable confidentiality measures."

Moreover, an important question remains whether distillation might constitute "reverse engineering," which typically doesn't violate trade secret protections. This emerging area requires further legal exploration and development.

 

4. Copyright Concerns: A Matter of Creative Output

The copyright dimension adds another layer of complexity. Distillation uses teacher model outputs for training, so if these outputs qualify as copyrightable works, distillation could potentially infringe copyright protections.

Whether AI-generated content constitutes protectable works remains highly debated. Chinese courts, including the Beijing Internet Court, have ruled in some cases that AI-generated content can receive copyright protection. However, the U.S. Copyright Office maintains a more conservative position, explicitly denying copyright protection to content entirely generated by AI systems.

The copyright analysis doesn’t end there—even if outputs are protected, additional factors come into play, including whether copyright ownership belongs to users rather than model developers, and whether distillation might qualify as fair use under applicable copyright doctrines.

 

III. Beyond Current Law: The Case for Legitimizing Distillation

While current legal frameworks present challenges to distillation practices, we must recognize that these practices represent emerging behaviors in a rapidly evolving field. Existing laws weren’t designed with AI knowledge distillation in mind. A more thoughtful analysis requires considering broader principles of interest balancing and industrial development—principles that support affirming the legitimacy of distillation practices.

 

1. Balancing Interests: A Matter of Consistency

From an interest-balancing perspective, a compelling argument emerges: large AI development companies cannot reasonably claim they should be allowed to use others’ works for training without permission or compensation, while simultaneously prohibiting others from using their model outputs for similar purposes.

AI companies generally argue that using copyrighted materials in training constitutes fair use, exempting them from seeking permission or paying compensation. Their core argument highlights how requiring such permissions would dramatically increase development costs and hinder industry growth. Some jurisdictions, including Japan and certain Chinese courts, have begun recognizing such training uses as fair use.

This position on fair use significantly reduces burdens on AI developers and promotes innovation. Similarly, distillation can substantially reduce development burdens, especially for smaller AI companies, preventing redundant efforts and supporting broader industry advancement.

As scholars have observed: "In court, many model creators argue that data scraped from billions of web pages can be used for training under the fair use defense. However, competitors who try to learn from their models to develop their own also try to use a similar fair use defense." This highlights an important contradiction: if AI companies argue that distillation doesn't constitute fair use, they risk undermining their own fair use defense for training on copyrighted works, which could ultimately cause them greater harm.

Consistency demands that if AI companies can use others’ creative works for training without permission or payment, others should similarly be permitted to use model outputs for training through distillation techniques.

 

2. Industry Development: Avoiding Harmful Concentration

The development of AI models demands increasingly powerful computational resources and vast data collections, driving traditional development costs ever higher. Reports indicate GPT-4’s training cost reached a staggering $63 million, effectively limiting model development to only the largest enterprises.

Given the fundamental importance of AI models, particularly foundation models, allowing a handful of tech giants to control this technology concentrates enormous power in very few hands. As scholars have warned: "Allowing power to concentrate in a few AI companies is undesirable and may even be dangerous."

A healthy AI ecosystem requires diverse participation—not just from giants like OpenAI but also from smaller innovators like DeepSeek. Distillation technology enables smaller companies to build upon existing advancements, significantly reducing development costs and fostering healthier competition throughout the industry.

From an international perspective, the United States currently leads in AI model development, with major tech giants concentrated there. The prohibitive costs of development create substantial barriers for companies in developing nations. Distillation and similar techniques offer a path forward, providing opportunities for these nations to participate meaningfully in AI advancement.

For all these reasons, promoting healthy industry development requires affirming the legitimacy of knowledge distillation practices, even as we continue refining appropriate boundaries and safeguards.

This content was last updated on 23 July 2025 at 9:55 a.m.