DeepSeek's Breakthrough and Transformation: How We Can Regulate Open AI Models

By Fan Li


In February 2025, U.S. Senator Josh Hawley introduced the "American Artificial Intelligence Capabilities and China Decoupling Act of 2025." This legislation aims to comprehensively sever technological cooperation between the United States and China in the field of artificial intelligence. Under the proposed law, individuals who download or use AI models developed in China, such as DeepSeek, could face severe penalties, including imprisonment for up to 20 years and fines of up to $100 million. The bill also prohibits academic collaboration, technology transfer, and investment activities between the two nations in the AI sector. This competition, which some analysts have termed an "AI Cold War," has been triggered primarily by shifting technological advantages between the United States and China. DeepSeek has evidently emerged as a pivotal element in this competitive landscape.

 

Photo by Jaap Arriens/NurPhoto via AP, https://www.bu.edu

 

Technical Characteristics of DeepSeek

DeepSeek possesses significant innovative advantages in algorithmic architecture, including a Mixture of Experts (MoE) architecture, a Multi-head Latent Attention (MLA) mechanism, and knowledge distillation and model compression techniques based on the DIKWP concept. These technologies give it substantial competitive advantages in both performance and cost control.

1. Mixture of Experts (MoE) architecture

DeepSeek-V3 employs a large-scale Mixture of Experts (MoE) architecture with a parameter scale of 671 billion. Through sparse activation, however, each token calls upon only approximately 37 billion parameters. In other words, the model does not use all of its weights in every inference pass; it consists of many "expert" sub-models, and each input activates only a small subset of these experts. This design significantly raises the model's capacity ceiling without a proportional increase in per-token compute.
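The routing idea described above can be illustrated with a minimal NumPy sketch. This is a toy example under assumed sizes (8 experts, top-2 routing, 16-dimensional tokens), not DeepSeek's actual configuration or code: a gating network scores the experts, and only the top-k experts' weights are touched for a given token.

```python
import numpy as np

# Toy sparse-MoE routing sketch (illustrative sizes, not DeepSeek's real ones).
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # gating network

def moe_forward(x):
    """Route token x to its top-k experts; only those experts' weights are used."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                        # indices of the chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the top-k scores
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
# For this token, only top_k of n_experts expert matrices were ever multiplied,
# which is the source of the "671B total / ~37B active" distinction in the text.
```

In a real implementation the router is trained jointly with the experts and includes load-balancing terms; this sketch only shows why the active parameter count is a small fraction of the total.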

2. Multi-head Latent Attention (MLA) mechanism

In addition to MoE, DeepSeek has introduced a distinctive innovation in attention mechanisms called Multi-head Latent Attention (MLA). This functions similarly to information compression and low-rank approximation, reducing computational and memory costs while maintaining attention effectiveness. As a result, DeepSeek decreases the overhead of attention operations while preserving model accuracy. This architectural approach facilitates subsequent knowledge distillation and model compression, alleviates the computational bottlenecks that large models face with long sequences and high dimensionality, and works in conjunction with MoE to enhance the model's overall inference efficiency.

3. Knowledge Distillation Techniques guided by the DIKWP Concept

In the domain of model compression, DeepSeek applies knowledge distillation techniques guided by the DIKWP concept ("Data_Information_Knowledge_Wisdom_Purpose"), emphasizing the preservation of semantic essence and decision-making intent during compression. This approach aims to ensure that the model's acquired knowledge, its reasoning capabilities ("wisdom"), and its purposeful understanding of problems are not lost. When training the smaller DeepSeek-R1 model, the developers likely went beyond teaching it to replicate the larger V3 model's responses across various inputs, employing special objective functions so that the "student model" maintains an understanding of knowledge and reasoning processes similar to the "teacher model" (the transfer of "knowledge-wisdom") while retaining clear goal orientation for critical tasks (the transfer of "purpose"). This distillation methodology enables smaller-scale models like R1 to approach the teacher model's functionality and performance while substantially reducing the required computational resources and deployment costs.
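The core mechanics of response-matching distillation can be shown with a short sketch. This is the generic temperature-softened KL objective from the distillation literature, written in NumPy; it is an assumption for illustration, not DeepSeek's disclosed loss, and the DIKWP-specific terms described above would be additional objectives layered on top:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; higher T exposes the teacher's 'dark knowledge'."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 as is conventional."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

# Hypothetical logits for one token position from a teacher and a student model.
teacher = [2.0, 1.0, 0.2]
student = [1.8, 1.1, 0.3]
loss = distill_loss(teacher, student)  # minimized during student training
```

A perfect imitation drives this term to zero; in practice it is combined with the ordinary training loss, and, per the text, DIKWP-style distillation would add further terms targeting reasoning traces and task intent rather than output distributions alone.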

 

New Challenges Brought by DeepSeek's Technological Revolution

DeepSeek's chain-of-thought reasoning and knowledge distillation technologies have brought breakthroughs in model performance improvement. However, they have also intensified new challenges related to privacy protection and data compliance, intellectual property issues, the "hallucination problem", and model security concerns.

1. Privacy Protection and Data Compliance Issues Are Exacerbated by Technological Characteristics

Knowledge distillation technology transfers knowledge from a "teacher model" to a "student model," which reduces training costs. However, if the original model's training data contains flaws or illegal content, such erroneous information will be amplified and migrated into the new model, causing the "hallucination problem" (where models generate content that appears reasonable but lacks factual basis) to worsen. For instance, if training data includes unauthorized personal information or incorrectly labeled information, the distillation process may extend privacy leakage risks from a single model to downstream applications, increasing the difficulty of data provenance tracking and compliance review. Furthermore, while chain-of-thought technology improves transparency by displaying the model's reasoning steps, sensitive fragments of training data may be exposed in reasoning logs. At the same time, the openness of open-source models may amplify conflicts in cross-border data governance.

2. Intellectual Property Disputes Escalate Due to Technological Iteration

Knowledge distillation and model open-sourcing promote technology sharing but blur the boundaries between fair use and infringement. For example, when outputs from a closed-source "teacher model" are used to train an open-source "student model" without clearly defined authorization parameters, intellectual property disputes may arise. Existing legal frameworks lack clarity regarding originality determination for generative artificial intelligence, and traditional copyright law struggles to address the protection requirements for algorithmic logic and model architecture. Open-source licenses (such as GPL) attempt to regulate user behavior through licensing terms, but their applicability to emerging technologies like knowledge distillation remains contentious. Some enterprises may leverage intellectual property disputes as tools for commercial competition, using litigation to restrict competitors' technological development, thereby potentially creating market monopolies.

3. Governance Challenges of the "Hallucination Problem" Increase with Enhanced Model Capabilities

DeepSeek's chain-of-thought technology enhances output credibility by simulating human reasoning processes; however, this feature also makes users more susceptible to accepting erroneous conclusions. Moreover, while knowledge distillation technology optimizes data quality, biases or errors in the original data can lead to systematic amplification of misinformation through repeated migration. Existing countermeasures, such as Retrieval-Augmented Generation (RAG), can improve accuracy by connecting to external databases, but their high costs and scenario limitations restrict large-scale application. At the regulatory level, there is a need to promote the establishment of dynamic feedback mechanisms, requiring service providers to issue prominent risk notifications for outputs in specialized domains, while also improving user correction and provenance systems.

4. Model Security Issues Are Highlighted Due to Technological Openness

The large-scale cross-border cyber attacks suffered by DeepSeek after its open-source release exposed the inherent vulnerabilities of generative artificial intelligence. Attack methods include data poisoning, prompt injection, and model hijacking, which can compromise model functionality or steal sensitive information. For instance, attackers may manipulate models to generate illegal content through malicious prompt injection or exploit vulnerabilities in open-source code to steal user chat histories and credentials. Such risks are particularly severe when models integrate with critical sectors like healthcare and finance, potentially triggering public security incidents. While existing defensive measures (such as enhancing model robustness) can mitigate some threats, the rapid iteration of technology and increasing complexity of attack methods necessitate collaborative establishment of forward-looking security standards by regulatory agencies and enterprises.

 

Key Directions for Generative Artificial Intelligence Regulation

The recent surge in popularity of DeepSeek further demonstrates that generative artificial intelligence regulation should be forward-looking, focusing on privacy protection and data security, balancing technological innovation with safeguards, model prompting and feedback mechanisms, and model safety as regulatory priorities. This approach aims to prevent technology misuse and malicious attacks while ensuring and promoting the safe and positive development of generative artificial intelligence.

1. Privacy protection and data security constitute the primary regulatory task

DeepSeek's privacy policy explicitly states that "under the conditions of secure encryption processing, strict de-identification, and the impossibility of re-identifying specific individuals, the input content collected through the service and its corresponding output will be utilized for improving and optimizing the quality of DeepSeek services." Such privacy clauses have become widely implemented in the service agreements of current large language models, seemingly establishing a standardized "data-for-service" commercial paradigm. Regulatory frameworks for generative artificial intelligence should not impose absolute prohibitions on this business model, but rather ensure that the power of choice remains firmly in the hands of users.

2. Balancing technological innovation and intellectual property protection is crucial for stimulating industry vitality

While technologies such as knowledge distillation and model open-sourcing promote knowledge sharing, they may also trigger infringement disputes. Regulation must clearly define the legal boundaries between "technological improvement" and "infringing reproduction," for instance, by requiring companies to disclose key processes in model training (such as data sources and distillation methods) and verifying the independence of "student models" through technical detection. For open-source communities, regulations should encourage the development of dynamic protocols and industry standards, such as incorporating adaptive provisions for knowledge distillation into the General Public License (GPL), thereby both protecting the legitimate interests of original developers and preserving space for secondary innovation.

3. Refining model prompting and feedback mechanisms represents the core pathway for addressing the "hallucination problem"

The output of generative artificial intelligence exhibits high uncertainty, necessitating institutional design to reduce the risk of users being misled. On one hand, service providers should be required to provide prominent warnings for content in specialized domains (such as medical diagnoses or legal advice), avoiding obscurely formatted disclaimers that dilute responsibility. On the other hand, efficient user feedback and traceability systems should be established, enabling users to report erroneous information through multiple channels (such as real-time pop-ups or structured forms), while requiring companies to publicly disclose feedback processing rates and model optimization progress. At the technical level, mandatory recording of model input instructions, reasoning logs, and output results should be implemented, utilizing technologies such as blockchain to ensure data immutability and provide a basis for accountability.

This content has been updated on 10/29/2025 at 15 h 31 min.