Tech & Science

Has DeepSeek’s OpenAI copying been exposed?

This discovery raises concerns about DeepSeek-R1’s resemblance to OpenAI’s model.

OpenAI boss Sam Altman will attend the Paris summit and an appearance by DeepSeek's Liang Wenfeng is under discussion - Copyright AFP Lionel BONAVENTURE

Did DeepSeek-R1 train on OpenAI’s model? The answer is ‘yes’, according to new research from Copyleaks, a company that works on AI detection and governance. DeepSeek is the name of a free AI-powered chatbot, which looks, feels and works very much like ChatGPT.

Researchers built a text fingerprinting tool that can determine which AI model wrote a given text. After training it on thousands of AI-generated samples, the technologists tested it on known models—and the results were clear:

74.2 percent of DeepSeek-R1’s texts matched the style of OpenAI’s model, which the researchers say strongly suggests DeepSeek used OpenAI output in its training.

For comparison, Microsoft’s Phi-4 model had 99.3 percent “disagreement,” meaning it showed no resemblance to any known model, which is consistent with independent training. DeepSeek’s overwhelming similarity to OpenAI, on the other hand, points to replication, according to the research.

OpenAI has made Europe a priority in its expansion of physical offices around the world, with sites in Paris, Brussels and Dublin - Copyright AFP/File Lionel BONAVENTURE

This discovery raises concerns about DeepSeek-R1’s resemblance to OpenAI’s model, particularly regarding data sourcing, intellectual property rights, and transparency.

The Copyleaks Data Science Team conducted the research, led by Yehonatan Bitton, Shai Nisan, and Elad Bitton. The methodology involved a “unanimous jury” approach, relying on three distinct detection systems to classify AI-generated texts, with a judgment made only when all systems agreed.

There are also operational issues, since an undisclosed reliance on existing models can reinforce biases, limit diversity, and pose legal or ethical risks. Beyond technical issues, DeepSeek’s claims of a groundbreaking, low-cost training method, if based on unauthorized distillation of OpenAI’s models, may have misled the market, contributing to NVIDIA’s $593 billion single-day loss in market value and giving DeepSeek an unfair advantage.

Using a highly rigorous approach, the research combined three advanced AI classifiers, each trained on texts from four major models: Claude, Gemini, Llama, and OpenAI. These classifiers identified subtle stylistic features like sentence structure, vocabulary, and phrasing. What made the method particularly effective was its “unanimous jury” system, where all three classifiers had to agree before a classification was made.
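The “unanimous jury” rule described above can be illustrated with a short Python sketch. This is not Copyleaks’ code: the function names, the label strings, and the idea of passing each classifier’s vote as a plain string are assumptions made for illustration only. The sketch returns a model name only when every classifier names the same known model, and otherwise withholds judgment.

```python
from typing import Optional, Sequence

# The four model families named in the article. The label strings are assumed.
KNOWN_MODELS = ("Claude", "Gemini", "Llama", "OpenAI")


def unanimous_jury(votes: Sequence[str]) -> Optional[str]:
    """Return the agreed label if all classifiers voted for the same
    known model; return None when they disagree (no judgment made)."""
    if not votes:
        return None
    first = votes[0]
    if first in KNOWN_MODELS and all(vote == first for vote in votes):
        return first
    return None


def attribution_share(all_votes: Sequence[Sequence[str]], model: str) -> float:
    """Fraction of texts that the jury unanimously attributes to `model`."""
    if not all_votes:
        return 0.0
    hits = sum(1 for votes in all_votes if unanimous_jury(votes) == model)
    return hits / len(all_votes)
```

Under these assumptions, a text voted “OpenAI” by all three classifiers is attributed to OpenAI, while a text on which even one classifier dissents is left unattributed. A share such as the 74.2 percent reported for DeepSeek-R1 would correspond to `attribution_share` over the full set of tested texts.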

This ensured a robust check against false positives, resulting in an impressive 99.88 percent precision rate and just a 0.04 percent false-positive rate, accurately identifying texts from both known and unknown AI models.
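For readers unfamiliar with the two metrics, the sketch below shows how precision and false-positive rate are conventionally computed from classification counts. These are the standard textbook definitions. The article does not give the study’s underlying counts, so the function names are hypothetical and any numbers passed in are illustrative only.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of positive calls that were correct: TP / (TP + FP)."""
    total_calls = true_positives + false_positives
    return true_positives / total_calls if total_calls else 0.0


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of actual negatives wrongly flagged: FP / (FP + TN)."""
    total_negatives = false_positives + true_negatives
    return false_positives / total_negatives if total_negatives else 0.0
```

A high precision means that when the system names a model, it is rarely wrong. A low false-positive rate means that texts not written by that model are rarely attributed to it.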

“With this research, we have moved beyond general AI detection as we knew it and into model-specific attribution, a breakthrough that fundamentally changes how we approach AI content,” Shai Nisan, Chief Data Scientist at Copyleaks, says in a statement provided to Digital Journal.

Nisan adds: “This capability is crucial for multiple reasons, including improving overall transparency, ensuring ethical AI training practices, and, most importantly, protecting the intellectual property rights of AI technologies and, hopefully, preventing their potential misuse.”

Written By

Dr. Tim Sandle is Digital Journal's Editor-at-Large for science news. Tim specializes in science, technology, environmental, business, and health journalism. He is additionally a practising microbiologist; and an author. He is also interested in history, politics and current affairs.
