Last week, the Chinese lab DeepSeek unveiled R1-0528, an updated version of its R1 model. The release immediately sparked heated debate: Sam Paech, a developer from Melbourne, published evidence suggesting that the DeepSeek model may have been trained on output from Google Gemini, noting that it repeats words and expressions characteristic of Gemini. The creator of "SpeechMap" made a similar observation, pointing out that the "thoughts" R1-0528 generates while working closely resemble those of Gemini.
This is not the first time DeepSeek has been suspected of using competitors' data to train its models. Back in December of last year, developers noticed that one of the previous DeepSeek versions often identified itself as ChatGPT, which could indicate training on logs of conversations with that platform. OpenAI had previously reported detecting signs of so-called distillation, a technique in which a new model is trained on the outputs of more powerful systems, and linked that activity to DeepSeek. At the end of last year, Microsoft recorded large-scale data exfiltration through OpenAI developer accounts, which the company suspected were connected to DeepSeek.
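For readers unfamiliar with the term, distillation in this context usually means collecting a stronger "teacher" model's responses and fine-tuning a smaller "student" model to imitate them. The sketch below is purely illustrative of that idea; the `teacher_generate` placeholder and the JSONL format are hypothetical and are not attributed to DeepSeek, OpenAI, or any specific pipeline.

```python
# Illustrative sketch of output distillation: gather a stronger model's
# completions for a set of prompts and store them as supervised
# fine-tuning pairs for a smaller model.
import json


def teacher_generate(prompt: str) -> str:
    # Placeholder for a call to the teacher model's API (hypothetical);
    # in practice this would return the stronger model's completion.
    return f"<teacher completion for: {prompt}>"


def build_distillation_set(prompts, path="distill.jsonl"):
    # Each record pairs a prompt with the teacher's output; a student model
    # is then fine-tuned to reproduce these outputs.
    with open(path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            record = {"prompt": prompt, "completion": teacher_generate(prompt)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    build_distillation_set(["Explain gradient descent in one paragraph."])
```

Detecting this kind of training after the fact typically relies on statistical traces, such as the student repeating the teacher's characteristic word choices, which is exactly the type of evidence cited in the R1-0528 discussion.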
In light of such accusations, leading AI market players are tightening security measures. Since April, OpenAI has required identity verification from organizations that want access to its advanced models, and China is not on the list of supported countries. Google and Anthropic have also begun introducing additional restrictions: both companies now summarize the reasoning traces their models produce, making it harder for competitors to train on that data.
Despite this, some industry experts do not rule out that DeepSeek may indeed have used Google Gemini data to build its model. Researcher Nathan Lambert noted that a company short on GPUs but well funded could plausibly generate large volumes of synthetic data from the best available models, which effectively amounts to additional compute for it.