Elon Musk has stated that companies working on artificial intelligence have exhausted the data available for training their models; in his view, the sum of human knowledge accessible for AI training has effectively already been consumed. Musk, founder of the AI company xAI, argued that the only way forward is to use synthetic data generated by the AI models themselves.
Synthetic data is already in active use at leading technology companies. Meta, for example, uses it to fine-tune its Llama models, Microsoft relies on it for Phi-4, and Google and OpenAI apply the approach in developing their own systems. Beyond saving resources, it opens up new possibilities for models that learn from their own output.
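To make that workflow concrete, here is a minimal, hypothetical sketch of building an instruction-tuning dataset from a model's own output. The `teacher_generate` function is a placeholder for a real LLM call, and the prompts, topics, and file name are illustrative assumptions, not the pipeline any of the companies above actually uses.

```python
import json
import random

# Hypothetical stand-in for a teacher model's text generation; in practice this
# would be a call to an existing LLM. Prompts and outputs here are illustrative only.
def teacher_generate(prompt: str) -> str:
    templates = [
        f"A concise explanation of: {prompt}",
        f"Step-by-step notes on: {prompt}",
    ]
    return random.choice(templates)

seed_topics = ["gradient descent", "tokenization", "attention mechanisms"]

# Build instruction/response pairs from model-generated text and write them as
# JSONL, a common format for fine-tuning datasets.
with open("synthetic_finetune.jsonl", "w", encoding="utf-8") as f:
    for topic in seed_topics:
        prompt = f"Explain {topic} to a beginner."
        record = {"instruction": prompt, "response": teacher_generate(prompt)}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```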
However, synthetic data has its drawbacks. Research shows that it can lead to so-called “model collapse,” in which output diversity and creativity decline while biases become more pronounced. This can seriously degrade a model's capabilities, because a model that generates its own training data passes its existing limitations and biases on to that data.
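A toy simulation illustrates the mechanism. The sketch below assumes a simple categorical distribution rather than a real language model: a "model" is repeatedly refit to finite samples of its own output, and rare outcomes that happen not to be sampled get probability zero and can never return, so diversity shrinks over generations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of "model collapse": refit a categorical "model" to finite
# samples drawn from its previous version. Categories that are never sampled in
# a generation drop to probability zero permanently. Sizes are arbitrary choices.
n_categories, n_samples = 20, 100
probs = np.full(n_categories, 1.0 / n_categories)   # generation 0: uniform "real" data

for generation in range(1, 31):
    samples = rng.choice(n_categories, size=n_samples, p=probs)  # synthetic data
    counts = np.bincount(samples, minlength=n_categories)
    probs = counts / counts.sum()                                # refit to own output
    if generation % 5 == 0:
        surviving = int((probs > 0).sum())
        print(f"generation {generation}: {surviving}/{n_categories} categories remain")
```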
Musk also drew attention to the problem of so-called AI “hallucinations,” in which models generate inaccurate or nonsensical responses. This complicates the use of synthetic data, since it is difficult to determine whether a generated response is grounded or fabricated, and the issue becomes more pressing as the volume of AI-generated content that may feed back into model training continues to grow.
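One simple heuristic sometimes used to screen such data is self-consistency: sample the same prompt several times and keep the answer only when a clear majority agrees. The sketch below is a hedged illustration, not the method any particular lab uses; `noisy_model_answer` stands in for a real model call, and the sample count and agreement threshold are arbitrary assumptions.

```python
from collections import Counter
import random

# Stand-in for a model that usually answers correctly but sometimes "hallucinates".
def noisy_model_answer(prompt: str) -> str:
    return random.choices(["Paris", "Lyon", "Berlin"], weights=[0.7, 0.15, 0.15])[0]

def self_consistent_answer(prompt: str, n_samples: int = 7, min_agreement: float = 0.6):
    # Sample the same prompt several times and take the most common answer.
    answers = [noisy_model_answer(prompt) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    # Keep the sample for the synthetic dataset only if agreement is high enough.
    return best if count / n_samples >= min_agreement else None

kept = self_consistent_answer("What is the capital of France?")
print("kept for training:", kept)  # None means the sample was discarded as unreliable
```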