OpenAI has introduced new generative AI models for transcription and voice synthesis, available through its API. The new models, gpt-4o-mini-tts and gpt-4o-transcribe, promise to improve on previous versions by producing more natural-sounding speech and letting developers customize speaking styles. For example, a developer can instruct the model to speak “like a mad scientist” or with “a calm voice, like a meditation teacher.”
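As an illustration, the sketch below shows how such a style instruction might be passed through the OpenAI Python SDK's speech endpoint. The `instructions` parameter, the `coral` voice, and the output file name are assumptions based on the announcement, not details confirmed in this article.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Synthesize speech with a style instruction. The "instructions" field
# and the "coral" voice are assumptions for illustration; check the
# current API reference for the supported parameters and voices.
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="I'm sorry for the wait. Thank you for your patience.",
    instructions="Speak in a calm voice, like a meditation teacher.",
)

# Save the returned audio bytes to a local file.
with open("apology.mp3", "wb") as f:
    f.write(response.content)
```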
The new text-to-speech model converts text into speech with greater accuracy and can reproduce emotional nuances in the voice. This is useful for applications such as customer support, where conveying apology or empathy through the voice matters. According to OpenAI representatives, this lets users and developers control not only what is said, but also how it sounds.
The gpt-4o-transcribe model replaces the earlier Whisper model for transcription. It is trained on a diverse set of high-quality audio data, which improves recognition of accents and varied speech, even in challenging conditions. This significantly reduces the likelihood of errors Whisper was prone to, such as hallucinated words or phrases appearing in transcripts.
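A minimal sketch of how a recording might be sent to the new model follows; it assumes gpt-4o-transcribe is accepted by the same transcriptions endpoint that previously served whisper-1, and the file name is a placeholder.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe a local recording with the new model. "meeting.wav" is a
# hypothetical file; the endpoint and parameters mirror the existing
# transcriptions API rather than anything specified in the article.
with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```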
Despite the improvements, OpenAI does not plan to release the new transcription models openly. Company representatives note that the new models are significantly larger than Whisper and are not well suited to running locally on ordinary devices. They emphasize the importance of a careful approach to open-sourcing, releasing models only where doing so meets a specific need.