Amazon has introduced Nova Sonic, a new generative AI model that processes voice input and generates natural-sounding speech. According to the company, Nova Sonic competes with leading voice models from OpenAI and Google on speed, speech recognition, and dialogue quality. The model is Amazon’s response to a new generation of AI voice models that offer more natural interaction than older voice assistants such as Alexa.
Nova Sonic is available through Bedrock, Amazon’s platform for building enterprise AI applications. Amazon calls Nova Sonic “the most cost-effective” model on the market and says it is about 80 percent cheaper than OpenAI’s GPT-4o. Components of Nova Sonic are already used in the updated Alexa+ voice assistant.
According to Amazon, Nova Sonic stands out for its speech recognition accuracy, even in noisy environments or with unclear pronunciation, achieving a word error rate of just 4.2 percent on a multilingual test. The company also reports an average response latency of 1.09 seconds, which it says is faster than OpenAI’s GPT-4o.
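For readers unfamiliar with the metric, word error rate is commonly defined as the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that common definition only. The function name and the sample sentences are made up for illustration, and how Amazon's test normalized or scored the text is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: real benchmarks usually normalize case and
    punctuation first, which this sketch does not do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample: one wrong word out of five reference words.
wer = word_error_rate("book a flight to paris", "book the flight to paris")
print(f"{wer:.1%}")
```

By this definition, a word error rate of 4.2 percent corresponds to roughly four word errors per hundred reference words.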
The technology can be used to build customer service bots and AI agents for industries such as travel, education, and healthcare. Its integration into the new Alexa+ assistant highlights the growing role of Amazon’s AGI division in the company’s strategy.