Hume.ai introduced EVI 3, the third generation of its speech model for personalized voice AI. The model combines speech recognition, query processing, and voice synthesis, and responds in about 300 milliseconds. EVI 3 can create new voices from text descriptions, drawing on combinations of more than 100,000 sample recordings, and also supports customization of intonation, pace, and emotional style through reinforcement learning.
Because it works with a mixed system of text and voice tokens, the model can use external tools directly while it is responding. Users can choose from standard voices with different characters and descriptions or create a custom voice from a simple text prompt. In blind testing with more than 1,700 participants, EVI 3 was rated higher than models such as GPT-4o, Gemini, and Sesame on empathy, expressiveness, naturalness, interruption handling, speed, and sound quality.
A demo of EVI 3 is already available through the web interface and the iOS app, and API access is expected in the coming weeks. The model is aimed at customer support, health coaching, gaming, and other areas where the quality of voice interaction matters. Pricing has not been announced yet, but the previous version cost seven cents per minute of use.
Currently, EVI 3 specializes in English, but the company plans to add support for French, German, Italian, and Spanish by the full release.