Resemble AI has introduced Chatterbox, a free, open-source AI model for voice cloning that runs locally on a computer and offers control over emotional tone, including “dramatic” and “monotone” styles. Only a few seconds of reference audio are needed to clone a voice, and the system generates a response in under 200 milliseconds.
Chatterbox runs on Windows, macOS, and Linux. Stable operation requires 5–6 gigabytes of video memory. Every generated clip carries a barely noticeable “PerTh” watermark that makes it possible to identify the speech as AI-generated.
According to Resemble AI, Chatterbox outperformed ElevenLabs in blind tests. For now, the model supports only English.
The model is MIT-licensed and aimed primarily at developers. More details about Chatterbox can be found on the official demo page.