NVIDIA has introduced Fugatto, a new AI model that generates sounds, music, and voices from text prompts and can also modify existing audio. The model is a versatile audio tool, able to create or transform a wide range of content, from musical fragments to unique sound effects.
According to the developers, Fugatto supports a wide range of tasks, from adding instruments to an existing composition to changing the accent or emotion in a voice. “This tool allows you to create entirely new sounds literally on the fly,” said Ido Zmishlany, producer and one of the project’s partners.
Fugatto is based on a generative transformer with 2.5 billion parameters and was trained on NVIDIA DGX systems with 32 H100 Tensor Core GPUs. The model can go beyond the tasks it was trained on and generate new, previously unseen soundscapes, such as a thunderstorm that gradually gives way to birdsong at dawn.
This technology opens up numerous possibilities for musicians, advertising agencies, game developers, and educational platforms. It can create sounds in real time, change the emotional tone of a voice, and produce sound effects that have never existed before.