The Microsoft AI division has introduced two proprietary AI models: the speech generation model MAI-Voice-1 and the text model MAI-1-preview. MAI-Voice-1 can generate a minute of audio in under a second on a single GPU. It already powers Copilot Daily, where an AI host reads out the day’s top news, and podcast-style discussions that explain topics in depth. MAI-Voice-1 can also be tried in Copilot Labs, where users enter text and choose a voice and speaking style.
MAI-1-preview is Microsoft’s first in-house foundation text model, trained on approximately 15,000 Nvidia H100 GPUs. It is aimed at users who want an AI that follows instructions and gives useful answers to everyday queries. MAI-1-preview is currently undergoing public testing on the LMArena platform and is gradually being rolled out in Copilot, which has previously relied on large language models from OpenAI.
The head of Microsoft AI, Mustafa Suleyman, emphasized that MAI-1-preview was built with consumer use in mind rather than enterprise scenarios: the company is optimizing its models for consumers using its own data and resources. Microsoft plans to integrate its AI models more deeply into Windows, Office, and Azure, and is already operating a new compute cluster built on Nvidia GB200 chips.
Developers who want early access can apply for the API. Microsoft is also working to strip out traits that could give the models the appearance of emotions or intentions, with the goal of making AI interactions more transparent for users.