On the first day of its annual Build developer conference, Microsoft unveiled seven MAI-series AI models covering reasoning, code, image, transcription, and speech. The company describes the release as its most complete showcase of self-developed frontier models to date, signaling that alongside its collaboration with OpenAI it is accelerating the build-out of its own model lineup.
MAI-Thinking-1 is comparable to Claude
The company said MAI-Thinking-1 is the core text reasoning model in this release. In blind tests with independent evaluators, the model achieved higher preference scores than Anthropic's Claude Sonnet 4.6.
Microsoft also stated that the model achieved a 97% score in the AIME 2025 test. According to the company, this test primarily measures advanced problem-solving and reasoning capabilities. On the SWE Bench Pro coding benchmark, MAI-Thinking-1's performance is close to that of Claude Opus 4.6.
Image and code models launched simultaneously
In addition to the reasoning model, Microsoft released MAI-Code-1-Flash, a lightweight code model for GitHub Copilot and Visual Studio Code. On the image side, it released MAI-Image-2.5 and a Flash version of that model.
The company claims that MAI-Image-2.5 outperforms Google's Nano Banana Pro in image-editing rankings. Microsoft also released MAI Transcribe-1.5 for speech transcription and MAI-Voice-2 for speech generation.
- MAI Transcribe-1.5 supports 43 languages.
- MAI-Voice-2 can generate speech in 15 languages.
- Speech models can adapt to speakers based on short audio samples.
Microsoft accelerates its self-developed model deployment
This release comes as competition among leading AI companies continues to intensify. Microsoft has been a major investor in and infrastructure partner of OpenAI in recent years, but launching this many of its own models at once shows that it wants to compete as a frontier AI developer rather than only supply computing power and distribution channels.
Microsoft's AI chief, Mustafa Suleyman, said that developers and enterprises want AI tools that are more cost-effective and offer greater control. The company also claims that MAI achieves a higher quality win rate than GPT-5.5 at roughly one-tenth the cost; however, more detailed testing conditions and the scope of the evaluation were not disclosed.
Additional information: At its I/O conference in May, Google announced the multimodal model Gemini Omni and the AI agent Gemini Spark, which can perform tasks across applications. As Microsoft, Google, and Anthropic continue to ship model updates, the competition is expanding from the capabilities of individual models to the integration of code, image, speech, and developer tools.












