
Xiaomi launches MiMo-V2.5 with native video and audio understanding and 1M-token context

2026-04-24 13:05

Xiaomi released MiMo-V2.5 and MiMo-V2.5-Pro on April 22, adding native multimodal perception across video, image, and audio to its reasoning model line. The release brings a 1-million-token context window with no surcharge for long-context calls. The Pro variant scores 57.2% on SWE-bench Pro and 87.7% on Video-MME, the latter comparable to Gemini 3 Pro per Xiaomi's benchmarks. Pricing starts at $0.40 per million input tokens for the standard model, positioning it as a low-cost option among multimodal agentic models.
