MiniMax released M2.7, a multimodal agent model that participates in its own reinforcement-learning development cycle. Earlier model versions built a research-agent harness that manages data pipelines, training environments, and evaluation infrastructure; M2.7 then ran more than 100 autonomous optimization rounds, modifying the scaffold code and deciding whether to keep or revert each change. Within MiniMax's RL team the model now handles 30–50% of daily workflows end to end. On SWE-Bench Pro, M2.7 scores 56.22% with 10B activated parameters at $0.30 per million input tokens. The model was released as a proprietary system in March 2026 and open-sourced on April 12, 2026.
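
The keep-or-revert optimization rounds can be pictured as a greedy loop: propose a change to the scaffold, score it, and keep it only if the score improves. MiniMax has not published the actual mechanism, so the sketch below is a hypothetical simplification; the `evaluate`, `propose`, and `optimize` names and the toy vector "scaffold" are all illustrative assumptions, not the real system.

```python
import random

def optimize(evaluate, propose, state, rounds=100, seed=0):
    """Hypothetical keep-or-revert loop: apply a proposed change and
    keep it only if the evaluation score improves; otherwise revert
    by discarding the candidate."""
    rng = random.Random(seed)
    best_score = evaluate(state)
    for _ in range(rounds):
        candidate = propose(state, rng)   # a modified copy of the scaffold
        score = evaluate(candidate)
        if score > best_score:            # keep the change
            state, best_score = candidate, score
        # else: revert (candidate is simply dropped)
    return state, best_score

# Toy stand-ins: the "scaffold" is a vector, the score is closeness to a target.
target = [0.3, -1.2, 0.8]

def evaluate(vec):
    return -sum((v - t) ** 2 for v, t in zip(vec, target))

def propose(vec, rng):
    changed = list(vec)
    i = rng.randrange(len(changed))
    changed[i] += rng.uniform(-0.5, 0.5)  # perturb one component
    return changed

final_state, final_score = optimize(evaluate, propose, [0.0, 0.0, 0.0], rounds=100)
```

Because a change is kept only when it raises the score, the final score can never be worse than the starting point; the real system presumably replaces the toy scoring with benchmark or pipeline evaluations.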