Google DeepMind's Decoupled DiLoCo cuts inter-datacenter training bandwidth 235× while matching model quality

On April 23, 2026, Google DeepMind published Decoupled DiLoCo, a distributed training architecture that divides large training runs into asynchronous compute "islands" spread across geographically separated datacenters. The method reduces required cross-datacenter bandwidth from 198 Gbps to 0.84 Gbps (a 235× reduction), and in high-failure simulations it maintains 88% training goodput versus 27% for conventional synchronous approaches. In a real-world test, a 12-billion-parameter model was trained across four US regions over ordinary 2–5 Gbps internet links, completing more than 20× faster than synchronous methods while matching the final accuracy of standard training (64.1% vs. 64.4% on Gemma 4 tasks).
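The bandwidth savings come from the DiLoCo-style structure the article describes: each island runs many local optimizer steps on its own copy of the model, and only infrequent, small parameter deltas cross the datacenter boundary. The toy sketch below illustrates that outer/inner loop on a one-dimensional quadratic objective; the function names, learning rates, and toy loss are illustrative assumptions, not DeepMind's actual implementation.

```python
# Illustrative sketch (assumed, not DeepMind's code): each "island" takes many
# cheap local steps with no communication, then only the parameter delta
# ("outer gradient") is exchanged and averaged at a sparse synchronization
# point, which is why cross-datacenter bandwidth drops so sharply.

def grad(w, target):
    # Gradient of the toy per-island loss 0.5 * (w - target)**2.
    return w - target

def diloco_round(global_w, islands, inner_steps=50, inner_lr=0.1, outer_lr=0.7):
    """One outer round: islands train locally, then sync parameter deltas."""
    deltas = []
    for target in islands:            # each island sees its own local data
        w = global_w                  # start from the shared parameters
        for _ in range(inner_steps):  # many local steps, zero communication
            w -= inner_lr * grad(w, target)
        deltas.append(global_w - w)   # only this small delta is communicated
    avg_delta = sum(deltas) / len(deltas)
    return global_w - outer_lr * avg_delta  # outer SGD-style update

w = 0.0
islands = [1.0, 3.0, 5.0]  # per-island data induces different local optima
for _ in range(20):
    w = diloco_round(w, islands)
# w converges toward the consensus optimum (the mean of the targets, 3.0)
```

With 50 inner steps per single outer exchange, each island communicates one delta instead of 50 gradients per round; the real system applies the same idea to billions of parameters, trading synchronization frequency for bandwidth.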
