Google launched Gemini 3.1 Flash TTS on April 15, 2026, a text-to-speech model supporting 70+ languages, native multi-speaker dialogue, and over 200 audio tags that can be embedded directly in input text to control vocal style, pace, and delivery with natural-language instructions. The model achieves an Elo score of 1,211 on the Artificial Analysis TTS leaderboard and all generated audio is watermarked with SynthID for AI-content identification. It is available in developer preview via the Gemini API and Google AI Studio, on Vertex AI for enterprises, and through Google Vids for Workspace users.

Google releases Gemini 3.1 Flash TTS with natural-language audio tags and SynthID watermarking

Citations