Hugging Face released ml-intern on April 21. The open-source autonomous agent, built on the company's smolagents framework, reads arXiv papers, discovers and reformats datasets from the Hugging Face Hub, executes training scripts, and iterates on evaluation results in a continuous loop without manual intervention. On PostTrainBench, a benchmark from the University of Tübingen and the Max Planck Institute, ml-intern raised a Qwen3-1.7B model's GPQA accuracy from roughly 10% to 32% in under 10 hours, outperforming Claude Code, which reached 22.99% on the same task. The tool integrates with Hugging Face Jobs for compute and Trackio for experiment tracking, and is available as both a web app and a CLI.

Hugging Face Releases ml-intern, an Open-Source Agent That Automates LLM Post-Training
