arXiv:2605.06638: ScaleLogic Framework Shows RL Training Compute Follows Power Law With Reasoning Depth; Scaling Exponent Rises from 1.04 to 2.60 as Logical Expressiveness Increases

today 11:05

Wang et al. (May 7, 2026) introduce ScaleLogic, a synthetic logical reasoning environment that allows independent control over proof depth and logical expressiveness. RL training compute T follows a power law in reasoning depth D, T ∝ D^γ, with R² > 0.99. The exponent γ ranges from 1.04 for implication-only logic to 2.60 for full first-order logic with conjunction, disjunction, negation, and universal quantification. More expressive training settings yield downstream transfer gains of up to 10.66 points and better compute efficiency, indicating that long-horizon reasoning is addressable through training methodology rather than architecture.
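
To make the scaling claim concrete, here is a minimal sketch of how an exponent γ in T ∝ D^γ could be estimated: ordinary least squares on log T against log D. This is not the authors' code or fitting procedure. The function name fit_power_law, the prefactor 3.0, and the depth values are hypothetical, and the data is synthetic and noise-free, not taken from the paper.

```python
import math


def fit_power_law(depths, compute):
    """Fit log T = log c + gamma * log D by least squares.

    Returns (gamma, c, r2), where r2 is the R^2 of the fit in log-log space.
    """
    xs = [math.log(d) for d in depths]
    ys = [math.log(t) for t in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope of the log-log regression line is the scaling exponent gamma.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    intercept = mean_y - gamma * mean_x

    # Coefficient of determination in log-log space.
    ss_res = sum((y - (intercept + gamma * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return gamma, math.exp(intercept), r2


# Synthetic data generated from T = 3 * D^2.6 (illustrative only, NOT paper results).
depths = [2, 4, 8, 16, 32]
compute = [3.0 * d ** 2.6 for d in depths]

gamma, c, r2 = fit_power_law(depths, compute)
print(f"gamma={gamma:.2f}, c={c:.2f}, R^2={r2:.4f}")
```

Under a law of this form, doubling D multiplies T by 2^γ: roughly 2.06× at γ = 1.04 and roughly 6.06× at γ = 2.60, which is why the exponent matters more than the prefactor at large depths.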
