Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

By Dehai Min, Giovanni Vaccarino, Huiyi Chen, Yongliang Wu, Gal Yona

HF Daily Papers · May 17, 2026

A semantic-preserving early-exit mechanism for large reasoning models that detects reasoning convergence, rather than mere answer readiness, to cut the tokens wasted on overthinking.

Categories: Research

Excerpt

Large Reasoning Models (LRMs) achieve strong performance by generating long chains of thought (CoT), but they often overthink: they keep reasoning after a solution has already stabilized, wasting tokens and increasing latency. Existing inference-time early-exit methods rely primarily on answer-level signals, such as confidence or trial-answer consistency, to decide when to stop. However, these signals mainly reflect answer readiness rather than reasoning convergence. They may trigger before the model has finished exploring or self-correcting, causing premature exits that can degrade final-answer accuracy and leave the retained reasoning chain semantically incomplete.

We identify reasoning-level semantic redundancy as a complementary signal for semantic-preserving early exit: when successive steps no longer add novel progress and instead revisit established conclusions, the reasoning trajectory has likely converged. Building on this insight, we propose PUMA, a plug-and-play framework that combines a lightweight Redundancy Detector with answer-level verification. The detector flags semantically redundant candidate exits, and verification confirms whether stopping there is safe, allowing PUMA to remove redundant continuation while preserving both answer accuracy and a coherent reasoning prefix. Across five LRMs and five challenging reasoning benchmarks, PUMA achieves 26.2% average token reduction while preserving …
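
The abstract describes PUMA only at a high level: a Redundancy Detector proposes candidate exits, and answer-level verification decides whether stopping there is safe. The Python sketch below illustrates that two-stage gate under loudly stated assumptions; it is not the authors' implementation. The abstract does not say how the detector scores redundancy, so this sketch swaps in bag-of-words cosine similarity between a new step and all earlier steps. The verifier is likewise a stand-in: `trial_answer` is a hypothetical callable that, in a real system, would prompt the model for an answer given a reasoning prefix. The names `early_exit_index`, `is_redundant`, and `demo_trial_answer` and the parameters `threshold=0.8` and `patience=2` are all illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional, Sequence


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts for one reasoning step."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_redundant(step: str, history: Sequence[str], threshold: float) -> bool:
    """Stand-in detector (assumption): a step is redundant if it lexically
    restates any earlier step with similarity >= threshold."""
    vec = bag_of_words(step)
    return any(cosine(vec, bag_of_words(h)) >= threshold for h in history)


def early_exit_index(
    steps: Sequence[str],
    trial_answer: Callable[[Sequence[str]], Optional[str]],
    threshold: float = 0.8,
    patience: int = 2,
) -> Optional[int]:
    """Return how many leading steps to keep, or None if no exit is confirmed."""
    run = 0  # number of consecutive redundant steps ending at step i
    for i in range(1, len(steps)):
        run = run + 1 if is_redundant(steps[i], steps[:i], threshold) else 0
        if run < patience:
            continue
        # Candidate exit: the last `patience` steps were all flagged redundant.
        keep = i + 1 - patience  # cut just before the redundant run
        # Verification (assumption): the answer read off the kept prefix must
        # exist and agree with the answer read off the full prefix so far.
        kept_answer = trial_answer(steps[:keep])
        if kept_answer is not None and kept_answer == trial_answer(steps[: i + 1]):
            return keep
    return None


def demo_trial_answer(prefix: Sequence[str]) -> Optional[str]:
    """Hypothetical stand-in for asking the model to answer from a prefix."""
    for step in reversed(prefix):
        match = re.search(r"answer is (\d+)", step)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    steps = [
        "Let x + 3 = 7, so x = 4.",
        "Check: 4 + 3 = 7, so the answer is 4.",
        "Check: 4 + 3 = 7, so the answer is 4.",
        "Check again: 4 + 3 = 7, so the answer is 4.",
    ]
    print(early_exit_index(steps, demo_trial_answer))  # prints 2
```

In the demo, the third step repeats the second verbatim and the fourth restates it, so two consecutive redundant steps raise a candidate exit at step four. The trial answer from the two-step prefix ("4") matches the answer after the redundant run, so the function keeps the first two steps. Because the loop reads only `steps[: i + 1]` at step `i`, the same check could run online as each step is generated. A lexical similarity measure is the weakest part of this stand-in, since paraphrased repetition slips past it; a semantic scorer such as an embedding model or a trained classifier would be the natural replacement.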

Read at source: https://arxiv.org/abs/2605.17672