The Scaling Properties of Implicit Deductive Reasoning in Transformers
Deep bidirectional Transformers can approach explicit chain-of-thought performance on Horn clause deduction when algorithmic alignment is enforced, though CoT remains needed for depth extrapolation.
Excerpt
Enrico Vompa, Tanel Tammet — We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
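For readers unfamiliar with the task: provability over Horn clauses can be decided by forward chaining, repeatedly firing rules whose bodies are already derived. A minimal illustrative sketch (not the paper's experimental setup; the fact and rule names below are made up):

```python
def forward_chain(facts, rules):
    """Compute all atoms provable from a set of facts and definite Horn
    clauses, each given as (body_atoms, head_atom). Iterates to a fixpoint:
    a rule fires when every atom in its body has been derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

# Hypothetical example: a & b => c, c => d; "f" is unreachable since "e" is absent.
facts = {"a", "b"}
rules = [(("a", "b"), "c"), (("c",), "d"), (("e",), "f")]
print(sorted(forward_chain(facts, rules)))  # ['a', 'b', 'c', 'd']
```

The number of fixpoint iterations needed corresponds to proof depth, which is what makes depth extrapolation the hard case the abstract mentions.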
Read at source: https://arxiv.org/abs/2605.04330