Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers

By Haoyu Wang, Yuliang Song, Tao Li, Zhiwei Deng, Yaqing Wang

arXiv · AI/CL/LG · May 12, 2026

The CP-SynC-XL benchmark of 100 combinatorial problems finds that Python + OR-Tools attains the highest correctness across LLMs, challenging the assumption that declarative constraint modeling improves results.

Categories: Research

Excerpt

Large Language Models (LLMs) struggle to solve complex combinatorial problems through direct reasoning, so recent neuro-symbolic systems increasingly use them to synthesize executable solvers. A central design question is how the LLM should represent the solver, and whether it should also attempt to optimize search. We introduce CP-SynC-XL, a benchmark of 100 combinatorial problems (4,577 instances), and evaluate three solver-construction paradigms: native algorithmic search (Python), constraint modeling through a Python solver API (Python + OR-Tools), and declarative constraint modeling (MiniZinc + OR-Tools). We find a consistent representational divergence: Python + OR-Tools attains the highest correctness across LLMs, while MiniZinc + OR-Tools has lower absolute coverage despite using the same OR-Tools back-end. Native Python is the most likely to return a schema-valid solution that fails verification, whereas solver-backed paths preserve higher conditional fidelity. On the heuristic axis, prompting for search optimization yields only small median speed-ups (1.03-1.12x) and a strongly bimodal effect: many instances slow down, and correctness drops sharply on a long tail of problems. A paired code-level audit traces these regressions to a recurring heuristic trap. Under an efficiency-oriented prompt, the LLM may replace complete search with local approximations (Python), inject unverified bounds (Python + OR-Tools), or add redundant declarative machinery that overwhelms or
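
To make the Python variant of the heuristic trap concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than material from the paper or the CP-SynC-XL benchmark: the toy 0/1 knapsack instance, the function names `solve_complete`, `solve_greedy`, and `verify`, and the solution schema are all hypothetical. It contrasts a complete search with a greedy local approximation of the kind an efficiency-oriented prompt might elicit, and shows how the greedy output can be schema-valid yet fail verification.

```python
from itertools import combinations

# Hypothetical toy instance (not from the benchmark): 0/1 knapsack.
WEIGHTS = [6, 5, 5]
VALUES = [7, 5, 5]
CAPACITY = 10


def solve_complete(weights, values, capacity):
    """Exhaustive search over all subsets, so the result is optimal."""
    best_value, best_subset = 0, ()
    n = len(weights)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) <= capacity:
                value = sum(values[i] for i in subset)
                if value > best_value:
                    best_value, best_subset = value, subset
    return {"items": list(best_subset), "value": best_value}


def solve_greedy(weights, values, capacity):
    """Greedy by value density: a local approximation with no optimality guarantee."""
    order = sorted(
        range(len(weights)),
        key=lambda i: values[i] / weights[i],
        reverse=True,
    )
    chosen, load = [], 0
    for i in order:
        if load + weights[i] <= capacity:
            chosen.append(i)
            load += weights[i]
    return {"items": sorted(chosen), "value": sum(values[i] for i in chosen)}


def verify(solution, weights, values, capacity, optimum):
    """Check schema, feasibility, value consistency, and optimality.

    Assumption: a trusted reference optimum is available to the verifier.
    """
    if not isinstance(solution, dict) or set(solution) != {"items", "value"}:
        return False  # schema violation
    items = solution["items"]
    if len(set(items)) != len(items):
        return False  # an item was selected twice
    if sum(weights[i] for i in items) > capacity:
        return False  # capacity constraint violated
    if sum(values[i] for i in items) != solution["value"]:
        return False  # reported value does not match the selected items
    return solution["value"] == optimum


complete = solve_complete(WEIGHTS, VALUES, CAPACITY)
greedy = solve_greedy(WEIGHTS, VALUES, CAPACITY)

print(complete, verify(complete, WEIGHTS, VALUES, CAPACITY, complete["value"]))
print(greedy, verify(greedy, WEIGHTS, VALUES, CAPACITY, complete["value"]))
```

On this instance the greedy solver commits to the densest item first and can no longer fit the two items that together form the optimum, so its answer is well-formed and feasible but is rejected by the verifier. The sketch illustrates only the native-Python failure mode; it does not model the OR-Tools or MiniZinc paths.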

Read at source: https://arxiv.org/abs/2605.12421v1