Planning in entropy-regularized Markov decision processes and games

By Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko

arXiv · AI/CL/LG · Apr 21, 2026

SmoothCruiser, a planning algorithm for entropy-regularized MDPs and two-player games, achieves a problem-independent sample complexity of O~(1/ε⁴). For non-regularized settings, no algorithm with guaranteed polynomial worst-case sample complexity is known.

Categories: Research

Excerpt

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O~(1/ε⁴) for a desired accuracy ε, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
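To make "smoothness promoted by the regularization" concrete, here is a minimal sketch of one common entropy-regularized replacement for the max over actions: a temperature-scaled log-sum-exp. The excerpt does not state the paper's exact operator or notation, so the function names `soft_max_value` and `soft_policy`, the temperature parameter `lam`, and the sample Q-values are assumptions made for illustration only.

```python
import math

def soft_max_value(q_values, lam):
    """Entropy-regularized value: lam * log(sum(exp(q / lam))).

    Assumed form for illustration; `lam` > 0 is the regularization strength.
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))

def soft_policy(q_values, lam):
    """Gradient of soft_max_value w.r.t. q_values, which is a softmax distribution."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical Q-values for three actions in a single state.
q = [1.0, 0.5, -0.2]
for lam in (1.0, 0.1, 0.01):
    value = soft_max_value(q, lam)
    policy = [round(p, 3) for p in soft_policy(q, lam)]
    print(f"lam={lam}: value={value:.4f}, policy={policy}")
```

Unlike the hard max, this function is differentiable everywhere, and its value lies between max(q) and max(q) + lam * log(number of actions). As `lam` shrinks, the value approaches the hard max and the policy concentrates on the best action, which is the non-regularized limit where the smoothness is lost.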

Read at source: https://arxiv.org/abs/2604.19695v1