The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems

· ArXiv · AI/CL/LG ·

Large-scale experiments reveal embedding-based state representation variations in contextual multi-armed bandits significantly impact recommender system performance, with matrix factorization-derived representations strongly influencing model decisions.

Categories: Research

Excerpt

With the increasing availability of online information, recommender systems have become an important tool for many web-based systems. Due to the continuous aspect of recommendation environments, these systems increasingly rely on contextual multi-armed bandits (CMAB) to deliver personalized and real-time suggestions. A critical yet underexplored component in these systems is the representation of user state, which typically encapsulates the user's interaction history and is deeply correlated with the model's decisions and learning. In this paper, we investigate the impact of different embedding-based state representations derived from matrix factorization models on the performance of traditional CMAB algorithms. Our large-scale experiments reveal that variations in state representation can lead to improvements greater than those achieved by changing the bandit algorithm itself. Furthermore, no single embedding or aggregation strategy consistently dominates across datasets, underscoring the need for domain-specific evaluation. These results expose a substantial gap in the literature and emphasize that advancing bandit-based recommender systems requires a holistic approach that prioritizes embedding quality and state construction alongside algorithmic innovation. The source code for our experiments is publicly available on https://github.com/UFSCar-LaSID/bandits_blind_spot.