Evaluating Cognitive Age Alignment in Interactive AI Agents
ChildAgentEval introduces the first psychometrically grounded benchmark for evaluating cognitive age alignment in MLLM-based agents using WISC-inspired tasks.
Excerpt
Yifan Shen, Jiawen Zhang, Jian Xu, Junho Kim, Ismini Lourentzou
While agentic AI and its core multimodal large language models (MLLMs) have demonstrated remarkable promise in language and visual reasoning across domains ranging from daily life to advanced scientific research, a profound gap remains between artificial and human intelligence. Despite the integration of powerful tools and advanced MLLMs, state-of-the-art AI agents frequently fail at foundational, seemingly simple tasks that a child can resolve with ease. Inspired by the Wechsler Intelligence Scale for Children (WISC), we introduce ChildAgentEval, the first psychometrically grounded interactive benchmark for evaluating cognitive age alignment in MLLM-based agents. ChildAgentEval systematically compares the reasoning performance of various MLLM-based interactive agents against age-specific human developmental stages, exposing where current agentic AI systems can and cannot simulate age-specific cognitive behavior.
Read at source: https://arxiv.org/abs/2605.17894