RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems
RUBEN discovers minimal rules that explain RAG-LLM outputs via novel pruning strategies, with applications to testing the resiliency of safety training and the effectiveness of adversarial prompt injections.
Excerpt
This paper demonstrates RUBEN, an interactive tool for discovering minimal rules that explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsumes all others. We further demonstrate novel applications of these rules for LLM safety, specifically testing the resiliency of safety training and the effectiveness of adversarial prompt injections.
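The abstract does not spell out how rule minimization works; the sketch below illustrates one standard notion of subsumption that is consistent with the description: a rule subsumes another if it predicts the same output from a subset of the other's conditions, so keeping only unsubsumed rules yields a minimal set. The rule representation and condition names are hypothetical, not taken from RUBEN.

```python
def subsumes(general, specific):
    """A rule (conditions, output) subsumes another if both predict the
    same output and the general rule's conditions are a subset of the
    specific rule's conditions."""
    g_conds, g_out = general
    s_conds, s_out = specific
    return g_out == s_out and g_conds <= s_conds

def minimal_rules(rules):
    """Prune every rule that is subsumed by some other rule,
    leaving a minimal set that covers the same behavior."""
    return [r for r in rules
            if not any(subsumes(other, r)
                       for other in rules if other is not r)]

# Hypothetical rules over boolean features of a RAG query/context pair.
rules = [
    (frozenset({"doc_mentions_policy"}), "refuse"),
    (frozenset({"doc_mentions_policy", "query_is_jailbreak"}), "refuse"),
    (frozenset({"query_is_benign"}), "answer"),
]

# The second rule is subsumed by the first (same output, extra condition),
# so only two rules remain.
print(minimal_rules(rules))
```

A real system would additionally have to mine the candidate rules from model behavior and prune the exponential search space efficiently, which is where the paper's novel pruning strategies would come in.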
Read at source: https://arxiv.org/abs/2605.10862v1