Early-Stage Product Line Validation Using LLMs: A Study on Semi-Formal Blueprint Analysis
State-of-the-art reasoning models achieve 88-89% accuracy on feature model analysis operations, approaching the solver-based FLAMA oracle while revealing systematic errors in structural parsing and constraint reasoning.
Excerpt
We study whether Large Language Models (LLMs) can perform feature model analysis operations (AOs) directly on semi-formal textual blueprints, i.e., concise constrained-language descriptions of feature hierarchies and constraints, enabling early validation in Software Product Line scoping. Using 12 state-of-the-art LLMs and 16 standard AOs, we compare their outputs against the solver-based oracle FLAMA. Results show that reasoning-optimized models (e.g., Grok 4 Fast Reasoning, Gemini 2.5 Pro) achieve 88-89% average accuracy across all evaluated blueprints and operations, approaching solver correctness. We identify systematic errors in structural parsing and constraint reasoning, and highlight accuracy-cost trade-offs that inform model selection. These findings position LLMs as lightweight assistants for early variability validation.
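To make the task concrete, the sketch below shows what one classic analysis operation ("#Products", counting valid configurations) computes over a tiny feature model. The model (a hypothetical "Phone" product line with an optional Camera, optional GPS, and a HighRes feature that requires Camera) is an illustrative assumption, not an example from the paper, and the brute-force enumeration stands in for what a solver-based oracle like FLAMA would answer exactly:

```python
from itertools import product

# Hypothetical toy feature model (illustrative only, not from the paper).
# Root "Phone" is mandatory; "Camera" and "GPS" are optional;
# "HighRes" requires "Camera" (a cross-tree constraint).
FEATURES = ["Phone", "Camera", "GPS", "HighRes"]

def is_valid(cfg):
    """Check a configuration (feature -> bool) against the blueprint rules."""
    if not cfg["Phone"]:                      # root feature must be selected
        return False
    if cfg["HighRes"] and not cfg["Camera"]:  # 'requires' constraint
        return False
    return True

def valid_products():
    """'Products' AO: enumerate every valid configuration by brute force."""
    for bits in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, bits))
        if is_valid(cfg):
            yield cfg

configs = list(valid_products())
print(len(configs))  # the '#Products' AO: number of valid configurations
```

An LLM answering the same question from the textual blueprint alone can be scored against this exact count, which is the comparison scheme the study applies across 16 such operations.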
Read at source: https://arxiv.org/abs/2604.20523v1