GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
GeoStack composes independently trained VLM domain experts using geometric constraints on the adapter manifold, preserving the base model's knowledge while keeping inference cost constant (O(1)) regardless of the number of experts.
Excerpt
Pranav Mantini, Shishir K. Shah — We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed into a unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures the foundational knowledge of the base model is preserved. Furthermore, we mathematically demonstrate a weight-folding property that achieves constant-time inference complexity (O(1)), regardless of the number of integrated experts. Experimental results across multi-domain adaptation and class-incremental learning show that GeoStack provides an efficient mechanism for long-term knowledge composition while significantly mitigating catastrophic forgetting. Code is available at https://github.com/QuantitativeImagingLaboratory/GeoStack.
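The abstract does not spell out the weight-folding mechanism, so the following is only a minimal sketch of the general idea under an assumed setup: each expert is a low-rank adapter delta (B_k A_k) on a base weight matrix, and all deltas are summed into the base weights once, offline. The names (W_base, experts, W_folded) and the low-rank form are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch only (assumed low-rank adapter experts, not GeoStack's exact rule).
# Shows why folding expert deltas into the base weight gives O(1) inference:
# the forward pass is a single matrix-vector product, independent of expert count.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, num_experts = 64, 64, 4, 8

# Base (pretrained) weight matrix.
W_base = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)

# Independently "trained" low-rank experts (random stand-ins here).
experts = [
    (rng.standard_normal((d_out, rank)) * 0.01,   # B_k
     rng.standard_normal((rank, d_in)) * 0.01)    # A_k
    for _ in range(num_experts)
]

# Weight folding: collapse all expert deltas into one merged matrix, offline.
W_folded = W_base + sum(B @ A for B, A in experts)

x = rng.standard_normal(d_in)

# Unfolded inference: cost grows linearly with the number of experts.
y_unfolded = W_base @ x + sum(B @ (A @ x) for B, A in experts)

# Folded inference: one matrix-vector product, regardless of how many experts were composed.
y_folded = W_folded @ x

assert np.allclose(y_unfolded, y_folded)
```

Under this assumed setup, the folded and unfolded paths produce identical outputs, which is what makes the constant-cost inference claim plausible; GeoStack's geometric constraints would additionally govern how the expert deltas are shaped so that the base model's knowledge is preserved after folding.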
Read at source: https://arxiv.org/abs/2605.06477