Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
H. Joshi, P. Shethia, J. Dao, M.S. Lam | Preprint 2026
April 24, 2026

Document-based question answering faces significant challenges when collections exceed LLM context limits. Rather than chunking documents and aggregating results, which creates an aggregation bottleneck, we introduce SLIDERS, a framework that extracts salient information into a relational database, enabling scalable reasoning over persistent structured state via SQL. Our approach includes a data reconciliation step that ensures consistency across extracted records. SLIDERS outperforms all baselines on three existing long-context benchmarks and yields substantial improvements on larger datasets, exceeding GPT-4.1 by an average of 6.6 points and surpassing other baselines by 19-32 points on new benchmarks spanning millions of tokens.
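The core idea in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the authors' implementation: the real SLIDERS system would use LLM calls for extraction and a schema and reconciliation policy the abstract does not specify. Here a stub extractor stands in for the LLM, records land in SQLite, a toy reconciliation pass deduplicates facts repeated across documents, and answering becomes a SQL query whose cost scales with the database rather than the model's context window.

```python
import sqlite3

def extract_records(doc_id, text):
    """Stand-in for the LLM extraction step (hypothetical): in SLIDERS
    this would be a model call emitting structured rows per document."""
    records = []
    for line in text.splitlines():
        if ":" in line:
            entity, value = line.split(":", 1)
            records.append((doc_id, entity.strip(), value.strip()))
    return records

def reconcile(conn):
    """Toy reconciliation: keep one row per (entity, value) pair,
    mimicking a consistency pass over records extracted from many docs."""
    conn.execute("""
        DELETE FROM facts WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM facts GROUP BY entity, value
        )
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (doc_id TEXT, entity TEXT, value TEXT)")

docs = {
    "d1": "revenue: 10M\nCEO: Alice",
    "d2": "revenue: 10M\nHQ: Paris",  # duplicate fact across documents
}
for doc_id, text in docs.items():
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)",
                     extract_records(doc_id, text))
reconcile(conn)

# Question answering over persistent structured state, via SQL:
rows = conn.execute(
    "SELECT entity, value FROM facts ORDER BY entity").fetchall()
print(rows)
```

Because the extracted state persists in the database, new documents can be ingested incrementally and queried together, which is what lets the approach reach collections far beyond any single context window.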