Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors
Published in TOPLAS, will be presented on OOPSLA, 2026
LLMs can produce loop invariants with correct high-level insights but invalid details due to small logical reasoning errors. LORIS builds a pipeline that first asks the LLM to generate reasoning steps and then formalizes and checks those steps with a verifier to identify the failing reasoning. By providing detailed feedback on the failure, LORIS guides the LLM toward valid loop invariants more effectively.
On a benchmark suite of 460 C programs, LORIS with GPT-4.1 solves 445 programs with a 93.1% success rate, outperforming the SOTA approaches LaM4Inv and Clause2Inv, both of which remain below 85% under the same model. Across five tested models, LORIS solves the most unique programs on all models and achieves higher success rates on four models. On an additional non-linear benchmark suite, LORIS solves 47 of 50 programs.
