4 research outputs found
Ranking LLM-Generated Loop Invariants for Program Verification
Synthesizing inductive loop invariants is fundamental to automating program
verification. In this work, we observe that Large Language Models (such as
gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of
programs in a 0-shot setting, yet require several samples to generate the
correct invariants. This can lead to a large number of calls to a program
verifier to establish an invariant. To address this issue, we propose a
re-ranking approach for the generated results of LLMs. We have designed a
ranker that can distinguish between correct inductive invariants and incorrect
attempts based on the problem definition. The ranker is optimized as a
contrastive ranker. Experimental results demonstrate that this re-ranking
mechanism significantly improves the ranking of correct invariants among the
generated candidates, leading to a notable reduction in the number of calls to
a verifier.
Comment: Findings of The 2023 Conference on Empirical Methods in Natural
Language Processing (EMNLP-findings 2023)
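The re-ranking idea in this abstract can be illustrated with a minimal sketch. It is not the paper's trained contrastive ranker: it assumes, purely for illustration, that the problem and each candidate invariant already have embedding vectors, and it orders candidates by cosine similarity to the problem. The vectors, candidate strings, and the `is_correct` stand-in for a verifier are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(problem_vec, candidates):
    """Sort (invariant_text, vector) pairs by similarity to the problem, best first."""
    return sorted(candidates, key=lambda c: cosine(problem_vec, c[1]), reverse=True)

def verifier_calls(ordered, is_correct):
    """Count verifier calls until the first accepted candidate (None if none pass)."""
    for calls, (text, _) in enumerate(ordered, start=1):
        if is_correct(text):
            return calls
    return None

# Hypothetical toy data: 2-dimensional "embeddings" chosen by hand.
problem_vec = [1.0, 0.0]
candidates = [
    ("x >= 0", [0.0, 1.0]),
    ("x <= n", [0.6, 0.8]),
    ("0 <= x and x <= n", [0.9, 0.1]),
]

# Stand-in for a real program verifier: only one candidate is accepted.
accepted = {"0 <= x and x <= n"}
def is_correct(text):
    return text in accepted

before = verifier_calls(candidates, is_correct)                       # generation order
after = verifier_calls(rerank(problem_vec, candidates), is_correct)   # re-ranked order
print(before, after)
```

In this toy setting the accepted candidate moves from last to first, so fewer verifier calls are needed; how well a real ranker does depends on the learned embedding space.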
Finding Inductive Loop Invariants using Large Language Models
Loop invariants are fundamental to reasoning about programs with loops. They
establish properties about a given loop's behavior. When they additionally are
inductive, they become useful for the task of formal verification that seeks to
establish strong mathematical guarantees about a program's runtime behavior.
Inductiveness ensures that the invariants can be checked locally without
consulting the entire program, which makes them indispensable artifacts in a
formal proof of correctness. Finding inductive loop invariants is an undecidable
problem, and despite a long history of research towards practical solutions, it
remains far from a solved problem. This paper investigates the capabilities of
Large Language Models (LLMs) in offering a new solution to this old yet
important problem. To that end, we first curate a dataset of verification
problems on programs with loops. Next, we design a prompt that elicits
inductive loop invariants from LLMs; these are checked for correctness using
sound symbolic tools. Finally, we explore the effectiveness of using an
efficient combination of a symbolic tool and an LLM on our dataset and compare
it against a purely symbolic baseline. Our results demonstrate that LLMs can
help improve the state of the art in automated program verification.
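To make "checked locally" concrete, the sketch below tests the three standard conditions for a loop invariant (initiation, consecution, and sufficiency for the postcondition) on a toy loop `x = 0; while x < n: x += 1; assert x == n` with precondition `n >= 0`. The loop, the candidate invariants, and the brute-force enumeration are assumptions made for illustration: the enumeration over a small integer range is a bounded check, not a proof, and stands in for the sound symbolic tools the abstract refers to.

```python
from itertools import product

BOUND = 8  # states are enumerated over [-BOUND, BOUND]; bounded check only

# Hypothetical toy verification problem.
def pre(n):          # precondition on the input
    return n >= 0

def guard(x, n):     # loop condition
    return x < n

def step(x, n):      # one execution of the loop body
    return x + 1, n

def post(x, n):      # property asserted after the loop
    return x == n

def failed_conditions(inv):
    """Return the set of conditions that a candidate invariant violates."""
    failed = set()
    values = range(-BOUND, BOUND + 1)
    # Initiation: the invariant holds when the loop is first reached (x = 0).
    if any(pre(n) and not inv(0, n) for n in values):
        failed.add("initiation")
    for x, n in product(values, values):
        if not inv(x, n):
            continue
        # Consecution: one loop iteration preserves the invariant.
        if guard(x, n) and not inv(*step(x, n)):
            failed.add("consecution")
        # Sufficiency: invariant plus loop exit implies the postcondition.
        if not guard(x, n) and not post(x, n):
            failed.add("postcondition")
    return failed

print(failed_conditions(lambda x, n: x <= n))              # passes all three checks
print(failed_conditions(lambda x, n: x >= 0))              # inductive but too weak
print(failed_conditions(lambda x, n: x <= n and x != 3))   # not preserved by the loop
```

Each condition mentions only a single state or a single loop step, which is why an inductive invariant can be validated without reasoning about whole executions.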
Initial Embeddings for Neural Invariant Ranker
<p>These are the initial embeddings from the `davinci-similarity` model for the <a href="https://arxiv.org/pdf/2310.09342.pdf" target="_blank" rel="noopener">Neural Invariant Ranker</a>.</p>
<p>The davinci.json file contains the embeddings from the `davinci-similarity` model, and ada_002.json contains the embeddings from the `text-embedding-ada-002` model.</p>
Initial Embeddings for Neural Invariant Ranker (model davinci-similarity)
<p>These are the initial embeddings from the `davinci-similarity` model for the <a href="https://arxiv.org/pdf/2310.09342.pdf" target="_blank" rel="noopener">Neural Invariant Ranker</a>.</p>