Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
Testing plays a pivotal role in ensuring software quality, yet conventional
Search Based Software Testing (SBST) methods often struggle with complex
software units, achieving suboptimal test coverage. Recent works using large
language models (LLMs) for test generation have focused on improving generation
quality through optimizing the test generation context and correcting errors in
model outputs, but use fixed prompting strategies that prompt the model to
generate tests without additional guidance. As a result, LLM-generated
test suites still suffer from low coverage. In this paper, we present SymPrompt,
a code-aware prompting strategy for LLMs in test generation. SymPrompt's
approach is based on recent work that demonstrates LLMs can solve more complex
logical problems when prompted to reason about the problem in a multi-step
fashion. We apply this methodology to test generation by deconstructing the
test suite generation process into a multi-stage sequence, each stage of which
is driven by a specific prompt aligned with the execution paths of the method
under test and exposes relevant type and dependency focal context to the
model. Our approach enables pretrained LLMs to generate more complete test
cases without any additional training. We implement SymPrompt using the
TreeSitter parsing framework and evaluate it on a benchmark of challenging
methods from open-source Python projects. SymPrompt increases correct test
generations by a factor of 5 and improves relative coverage by 26% for
CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by over
2x compared to baseline prompting strategies.
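The path-aligned prompting idea can be illustrated in a few lines. The sketch below is not SymPrompt itself (which uses TreeSitter and a more precise path analysis): it uses Python's ast module, enumerates coarse branch combinations (some of which may be infeasible), and the prompt template is invented for illustration.

    # Hedged sketch of path-aligned test prompting; not SymPrompt's actual
    # implementation. Branch combinations are enumerated independently, so
    # some generated paths may be infeasible.
    import ast
    import itertools

    def path_constraints(source: str):
        """Each `if` in the function contributes a (condition, outcome)
        choice, giving 2^k coarse execution paths."""
        func = ast.parse(source).body[0]
        branches = [ast.unparse(n.test) for n in ast.walk(func)
                    if isinstance(n, ast.If)]
        for outcomes in itertools.product([True, False], repeat=len(branches)):
            yield [f"{cond} is {taken}" for cond, taken in zip(branches, outcomes)]

    def build_prompts(source: str):
        """One test-generation prompt per coarse execution path."""
        for constraints in path_constraints(source):
            yield (f"Write a unit test for the method below that drives "
                   f"execution down the path where "
                   f"{' and '.join(constraints) or 'no branch is taken'}.\n\n"
                   f"{source}")

    example = '''def clamp(x, lo, hi):
        if x < lo:
            return lo
        if x > hi:
            return hi
        return x
    '''
    for prompt in build_prompts(example):
        print(prompt[:80], "...")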
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Code completion models have made significant progress in recent years, yet
current popular evaluation datasets, such as HumanEval and MBPP, predominantly
focus on code completion tasks within a single file. This over-simplified
setting falls short of representing the real-world software development
scenario where repositories span multiple files with numerous cross-file
dependencies, and accessing and understanding cross-file context is often
required to complete the code correctly.
To fill in this gap, we propose CrossCodeEval, a diverse and multilingual
code completion benchmark that necessitates an in-depth cross-file contextual
understanding to complete the code accurately. CrossCodeEval is built on a
diverse set of real-world, open-sourced, permissively-licensed repositories in
four popular programming languages: Python, Java, TypeScript, and C#. To create
examples that strictly require cross-file context for accurate completion, we
propose a straightforward yet efficient static-analysis-based approach to
pinpoint the use of cross-file context within the current file.
Extensive experiments on state-of-the-art code language models like CodeGen
and StarCoder demonstrate that CrossCodeEval is extremely challenging when the
relevant cross-file context is absent, and we see clear improvements when
adding this context to the prompt. However, despite such improvements, even
the highest-performing model falls well short of ceiling performance,
indicating that CrossCodeEval can also assess a model's capability to
leverage extensive context for better code completion. Finally, we benchmark
various methods for retrieving cross-file context, and show that CrossCodeEval
can also be used to measure the capability of code retrievers. (To appear at
NeurIPS 2023, Datasets and Benchmarks Track.)
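A hedged sketch of the kind of static check the benchmark's construction describes: flag a completion line as cross-file-dependent when it uses a name imported from another module of the same repository. The function names and module set below are illustrative, not the benchmark's actual tooling.

    # Illustrative version of a static-analysis check for cross-file context:
    # a line needs cross-file context if it uses a name imported from a
    # module that lives inside the repository.
    import ast

    def intra_repo_imports(source: str, repo_modules: set[str]) -> set[str]:
        """Names imported from modules inside the repository."""
        names = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.ImportFrom) and node.module in repo_modules:
                names.update(alias.asname or alias.name for alias in node.names)
        return names

    def needs_cross_file_context(source: str, line: str,
                                 repo_modules: set[str]) -> bool:
        imported = intra_repo_imports(source, repo_modules)
        used = {n.id for n in ast.walk(ast.parse(line))
                if isinstance(n, ast.Name)}
        return bool(used & imported)

    src = "from utils.math_ops import rescale\n"
    print(needs_cross_file_context(src, "y = rescale(x, 0, 1)",
                                   {"utils.math_ops"}))
    # -> True: `rescale` is defined in another file of the repo.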
Multi-agent modeling of the South Korean avian influenza epidemic
Background: Several highly pathogenic avian influenza (AI) outbreaks have been reported over the past decade. South Korea recently faced AI outbreaks whose economic impact was estimated at 6.3 billion dollars, equivalent to nearly 50% of the profit generated by the poultry-related industries in 2008. In addition, AI threatens to cause a human pandemic of potentially devastating proportions. Several studies show that a stochastic simulation model can be used to plan an efficient containment strategy for an emerging influenza. Efficient control of AI outbreaks based on such simulation studies could be an important strategy in minimizing their adverse economic and public health impacts.
Methods: We constructed a spatio-temporal multi-agent model of chickens and ducks in poultry farms in South Korea. The spatial domain, comprised of 76 (37.5 km × 37.5 km) unit squares, approximated the size and scale of South Korea. In this spatial domain, we introduced 3,039 poultry flocks (2,231 flocks of chickens and 808 flocks of ducks) whose spatial distribution was proportional to the number of birds in each province. The model parameterizes the properties and dynamic behaviors of birds in poultry farms and quarantine plans, including infection probability, incubation period, interactions among birds, and quarantine region.
Results: We conducted sensitivity analysis for the different parameters in the model. Our study shows that a quarantine plan with well-chosen parameter values is critical to minimizing the loss of poultry flocks in an AI outbreak. Specifically, an aggressive plan that culls infected poultry farms over an 18.75 km radius is unlikely to be effective, resulting in higher fractions of unnecessarily culled poultry flocks, and a weak culling plan is also unlikely to be effective, resulting in higher fractions of infected poultry flocks.
Conclusions: Our results show that a prepared response with targeted quarantine protocols would have a high probability of containing the disease. A containment plan with aggressive culling is not necessarily efficient, causing a higher fraction of unnecessarily culled poultry farms. Instead, it is necessary to balance culling with other important factors involved in AI spreading. Better estimates of AI containment from this model offer the potential to reduce the loss of poultry and minimize the economic impact on the poultry industry.
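A toy rendering of the model's core loop, with invented parameter values (the paper's are calibrated to Korean poultry data): flocks scattered over a square domain, probabilistic local transmission, and culling of every flock within a quarantine radius once an infection is detected.

    # Toy agent-based sketch of the paper's setup; parameters are
    # illustrative, not the paper's calibrated values.
    import random

    random.seed(0)
    SIZE, N_FLOCKS, STEPS = 76.0, 1000, 40
    P_INFECT, P_DETECT = 0.3, 0.5
    INFECT_RADIUS, CULL_RADIUS = 2.0, 1.5

    flocks = [{"pos": (random.uniform(0, SIZE), random.uniform(0, SIZE)),
               "state": "susceptible"} for _ in range(N_FLOCKS)]
    flocks[0]["state"] = "infected"              # index case

    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    for _ in range(STEPS):
        for src in [f for f in flocks if f["state"] == "infected"]:
            for f in flocks:                     # local transmission
                if (f["state"] == "susceptible"
                        and dist(f["pos"], src["pos"]) < INFECT_RADIUS
                        and random.random() < P_INFECT):
                    f["state"] = "infected"
            if random.random() < P_DETECT:       # outbreak noticed this step:
                for f in flocks:                 # cull everything in radius
                    if (f["state"] != "culled"
                            and dist(f["pos"], src["pos"]) < CULL_RADIUS):
                        f["state"] = "culled"

    print({s: sum(f["state"] == s for f in flocks)
           for s in ("susceptible", "infected", "culled")})

Varying CULL_RADIUS in a sketch like this reproduces the paper's qualitative trade-off: too large a radius culls flocks unnecessarily, too small a radius lets the infection escape the quarantine zone.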
Path-aware analysis of program invariants
Ensuring software reliability is a critical problem in the software development process. There are three overarching issues that help improve reliability of complex software systems: (a) availability of specifications that describe important invariants; (b) tools to identify when specifications are violated, and why these violations occur; and (c) the impact of modifications of programs on derived specifications. In this dissertation, we present scalable and efficient path-aware analyses that offer solutions to these three concerns and demonstrate how these solutions lead to improved software reliability.
We develop a static path-aware analysis to infer specifications automatically from large software sources. We describe a static inference mechanism for identifying the preconditions that must hold whenever a procedure is called. These preconditions may reflect both dataflow properties (e.g., whenever p is called, variable x must be non-null) as well as control-flow properties (e.g., every call to p must be preceded by a call to q). We derive these preconditions using an inter-procedural path-aware dataflow analysis that gathers predicates at each program point. We apply mining techniques to these predicates to make specification inference robust with respect to errors. This technique also allows us to derive higher-level specifications that abstract structural similarities among predicates (e.g., procedure p is called immediately after a conditional test that checks whether some variable v is non-null).
To identify those program statements that influence a specification or assertion, we develop a dynamic path-aware analysis that combines relevant information from multiple paths leading to an assertion point. This path information is encoded as a Boolean formula. The elements of this formula are derived from the predicates in conditional guards found on paths leading to an assertion point. These paths are generated from multiple dynamic runs that pass through the assertion. In addition to describing a test generation mechanism that drives execution through the assertion, we also present a novel representation scheme that coalesces paths using Binary Decision Diagrams (BDDs). Our representation thus allows effective pruning of redundant predicates.
Finally, we present a novel solution to the general problem of understanding how specifications are influenced by revisions in program sources. When a revision, even a minor one, does occur, the changes it induces must be tested to ensure that invariants assumed in the original version are not violated unintentionally. In order to avoid testing components that are unchanged across revisions, impact analysis is often used to identify code blocks or functions that are affected by a change. Our approach employs dynamic programming on instrumented traces of different program binaries to identify longest common subsequences in strings generated by these traces. Our formulation allows us to perform impact analysis and also to detect the smallest set of locations within the functions where the effect of the changes actually manifests itself.
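The path-coalescing step can be approximated compactly. The sketch below stands in for the BDD representation with a simple Boolean absorption rule, (A and B) or (A and not B) == A, applied to sets of branch predicates; the dissertation's actual scheme uses BDDs and is far more scalable.

    # Minimal stand-in for BDD-based path coalescing: merge dynamic paths
    # to an assertion by repeatedly dropping predicates whose truth value
    # does not matter. All names are illustrative.

    def coalesce(paths):
        """Each path is a frozenset of (predicate, taken) pairs from the
        conditional guards on a run that reached the assertion. Apply
        (A and B) or (A and not B) == A until a fixed point."""
        paths = set(paths)
        changed = True
        while changed:
            changed = False
            for p in list(paths):
                for pred, taken in p:
                    twin = (p - {(pred, taken)}) | {(pred, not taken)}
                    if twin in paths:
                        paths -= {p, twin}
                        paths.add(p - {(pred, taken)})  # predicate redundant
                        changed = True
                        break
                if changed:
                    break
        return paths

    runs = [frozenset({("x > 0", True), ("p is None", False)}),
            frozenset({("x > 0", True), ("p is None", True)}),
            frozenset({("x > 0", False), ("p is None", False)})]
    print(coalesce(runs))
    # -> two of the three runs collapse into one path, pruning a predicate
    #    that is irrelevant to reaching the assertion.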
Sieve: A Tool for Automatically Detecting Variations Across Program Versions
Revisions are an essential characteristic of large-scale software development. Software systems often undergo many revisions during their lifetime because new features are added, bugs repaired, abstractions simplified and refactored, and performance improved. When a revision, even a minor one, does occur, the changes it induces must be tested to ensure that assumed invariants in the original are not violated. In order to avoid testing components that are unchanged across revisions, impact analysis is often used to identify those code blocks or functions that are affected by a change. In this paper, we present a new solution to this general problem that uses dynamic programming on instrumented traces of different program binaries to identify longest common subsequences in the strings generated by these traces. Our formulation not only allows us to perform impact analysis, but can also be used to detect the smallest set of locations within these functions where the effect of the changes actually manifests. Sieve is a tool that incorporates these ideas. Sieve is unobtrusive, requiring no programmer or compiler involvement to guide its behavior. We have tested Sieve on multiple versions of open-source C programs and find that the accuracy of impact analysis is improved by 10-30% compared to existing state-of-the-art implementations. More significantly, Sieve can identify the regions where the changes manifest, and discovers that for the vast majority of impacted functions, the locus of change is often limited to fewer than three lines of code. These results lead us to conclude that Sieve can play a beneficial role in program testing and software maintenance.
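The core computation is a textbook longest-common-subsequence over the two versions' traces. A minimal sketch, assuming traces are already collected as lists of executed-location ids (Sieve itself instruments binaries):

    # LCS-based trace diff in the spirit of Sieve: trace entries NOT on a
    # longest common subsequence mark where the revision's effect manifests.
    from functools import lru_cache

    def lcs_diff(old: list[str], new: list[str]):
        """Return trace entries off the LCS: the divergence points."""
        @lru_cache(maxsize=None)
        def lcs(i: int, j: int) -> int:
            if i == len(old) or j == len(new):
                return 0
            if old[i] == new[j]:
                return 1 + lcs(i + 1, j + 1)
            return max(lcs(i + 1, j), lcs(i, j + 1))

        impacted, i, j = [], 0, 0
        while i < len(old) and j < len(new):
            if old[i] == new[j]:
                i, j = i + 1, j + 1
            elif lcs(i + 1, j) >= lcs(i, j + 1):
                impacted.append(("old", old[i])); i += 1
            else:
                impacted.append(("new", new[j])); j += 1
        impacted += [("old", x) for x in old[i:]] + [("new", x) for x in new[j:]]
        return impacted

    old_trace = ["f:1", "f:2", "g:1", "g:2", "f:3"]
    new_trace = ["f:1", "f:2", "g:1", "g:2b", "f:3"]
    print(lcs_diff(old_trace, new_trace))
    # -> [('old', 'g:2'), ('new', 'g:2b')]: the change is localized to one line.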
Finding good peers in peer-to-peer networks
As computing and communication capabilities have continued to increase, more and more activity is taking place at the edges of the network, typically in homes or on workers' desktops. This trend has been demonstrated by the increasing popularity and usability of "peer-to-peer" systems such as Napster and Gnutella. Unfortunately, this popularity has quickly shown the limitations of these systems, particularly in terms of scale. Because the networks form in an ad-hoc manner, they typically make inefficient use of resources. We propose a mechanism, using only local knowledge, to improve the overall performance of peer-to-peer networks based on interests. Peers monitor which other peers frequently respond successfully to their requests for information. When a peer is discovered to frequently provide good results, the requesting peer attempts to move closer to it in the network by creating a new connection with that peer. This leads to clusters of peers with similar interests, and in turn allows us to limit the depth of searches required to find good results. We have implemented our algorithm in the context of a distributed encyclopedia-style information-sharing application built on top of the Gnutella network. In our testing environment, we have shown the ability to greatly reduce the amount of communication resources required to find the desired articles in the encyclopedia.
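The local heuristic is simple enough to state as code. A minimal sketch, with an invented success threshold and class shape: count useful responses per peer and add a direct connection once a peer has proven itself.

    # Illustrative sketch of the paper's local heuristic; names and the
    # threshold are made up for the example.
    from collections import Counter

    class Peer:
        def __init__(self, name: str, neighbors=None, threshold: int = 3):
            self.name = name
            self.neighbors = set(neighbors or [])
            self.successes = Counter()   # peer name -> good responses seen
            self.threshold = threshold

        def record_response(self, responder: str, useful: bool):
            if useful:
                self.successes[responder] += 1
                # Move closer to a consistently good peer: connect directly,
                # clustering peers that share interests.
                if (self.successes[responder] >= self.threshold
                        and responder not in self.neighbors):
                    self.neighbors.add(responder)

    node = Peer("a", neighbors={"b"})
    for _ in range(3):
        node.record_response("d", useful=True)   # "d" answers via "b"
    print(node.neighbors)  # {'b', 'd'}: a shortcut to the good responder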
Path-Sensitive Inference of Function Precedence Protocols
Function precedence protocols define ordering relations among function calls in a program. In some instances, precedence protocols are well-understood (e.g., a call to pthread_mutex_init must always be present on all program paths before a call to pthread_mutex_lock). Oftentimes, however, these protocols are neither well-documented nor easily derived. As a result, protocol violations can lead to subtle errors that are difficult to identify and correct. In this paper, we present CHRONICLER, a tool that applies scalable inter-procedural path-sensitive static analysis to automatically infer accurate function precedence protocols. CHRONICLER computes precedence relations based on a program's control-flow structure, integrates these relations into a repository, and analyzes them using sequence mining techniques to generate a collection of feasible precedence protocols. Deviations from these protocols found in the program are tagged as violations, and represent potential sources of bugs. We demonstrate CHRONICLER's effectiveness by deriving protocols for a collection of benchmarks ranging in size from 66K to 2M lines of code. Our results not only confirm the existence of bugs in these programs due to precedence protocol violations, but also highlight the importance of path sensitivity on accuracy and scalability.
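A rough approximation of the mining step: given per-path call sequences, infer "q must precede p" rules from how often q appears before p, and flag the paths that break them. The support threshold and trace format are illustrative; CHRONICLER itself works statically on control-flow graphs and uses sequence mining.

    # Toy precedence mining over call sequences, in the spirit of (but far
    # simpler than) CHRONICLER's static analysis.
    from collections import Counter

    def mine_precedence(paths, target: str, min_support: float = 0.9):
        """Infer `q must precede target` rules from call sequences."""
        preceded_by, total = Counter(), 0
        for path in paths:
            for i, call in enumerate(path):
                if call == target:
                    total += 1
                    preceded_by.update(set(path[:i]))
        if total == 0:
            return set(), []
        rules = {q for q, n in preceded_by.items() if n / total >= min_support}
        violations = [p for p in paths
                      if target in p and not rules <= set(p[:p.index(target)])]
        return rules, violations

    paths = [["mutex_init", "mutex_lock", "mutex_unlock"],
             ["mutex_init", "work", "mutex_lock"],
             ["mutex_lock", "mutex_unlock"]]      # missing init: a bug?
    rules, bad = mine_precedence(paths, "mutex_lock", min_support=0.6)
    print(rules, bad)  # {'mutex_init'} precedes lock; third path violates it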
Trace Driven Dynamic Deadlock Detection and Reproduction
Dynamic analysis techniques have been proposed to detect potential deadlocks. Analyzing and comprehending each potential deadlock to determine whether the deadlock is feasible in a real execution requires significant programmer effort. Moreover, empirical evidence shows that existing analyses are quite imprecise. This imprecision further wastes the manual effort invested in reasoning about non-existent defects. In this paper, we address the problems of imprecision of existing analyses and the subsequent manual effort necessary to reason about deadlocks. We propose a novel approach for deadlock detection by designing a dynamic analysis that intelligently leverages execution traces. To reduce the manual effort, we replay the program by making the execution follow a schedule derived from the observed trace. For a real deadlock, its feasibility is automatically verified if the replay causes the execution to deadlock. We have implemented our approach as part of WOLF and have analyzed many large (up to 160 KLoC) Java programs. Our experimental results show that we are able to identify 74% of the reported defects as true (or false) positives automatically, leaving very few defects for manual analysis. The overhead of our approach is negligible, making it a compelling tool for practical adoption.
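A minimal sketch of the detection half of such an approach: build a lock-order graph from an acquire/release trace and report opposite-order lock pairs as potential deadlocks. The replay-based confirmation step is not shown, and the trace format is invented for the example.

    # Lock-order-graph deadlock detection over an execution trace; the
    # schedule-replay confirmation described in the paper is omitted.
    from collections import defaultdict

    def potential_deadlocks(trace):
        """trace: (thread, op, lock) events, op in {'acq', 'rel'}."""
        held = defaultdict(list)         # thread -> stack of held locks
        order = defaultdict(set)         # lock -> locks acquired under it
        for thread, op, lock in trace:
            if op == "acq":
                for outer in held[thread]:
                    order[outer].add(lock)
                held[thread].append(lock)
            else:
                held[thread].remove(lock)
        # a 2-cycle in the lock-order graph means two locks are taken in
        # opposite orders by different threads: a classic deadlock pattern
        return [(a, b) for a in order for b in order[a]
                if a in order.get(b, set()) and a < b]

    trace = [("t1", "acq", "L1"), ("t1", "acq", "L2"),
             ("t1", "rel", "L2"), ("t1", "rel", "L1"),
             ("t2", "acq", "L2"), ("t2", "acq", "L1"),
             ("t2", "rel", "L1"), ("t2", "rel", "L2")]
    print(potential_deadlocks(trace))  # [('L1', 'L2')]: opposite lock orders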