
    CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion

    Code completion models have made significant progress in recent years, yet popular evaluation datasets such as HumanEval and MBPP predominantly focus on code completion within a single file. This over-simplified setting falls short of real-world software development, where repositories span multiple files with numerous cross-file dependencies, and accessing and understanding cross-file context is often required to complete the code correctly. To fill this gap, we propose CrossCodeEval, a diverse and multilingual code completion benchmark that necessitates in-depth cross-file contextual understanding to complete the code accurately. CrossCodeEval is built on a diverse set of real-world, open-source, permissively licensed repositories in four popular programming languages: Python, Java, TypeScript, and C#. To create examples that strictly require cross-file context for accurate completion, we propose a straightforward yet efficient static-analysis-based approach to pinpoint the use of cross-file context within the current file. Extensive experiments on state-of-the-art code language models such as CodeGen and StarCoder demonstrate that CrossCodeEval is extremely challenging when the relevant cross-file context is absent, and we see clear improvements when this context is added to the prompt. Even with these improvements, however, performance remains far from ceiling for the highest-performing model, indicating that CrossCodeEval can also assess a model's capability to leverage extensive context for better code completion. Finally, we benchmark various methods for retrieving cross-file context and show that CrossCodeEval can also be used to measure the capability of code retrievers. To appear at NeurIPS 2023 (Datasets and Benchmarks Track).
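    The static analysis itself is not spelled out in the abstract. As a rough, hypothetical sketch of the idea for Python (not the authors' implementation), one can flag the positions in a file that reference symbols imported from other files of the same repository; completions at exactly those positions require cross-file context:

```python
import ast

def cross_file_usages(source: str, local_modules: set) -> list:
    """Return (line, name) pairs where this file uses a symbol imported
    from another module of the same repository. Completing code at such
    positions plausibly requires cross-file context."""
    tree = ast.parse(source)
    imported = {}  # name as used in this file -> defining repo module
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module in local_modules:
            for alias in node.names:
                imported[alias.asname or alias.name] = node.module
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in local_modules:
                    imported[alias.asname or alias.name] = alias.name
    return sorted((node.lineno, node.id) for node in ast.walk(tree)
                  if isinstance(node, ast.Name) and node.id in imported)
```

    In practice the set of local modules would be computed from the repository layout; here it is passed in explicitly, e.g. cross_file_usages(source, {"utils", "models"}).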

    Multi-agent modeling of the South Korean avian influenza epidemic

    Background: Several highly pathogenic avian influenza (AI) outbreaks have been reported over the past decade. South Korea recently faced AI outbreaks whose economic impact was estimated at 6.3 billion dollars, equivalent to nearly 50% of the profit generated by the poultry-related industries in 2008. In addition, AI threatens to cause a human pandemic of potentially devastating proportions. Several studies show that stochastic simulation models can be used to plan efficient containment strategies for an emerging influenza. Efficient control of AI outbreaks based on such simulation studies could be an important strategy for minimizing their adverse economic and public health impacts.
    Methods: We constructed a spatio-temporal multi-agent model of chickens and ducks in poultry farms in South Korea. The spatial domain, comprising 76 unit squares of 37.5 km × 37.5 km, approximated the size and scale of South Korea. In this domain we introduced 3,039 poultry flocks (2,231 flocks of chickens and 808 flocks of ducks) whose spatial distribution was proportional to the number of birds in each province. The model parameterizes the properties and dynamic behaviors of birds in poultry farms as well as quarantine plans, including infection probability, incubation period, interactions among birds, and quarantine region.
    Results: We conducted sensitivity analysis for the different parameters of the model. Our study shows that a quarantine plan with well-chosen parameter values is critical for minimizing the loss of poultry flocks in an AI outbreak. Specifically, an aggressive plan that culls around infected poultry farms over an 18.75 km radius is unlikely to be effective, resulting in higher fractions of unnecessarily culled poultry flocks, and a weak culling plan is likewise unlikely to be effective, resulting in higher fractions of infected poultry flocks.
    Conclusions: Our results show that a prepared response with targeted quarantine protocols would have a high probability of containing the disease. A containment plan with aggressive culling is not necessarily efficient, causing a higher fraction of unnecessarily culled poultry farms. Instead, culling must be balanced against the other factors involved in AI spread. Better estimates of AI containment from this model offer the potential to reduce the loss of poultry and minimize the economic impact on the poultry industry.
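    As a toy illustration of this style of spatio-temporal agent model (a minimal sketch, not the paper's calibrated model; the transmission kernel, detection logic, and all parameter values below are assumptions):

```python
import random

def simulate_outbreak(flocks, beta, incubation, cull_radius, days, seed=0):
    """flocks: {flock_id: (x, y)} in km.  States: S(usceptible) ->
    E(xposed, counting down incubation) -> I(nfectious) -> C(ulled).
    Each day, infectious flocks infect susceptible ones with a
    distance-decaying probability (a hypothetical kernel), then every
    flock within cull_radius of an infectious flock is culled."""
    rng = random.Random(seed)
    state = {f: "S" for f in flocks}
    countdown = {}
    first = next(iter(flocks))                    # index case
    state[first], countdown[first] = "E", incubation
    infected = culled = 0
    for _ in range(days):
        infectious = [f for f, s in state.items() if s == "I"]
        for i in infectious:                      # transmission step
            xi, yi = flocks[i]
            for f, s in state.items():
                if s == "S":
                    d2 = (flocks[f][0] - xi) ** 2 + (flocks[f][1] - yi) ** 2
                    if rng.random() < beta / (1.0 + d2):
                        state[f], countdown[f] = "E", incubation
        for f in list(countdown):                 # incubation step
            countdown[f] -= 1
            if countdown[f] == 0:
                del countdown[f]
                state[f] = "I"
                infected += 1
        for i in infectious:                      # quarantine/culling step
            xi, yi = flocks[i]
            for f in flocks:
                d2 = (flocks[f][0] - xi) ** 2 + (flocks[f][1] - yi) ** 2
                if state[f] != "C" and d2 <= cull_radius ** 2:
                    state[f] = "C"
                    countdown.pop(f, None)
                    culled += 1
    return infected, culled
```

    The paper's sensitivity analysis corresponds to sweeping parameters such as cull_radius in a model of this shape and comparing the infected and culled counts, which is how the trade-off between aggressive and weak culling shows up.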

    Path-aware analysis of program invariants

    Ensuring software reliability is a critical problem in the software development process. There are three overarching issues that help improve the reliability of complex software systems: (a) availability of specifications that describe important invariants; (b) tools to identify when specifications are violated, and why these violations occur; and (c) the impact of program modifications on derived specifications. In this dissertation, we present scalable and efficient path-aware analyses that offer solutions to these three concerns and demonstrate how these solutions lead to improved software reliability. We develop a static path-aware analysis to infer specifications automatically from large software sources. We describe a static inference mechanism for identifying the preconditions that must hold whenever a procedure is called. These preconditions may reflect both dataflow properties (e.g., whenever p is called, variable x must be non-null) and control-flow properties (e.g., every call to p must be preceded by a call to q). We derive these preconditions using an inter-procedural path-aware dataflow analysis that gathers predicates at each program point. We apply mining techniques to these predicates to make specification inference robust with respect to errors. This technique also allows us to derive higher-level specifications that abstract structural similarities among predicates (e.g., procedure p is called immediately after a conditional test that checks whether some variable v is non-null). To identify the program statements that influence a specification or assertion, we develop a dynamic path-aware analysis that combines relevant information from multiple paths leading to an assertion point. This path information is encoded as a Boolean formula whose elements are derived from the predicates in conditional guards found on paths leading to the assertion point. These paths are generated from multiple dynamic runs that pass through the assertion. In addition to describing a test generation mechanism that drives execution through the assertion, we also present a novel representation scheme that coalesces paths using Binary Decision Diagrams (BDDs), allowing effective pruning of redundant predicates. Finally, we present a novel solution to the general problem of understanding how specifications are influenced by revisions in program sources. When a revision, even a minor one, does occur, the changes it induces must be tested to ensure that invariants assumed in the original version are not violated unintentionally. In order to avoid testing components that are unchanged across revisions, impact analysis is often used to identify code blocks or functions that are affected by a change. Our approach employs dynamic programming on instrumented traces of different program binaries to identify longest common subsequences in the strings generated by these traces. Our formulation allows us to perform impact analysis and also to detect the smallest set of locations within the functions where the effect of the changes actually manifests.
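    For the specification-inference step, a minimal sketch of the mining idea, assuming predicates have already been gathered per call site (the function and threshold below are illustrative, not the dissertation's exact algorithm):

```python
from collections import Counter

def mine_preconditions(call_site_predicates, min_support=0.9):
    """call_site_predicates: one set of predicates per call site of the
    target procedure, as a path-aware dataflow analysis would gather
    them, e.g. {"x != NULL", "called(q)"}.  Instead of a strict
    intersection, a predicate is kept as a candidate precondition if it
    holds at min_support of the sites, so a handful of buggy call sites
    cannot mask a genuine precondition."""
    counts = Counter(p for preds in call_site_predicates for p in preds)
    n = len(call_site_predicates)
    return {p for p, c in counts.items() if c >= min_support * n}
```

    Requiring min_support of the call sites rather than unanimity is what makes inference robust to errors: a precondition that is violated at one buggy site still surfaces as a specification, and the violating site becomes a bug candidate.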

    Sieve: A Tool for Automatically Detecting Variations Across Program Versions

    Revisions are an essential characteristic of large-scale software development. Software systems often undergo many revisions during their lifetime because new features are added, bugs repaired, abstractions simplified and refactored, and performance improved. When a revision, even a minor one, does occur, the changes it induces must be tested to ensure that assumed invariants in the original are not violated. In order to avoid testing components that are unchanged across revisions, impact analysis is often used to identify those code blocks or functions that are affected by a change. In this paper, we present a new solution to this general problem that uses dynamic programming on instrumented traces of different program binaries to identify longest common subsequences in the strings generated by these traces. Our formulation not only allows us to perform impact analysis, but can also be used to detect the smallest set of locations within these functions where the effect of the changes actually manifests. Sieve is a tool that incorporates these ideas. Sieve is unobtrusive, requiring no programmer or compiler involvement to guide its behavior. We have tested Sieve on multiple versions of open-source C programs and find that the accuracy of impact analysis is improved by 10-30% compared to existing state-of-the-art implementations. More significantly, Sieve can identify the regions where the changes manifest, and discovers that for the vast majority of impacted functions, the locus of change is often limited to fewer than three lines of code. These results lead us to conclude that Sieve can play a beneficial role in program testing and software maintenance.
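    The core of the approach is the textbook longest-common-subsequence dynamic program, applied to traces. A minimal sketch (trace collection and the mapping back to source locations are omitted; the names are ours, not Sieve's):

```python
def lcs_table(a, b):
    """Standard O(len(a) * len(b)) dynamic program over two traces,
    i.e. sequences of basic-block or function identifiers."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp

def changed_regions(a, b):
    """Walk the table back to recover the trace elements that are NOT
    on a longest common subsequence: the candidate impacted regions."""
    dp = lcs_table(a, b)
    i, j = len(a), len(b)
    only_a, only_b = [], []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            only_a.append(a[i - 1]); i -= 1
        else:
            only_b.append(b[j - 1]); j -= 1
    only_a.extend(reversed(a[:i]))
    only_b.extend(reversed(b[:j]))
    return only_a[::-1], only_b[::-1]
```

    For example, changed_regions(list("abcde"), list("abXde")) returns (["c"], ["X"]): the trace elements outside the LCS are exactly where the two versions' behaviors diverge, which is what gets mapped back to impacted functions and lines.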

    Finding good peers in peer-to-peer networks

    As computing and communication capabilities have continued to increase, more and more activity is taking place at the edges of the network, typically in homes or on workers' desktops. This trend has been demonstrated by the increasing popularity and usability of "peer-to-peer" systems such as Napster and Gnutella. Unfortunately, this popularity has quickly shown the limitations of these systems, particularly in terms of scale. Because the networks form in an ad hoc manner, they typically make inefficient use of resources. We propose a mechanism, using only local knowledge, to improve the overall performance of peer-to-peer networks based on interests. Peers monitor which other peers frequently respond successfully to their requests for information. When a peer is discovered to frequently provide good results, the requesting peer attempts to move closer to it in the network by creating a new connection with that peer. This leads to clusters of peers with similar interests, which in turn allows us to limit the depth of searches required to find good results. We have implemented our algorithm in the context of a distributed encyclopedia-style information-sharing application built on top of the Gnutella network. In our testing environment, we have shown the ability to greatly reduce the communication resources required to find the desired articles in the encyclopedia.
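    A minimal sketch of the clustering heuristic (the class, threshold, and network hook below are hypothetical, not the paper's protocol):

```python
from collections import defaultdict

class Peer:
    """Sketch of the 'move toward good peers' heuristic: count
    successful responses per responder using only local knowledge and,
    once a responder has proven itself, open a direct connection to it,
    shortening future searches for similar content."""
    def __init__(self, threshold=5):
        self.successes = defaultdict(int)   # responder -> good results
        self.neighbors = set()
        self.threshold = threshold          # illustrative value

    def record_response(self, responder, found_result):
        if found_result:
            self.successes[responder] += 1
            if (self.successes[responder] >= self.threshold
                    and responder not in self.neighbors):
                self.connect(responder)

    def connect(self, peer):
        # a real client would open a Gnutella connection here
        self.neighbors.add(peer)
```

    Because peers with shared interests keep rewarding each other with new connections, the topology drifts into interest clusters, which is what lets search depth be bounded.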

    Trace Driven Dynamic Deadlock Detection and Reproduction

    Dynamic analysis techniques have been proposed to detect potential deadlocks. Analyzing and comprehending each potential deadlock to determine whether it is feasible in a real execution requires significant programmer effort. Moreover, empirical evidence shows that existing analyses are quite imprecise, and this imprecision wastes the manual effort invested in reasoning about non-existent defects. In this paper, we address the imprecision of existing analyses and the subsequent manual effort needed to reason about deadlocks. We propose a novel approach to deadlock detection by designing a dynamic analysis that intelligently leverages execution traces. To reduce the manual effort, we replay the program by making the execution follow a schedule derived from the observed trace. For a real deadlock, its feasibility is automatically verified if the replay causes the execution to deadlock. We have implemented our approach as part of WOLF and have analyzed many large (up to 160 KLoC) Java programs. Our experimental results show that we can automatically classify 74% of the reported defects as true (or false) positives, leaving very few defects for manual analysis. The overhead of our approach is negligible, making it a compelling tool for practical adoption.
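    The detection side of such an analysis is commonly built on the lock-order graph. A minimal sketch of that standard pattern (not WOLF's algorithm, which additionally derives and replays a schedule to confirm each report):

```python
from collections import defaultdict

def potential_deadlocks(trace):
    """trace: list of (thread, event, lock) with event in {"acq", "rel"}.
    Builds the classic lock-order graph (an edge L1 -> L2 whenever some
    thread acquires L2 while holding L1) and reports two-lock cycles,
    the simplest potential-deadlock pattern dynamic detectors flag.
    Lock names are assumed to be comparable (e.g. strings)."""
    held = defaultdict(list)     # thread -> stack of currently held locks
    order = defaultdict(set)     # lock -> locks acquired while holding it
    for thread, event, lock in trace:
        if event == "acq":
            for h in held[thread]:
                order[h].add(lock)
            held[thread].append(lock)
        else:
            held[thread].remove(lock)
    return {(a, b) for a in order for b in order[a]
            if a in order.get(b, set()) and a < b}
```

    On a trace where T1 acquires A then B and T2 acquires B then A, this reports {("A", "B")}. WOLF's contribution is what happens after this step: deriving a schedule from the observed trace that steers a re-execution into the reported cycle, so feasibility is confirmed automatically rather than by hand.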

    Path-Sensitive Inference of Function Precedence Protocols

    Function precedence protocols define ordering relations among function calls in a program. In some instances, precedence protocols are well understood (e.g., a call to pthread_mutex_init must always be present on all program paths before a call to pthread_mutex_lock). Oftentimes, however, these protocols are neither well documented nor easily derived. As a result, protocol violations can lead to subtle errors that are difficult to identify and correct. In this paper, we present CHRONICLER, a tool that applies scalable inter-procedural path-sensitive static analysis to automatically infer accurate function precedence protocols. CHRONICLER computes precedence relations based on a program's control-flow structure, integrates these relations into a repository, and analyzes them using sequence mining techniques to generate a collection of feasible precedence protocols. Deviations from these protocols found in the program are tagged as violations and represent potential sources of bugs. We demonstrate CHRONICLER's effectiveness by deriving protocols for a collection of benchmarks ranging in size from 66K to 2M lines of code. Our results not only confirm the existence of bugs in these programs due to precedence protocol violations, but also highlight the importance of path sensitivity for accuracy and scalability.
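    A minimal sketch of the sequence-mining step, assuming call sequences have already been extracted per feasible path (the support threshold and names are illustrative, not CHRONICLER's exact parameters):

```python
from collections import Counter

def mine_precedence(paths, min_support=0.95):
    """paths: lists of function names, one per feasible program path,
    e.g. [["init", "lock", "unlock"], ...].  Returns pairs (f, g) such
    that on at least min_support of the paths calling g, some call to f
    precedes the first call to g: candidate 'f must precede g' protocols."""
    contains = Counter()                  # paths containing g
    precedes = Counter()                  # paths where f precedes g
    for path in paths:
        firsts = {}
        for i, call in enumerate(path):
            firsts.setdefault(call, i)    # index of first occurrence
        for g, gi in firsts.items():
            contains[g] += 1
            for f, fi in firsts.items():
                if fi < gi:
                    precedes[(f, g)] += 1
    return {(f, g) for (f, g), c in precedes.items()
            if c >= min_support * contains[g]}

def violations(paths, protocols):
    """Paths that call g without a prior call to f: potential bugs."""
    bad = []
    for path in paths:
        firsts = {}
        for i, call in enumerate(path):
            firsts.setdefault(call, i)
        for f, g in protocols:
            if g in firsts and firsts.get(f, len(path)) >= firsts[g]:
                bad.append((path, (f, g)))
    return bad
```

    Setting min_support below 1.0 is what lets a genuine protocol emerge even when a few buggy paths violate it; those paths then surface in violations as the bug reports.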

    Deterministic Dynamic Race Detection Across Program Versions

    Dynamic race detectors operate by analyzing execution traces to detect races in multithreaded programs. Because thread interleavings influence these traces, the set of races detected can vary across runs of the detector. This non-determinism, which occurs even without any change to the program source or input, can reduce programmer confidence in the detector. From an organizational perspective, a defect needs to be reported consistently until it is fixed; non-determinism complicates the workflow, and the problem is further exacerbated by modifications to the program. In this paper, we propose a framework for deterministic dynamic race detection that ensures races are detected until they are fixed, even across program versions. The design attempts to preserve racy behavior as the program source changes, including the addition (and deletion) of locks and shared memory accesses. To achieve this, we record, transform, and replay schedules intelligently across program versions. We have implemented this framework, named STABLER, and evaluated our ideas by applying popular race detectors (DJIT+, FastTrack) to different versions of many open-source multithreaded Java programs. Our experimental results show that we are able to detect all unfixed races consistently across major releases of the programs. For both detectors, the maximum slowdown incurred by our framework is 1.2x for record and 2.29x for replay. We also performed user experiments in which volunteers fixed a significant number of races; despite these changes, our framework detected all the unfixed races.
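    For context, a bare-bones version of the kind of vector-clock happens-before detector (in the spirit of DJIT+) that such a framework drives; the event encoding and the simplified read bookkeeping below are our assumptions, not STABLER's implementation:

```python
from collections import defaultdict

def find_races(trace, nthreads):
    """trace: list of (op, tid, obj) events, with op in
    {"rd", "wr", "acq", "rel"}; obj is a memory address for rd/wr and a
    lock identifier for acq/rel.  Two accesses to the same address race
    if neither happens before the other and at least one is a write.
    (Real detectors keep per-thread read clocks; storing a list of
    recent reads here is a simplification.)"""
    clock = [[0] * nthreads for _ in range(nthreads)]
    for t in range(nthreads):
        clock[t][t] = 1
    lock_clock = defaultdict(lambda: [0] * nthreads)
    last_wr = {}                    # addr -> (vector clock, tid)
    last_rd = defaultdict(list)     # addr -> [(vector clock, tid), ...]
    races = []

    def hb(vc1, vc2):               # vc1 happens before (or equals) vc2?
        return all(a <= b for a, b in zip(vc1, vc2))

    for op, tid, obj in trace:
        if op == "acq":             # join the lock's clock into the thread's
            clock[tid] = [max(a, b)
                          for a, b in zip(clock[tid], lock_clock[obj])]
        elif op == "rel":           # publish the thread's clock via the lock
            lock_clock[obj] = list(clock[tid])
            clock[tid][tid] += 1
        elif op == "wr":
            prior = last_rd[obj] + ([last_wr[obj]] if obj in last_wr else [])
            for vc, t in prior:
                if t != tid and not hb(vc, clock[tid]):
                    races.append((obj, t, tid))
            last_wr[obj] = (list(clock[tid]), tid)
            last_rd[obj] = []
        elif op == "rd":
            if obj in last_wr:
                vc, t = last_wr[obj]
                if t != tid and not hb(vc, clock[tid]):
                    races.append((obj, t, tid))
            last_rd[obj].append((list(clock[tid]), tid))
    return races
```

    On [("wr", 0, "x"), ("wr", 1, "x")] this reports a race on "x", while interposing a shared lock's rel/acq between the writes establishes happens-before and silences it. The schedules that STABLER records and replays fix the interleaving such a detector observes, which is what makes its reports deterministic across runs and versions.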