159 research outputs found

    Program Merge Conflict Resolution via Neural Transformers

    Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge conflict may occur. Such conflicts stall pull requests and continuous integration pipelines for hours to several days, seriously hurting developer productivity. To address this problem, we introduce MergeBERT, a novel neural program merge framework based on token-level three-way differencing and a transformer encoder model. By exploiting the restricted nature of merge conflict resolutions, we reformulate the task of generating the resolution sequence as a classification task over a set of primitive merge patterns extracted from real-world merge commit data. Our model achieves 63-68% accuracy for merge resolution synthesis, yielding nearly a 3x performance improvement over existing semi-structured merge tools and a 2x improvement over neural program merge tools. Finally, we demonstrate that MergeBERT is sufficiently flexible to work with source code files in the Java, JavaScript, TypeScript, and C# programming languages. To measure the practical use of MergeBERT, we conduct a user study with 25 developers from large OSS projects to evaluate MergeBERT suggestions on 122 real-world conflicts they encountered. Results suggest that in practice, MergeBERT resolutions would be accepted at a higher rate than estimated by automatic metrics for precision and accuracy. Additionally, we use participant feedback to identify future avenues for improvement of MergeBERT.
    Comment: ESEC/FSE '22 camera ready version. 12 pages, 4 figures, online appendix.
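
    Since the abstract does not spell out the pattern vocabulary, the following Python sketch is only a hypothetical illustration of the classification reformulation: a conflict is resolved by choosing among a handful of primitive merge patterns, with a trivial heuristic standing in for the transformer encoder.

        # Toy illustration of recasting merge-resolution synthesis as classification
        # over primitive merge patterns, in the spirit of MergeBERT. The pattern set
        # and the scoring stub below are hypothetical simplifications; the real model
        # is a transformer encoder over token-level three-way diffs.

        CONFLICT = {
            "base":   ["x = compute()"],
            "ours":   ["x = compute()", "log(x)"],
            "theirs": ["x = compute_fast()"],
        }

        # A few primitive patterns seen in real merge commits: take one side,
        # or concatenate both sides in either order.
        PATTERNS = {
            "take_ours":    lambda c: c["ours"],
            "take_theirs":  lambda c: c["theirs"],
            "ours_theirs":  lambda c: c["ours"] + c["theirs"],
            "theirs_ours":  lambda c: c["theirs"] + c["ours"],
        }

        def score(pattern_name: str, conflict: dict) -> float:
            """Stand-in for the neural classifier: a trivial heuristic that
            prefers resolutions reusing lines from both branches."""
            resolution = PATTERNS[pattern_name](conflict)
            both = set(conflict["ours"]) | set(conflict["theirs"])
            return len(set(resolution) & both) / max(len(both), 1)

        best = max(PATTERNS, key=lambda name: score(name, CONFLICT))
        print(best, "->", PATTERNS[best](CONFLICT))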

    Static and dynamic semantics of NoSQL languages

    We present a calculus for processing semistructured data that spans the differences in application area among several novel query languages, broadly categorized as "NoSQL". This calculus lets users define their own operators, capturing a wider range of data processing capabilities, whilst providing a typing precision so far typical only of primitive hard-coded operators. The type inference algorithm is based on semantic type checking, resulting in type information that is both precise and flexible enough to handle structured and semistructured data. We illustrate the use of this calculus by encoding a large fragment of Jaql, including operations and iterators over JSON, embedded SQL expressions, and co-grouping, and show how the encoding directly yields a typing discipline for Jaql as it is, namely without the addition of any type definition or type annotation in the code.
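
    Co-grouping, one of the Jaql operations the calculus is shown to encode, can be illustrated with a small stand-alone Python sketch; the collections and field names below are made up for illustration and are not taken from the paper.

        # Hypothetical illustration of co-grouping over JSON records: group two
        # collections by a shared key and pair the grouped values, roughly like
        # a full outer join followed by a group-by.
        from collections import defaultdict

        orders = [{"user": "ana", "item": "pen"}, {"user": "bo", "item": "ink"}]
        visits = [{"user": "ana", "page": "/home"}, {"user": "ana", "page": "/buy"}]

        def cogroup(left, right, key):
            groups = defaultdict(lambda: {"left": [], "right": []})
            for rec in left:
                groups[rec[key]]["left"].append(rec)
            for rec in right:
                groups[rec[key]]["right"].append(rec)
            return [{"key": k, **v} for k, v in groups.items()]

        for group in cogroup(orders, visits, "user"):
            print(group)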

    Schema Inference for Massive JSON Datasets

    In recent years, JSON has emerged as a very popular data format for representing massive data collections. JSON data collections are usually schemaless. While this ensures several advantages, the absence of schema information has important negative consequences: the correctness of complex queries and programs cannot be statically checked, users cannot rely on schema information to quickly figure out the structural properties that could speed up the formulation of correct queries, and many schema-based optimizations are not possible. In this paper we deal with the problem of inferring a schema from massive JSON datasets. We first identify a JSON type language which is simple and, at the same time, expressive enough to capture irregularities and to give complete structural information about input data. We then present our main contribution, which is the design of a schema inference algorithm, its theoretical study, and its implementation based on Spark, enabling reasonable schema inference time for massive collections. Finally, we report on an experimental analysis showing the effectiveness of our approach in terms of execution time, precision, conciseness of inferred schemas, and scalability.
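
    A minimal sketch of fusion-style inference as the abstract describes it, assuming a simplified type representation; the actual algorithm is defined over the paper's JSON type language and runs the fusion as a distributed Spark aggregation.

        # Infer a structural type per JSON document, then merge types pairwise,
        # recording fields missing from some documents as optional and keeping
        # genuinely irregular alternatives as unions.

        def infer(value):
            if isinstance(value, dict):
                return {"kind": "record",
                        "fields": {k: infer(v) for k, v in value.items()}}
            if isinstance(value, list):
                items = [infer(v) for v in value]
                t = items[0] if items else {"kind": "any"}
                for other in items[1:]:
                    t = fuse(t, other)
                return {"kind": "array", "items": t}
            return {"kind": type(value).__name__}  # e.g. "int", "str", "bool"

        def fuse(t1, t2):
            if t1 == t2:
                return t1
            if t1["kind"] == t2["kind"] == "record":
                fields = {}
                for k in set(t1["fields"]) | set(t2["fields"]):
                    if k in t1["fields"] and k in t2["fields"]:
                        fields[k] = fuse(t1["fields"][k], t2["fields"][k])
                    else:  # present in only one document: mark as optional
                        fields[k] = {"kind": "optional",
                                     "of": t1["fields"].get(k) or t2["fields"][k]}
                return {"kind": "record", "fields": fields}
            return {"kind": "union", "of": [t1, t2]}  # irregular data: keep both

        docs = [{"a": 1, "b": "x"}, {"a": 2, "c": True}]
        schema = infer(docs[0])
        for d in docs[1:]:
            schema = fuse(schema, infer(d))
        print(schema)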

    Code merging using transformations and member identity

    Conventionally, merging code files is performed using generic line-based merging algorithms (e.g., diff3) that are unaware of the syntax and semantics of the programming language, outputting conflicts that could be avoided. Structured and semistructured merging techniques are capable of reducing conflicts, but they still suffer from false positives (conflicts that could be avoided) and false negatives (conflicts that go undetected). We propose a merging technique that combines semistructured and transformation-based strategies, where conflict detection is aware of semantic aspects of the programming language. We extract the transformations of two branches and apply a merging process that analyzes incompatible transformations, avoiding false positives and false negatives that occur in existing approaches. We developed Jaid, a prototype merging tool for Java based on the assumption that structural code elements evolve with attached UUIDs (representing identity). We performed an early experiment with 63 merge scenarios from two open-source projects to test the technique and assess its feasibility.
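
    A toy Python sketch of identity-based merging under the paper's assumption that structural elements carry stable UUIDs; the transformation vocabulary and property names here are hypothetical, not Jaid's actual ones.

        # With stable member identity, a rename on one branch and a body edit on
        # the other can both be kept, while two different edits to the same
        # property of the same member surface as a genuine conflict.

        base = {"m1": {"name": "getTotal", "body": "return a+b;"}}

        # Each branch is recorded as property changes keyed by member UUID.
        ours   = {"m1": {"name": "computeTotal"}}            # rename
        theirs = {"m1": {"body": "return a + b + tax;"}}     # edit body

        def merge(base, ours, theirs):
            merged, conflicts = {}, []
            for uid, member in base.items():
                result = dict(member)
                for prop in set(ours.get(uid, {})) | set(theirs.get(uid, {})):
                    o = ours.get(uid, {}).get(prop)
                    t = theirs.get(uid, {}).get(prop)
                    if o is not None and t is not None and o != t:
                        conflicts.append((uid, prop, o, t))  # incompatible transformations
                    else:
                        result[prop] = o if o is not None else t
                merged[uid] = result
            return merged, conflicts

        print(merge(base, ours, theirs))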

    A web-based database management system

    This project develops a web database management system for the Gfeller Center at the University of North Carolina at Chapel Hill. The system allows users to upload and download structured data according to specific requirements. The Gfeller Center conducts research on the health condition of athletes. Researchers have collected a large amount of data from previous studies and keep collecting data as the research goes on. They need tools to help manage and organize the data, as well as advanced functionality to support their research. Therefore, this project develops a web database system to manage data according to the requirements of the researchers at the Gfeller Center. The system development includes requirements analysis, functionality analysis, and interface design. Finally, this project proposes a plan for usability testing to evaluate the system.
    Master of Science in Information Science

    Almost Rerere: Learning to resolve conflicts in distributed projects

    The concurrent development of applications requires reconciling conflicting code updates by different developers. Recent research on the nature of merge conflicts in open source projects shows that a significant fraction of merge conflicts have limited size (one or two lines of code) and are resolved with simple strategies that reuse code present in the merged versions. Thus the opportunity arises of supporting the resolution of merge conflicts automatically by learning the way in which developers fix them. In this paper we propose a framework for automating the resolution of merge conflicts which learns from the resolutions made by developers and encodes such knowledge into conflict resolution rules applicable to conflicts not seen before. The proposed approach is text-based, does not depend on the programming languages of the merged files, and exploits a well-known and general language (search-and-replace regular expressions) to encode the conflict resolution rules. Evaluation results on 14,872 conflicts from 25 projects show that the system can synthesize a resolution for 49% of the conflicts that occurred during the merge process (89% if one considers only conflicts that have at least one similar conflict in the data set) and can reproduce exactly the same solution that human developers applied in 55% of the cases (62% for single-line conflicts).
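
    A minimal sketch of the rule format the abstract describes: a resolution encoded as a search-and-replace regular expression, applied to an unseen but similar conflict. The rule below is hand-written for illustration; the framework learns such rules from past resolutions.

        # Apply a learned search/replace rule to a textual merge conflict.
        import re

        # Illustrative rule: when both branches bump the same version field to
        # different values, keep the higher one (captured from ours/theirs).
        RULE = (
            r'<<<<<<< ours\nversion = "(\d+)"\n=======\nversion = "(\d+)"\n>>>>>>> theirs',
            lambda m: f'version = "{max(int(m.group(1)), int(m.group(2)))}"',
        )

        conflict = (
            '<<<<<<< ours\n'
            'version = "12"\n'
            '=======\n'
            'version = "14"\n'
            '>>>>>>> theirs'
        )

        pattern, replacement = RULE
        print(re.sub(pattern, replacement, conflict))   # -> version = "14"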

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables the easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as issues of data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors.
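
    The programming model the survey builds on is easy to show in miniature: the user supplies only map and reduce functions, while the framework handles distribution, scheduling, and fault tolerance. Below is a sequential Python sketch of the canonical word count; it mimics the data flow on a single machine rather than any particular implementation covered by the survey.

        # Single-machine sketch of the MapReduce data flow: map, shuffle, reduce.
        from collections import defaultdict

        def map_fn(_key, line):                  # emit (word, 1) for each word
            for word in line.split():
                yield word, 1

        def reduce_fn(word, counts):             # sum partial counts per word
            return word, sum(counts)

        def mapreduce(records, map_fn, reduce_fn):
            shuffle = defaultdict(list)          # group intermediate pairs by key
            for key, value in records:
                for k, v in map_fn(key, value):
                    shuffle[k].append(v)
            return [reduce_fn(k, vs) for k, vs in shuffle.items()]

        lines = [(0, "map reduce map"), (1, "reduce shuffle reduce")]
        print(mapreduce(lines, map_fn, reduce_fn))
        # [('map', 2), ('reduce', 3), ('shuffle', 1)]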