887 research outputs found

    FixMiner: Mining Relevant Fix Patterns for Automated Program Repair

    Get PDF
    Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionalities. In this context, given the recurrence of bugs across projects, the associated similar patches can be leveraged to extract generic fix actions. While the literature includes various approaches leveraging similarity among patches to guide program repair, these approaches often do not yield fix patterns that are tractable and reusable as actionable input to APR systems. In this paper, we propose a systematic and automated approach to mining relevant and actionable fix patterns based on an iterative clustering strategy applied to atomic changes within patches. The goal of FixMiner is thus to infer separate and reusable fix patterns that can be leveraged in other patch generation systems. Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree structure of the edit scripts that captures the AST-level context of the code changes. FixMiner uses different tree representations of Rich Edit Scripts for each round of clustering to identify similar changes. These are abstract syntax trees, edit actions trees, and code context trees. We have evaluated FixMiner on thousands of software patches collected from open source projects. Preliminary results show that we are able to mine accurate patterns, efficiently exploiting change information in Rich Edit Scripts. We further integrated the mined patterns to an automated program repair prototype, PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J benchmark. Beyond this quantitative performance, we show that the mined fix patterns are sufficiently relevant to produce patches with a high probability of correctness: 81% of PARFixMiner's generated plausible patches are correct.Comment: 31 pages, 11 figure

    Dynamic Analysis can be Improved with Automatic Test Suite Refactoring

    Full text link
    Context: Developers design test suites to automatically verify that software meets its expected behaviors. Many dynamic analysis techniques are performed on the exploitation of execution traces from test cases. However, in practice, there is only one trace that results from the execution of one manually-written test case. Objective: In this paper, we propose a new technique of test suite refactoring, called B-Refactoring. The idea behind B-Refactoring is to split a test case into small test fragments, which cover a simpler part of the control flow to provide better support for dynamic analysis. Method: For a given dynamic analysis technique, our test suite refactoring approach monitors the execution of test cases and identifies small test cases without loss of the test ability. We apply B-Refactoring to assist two existing analysis tasks: automatic repair of if-statements bugs and automatic analysis of exception contracts. Results: Experimental results show that test suite refactoring can effectively simplify the execution traces of the test suite. Three real-world bugs that could previously not be fixed with the original test suite are fixed after applying B-Refactoring; meanwhile, exception contracts are better verified via applying B-Refactoring to original test suites. Conclusions: We conclude that applying B-Refactoring can effectively improve the purity of test cases. Existing dynamic analysis tasks can be enhanced by test suite refactoring

    Using Word Embedding and Convolution Neural Network for Bug Triaging by Considering Design Flaws

    Full text link
    Resolving bugs in the maintenance phase of software is a complicated task. Bug assignment is one of the main tasks for resolving bugs. Some Bugs cannot be fixed properly without making design decisions and have to be assigned to designers, rather than programmers, to avoid emerging bad smells that may cause subsequent bug reports. Hence, it is important to refer some bugs to the designer to check the possible design flaws. Based on our best knowledge, there are a few works that have considered referring bugs to designers. Hence, this issue is considered in this work. In this paper, a dataset is created, and a CNN-based model is proposed to predict the need for assigning a bug to a designer by learning the peculiarities of bug reports effective in creating bad smells in the code. The features of each bug are extracted from CNN based on its textual features, such as a summary and description. The number of bad samples added to it in the fixing process using the PMD tool determines the bug tag. The summary and description of the new bug are given to the model and the model predicts the need to refer to the designer. The accuracy of 75% (or more) was achieved for datasets with a sufficient number of samples for deep learning-based model training. A model is proposed to predict bug referrals to the designer. The efficiency of the model in predicting referrals to the designer at the time of receiving the bug report was demonstrated by testing the model on 10 projects

    Large Language Models for Software Engineering: A Systematic Literature Review

    Full text link
    Large Language Models (LLMs) have significantly impacted numerous domains, notably including Software Engineering (SE). Nevertheless, a well-rounded understanding of the application, effects, and possible limitations of LLMs within SE is still in its early stages. To bridge this gap, our systematic literature review takes a deep dive into the intersection of LLMs and SE, with a particular focus on understanding how LLMs can be exploited in SE to optimize processes and outcomes. Through a comprehensive review approach, we collect and analyze a total of 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize and provide a comparative analysis of different LLMs that have been employed in SE tasks, laying out their distinctive features and uses. For RQ2, we detail the methods involved in data collection, preprocessing, and application in this realm, shedding light on the critical role of robust, well-curated datasets for successful LLM implementation. RQ3 allows us to examine the specific SE tasks where LLMs have shown remarkable success, illuminating their practical contributions to the field. Finally, RQ4 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE, as well as the common techniques related to prompt optimization. Armed with insights drawn from addressing the aforementioned RQs, we sketch a picture of the current state-of-the-art, pinpointing trends, identifying gaps in existing research, and flagging promising areas for future study

    Back to the Future! Studying Data Cleanness in Defects4J and its Impact on Fault Localization

    Full text link
    For software testing research, Defects4J stands out as the primary benchmark dataset, offering a controlled environment to study real bugs from prominent open-source systems. However, prior research indicates that Defects4J might include tests added post-bug report, embedding developer knowledge and affecting fault localization efficacy. In this paper, we examine Defects4J's fault-triggering tests, emphasizing the implications of developer knowledge of SBFL techniques. We study the timelines of changes made to these tests concerning bug report creation. Then, we study the effectiveness of SBFL techniques without developer knowledge in the tests. We found that 1) 55% of the fault-triggering tests were newly added to replicate the bug or to test for regression; 2) 22% of the fault-triggering tests were modified after the bug reports were created, containing developer knowledge of the bug; 3) developers often modify the tests to include new assertions or change the test code to reflect the changes in the source code; and 4) the performance of SBFL techniques degrades significantly (up to --415% for Mean First Rank) when evaluated on the bugs without developer knowledge. We provide a dataset of bugs without developer insights, aiding future SBFL evaluations in Defects4J and informing considerations for future bug benchmarks

    Utilizing traceable software artifacts to improve bug localization

    Get PDF
    Die Entwicklung von Softwaresystemen ist eine komplexe Aufgabe. Qualitätssicherung versucht auftretenden Softwarefehler (bugs) in Systemen zu vermeiden, jedoch können Fehler nie ausgeschlossen werden. Sobald ein Softwarefehler entdeckt wird, wird typischerweise ein Fehlerbericht (bug report) erstellt. Dieser dient als Ausgangspunkt für den Entwickler den Fehler im Quellcode der Software zu finden und zu beheben (bug fixing). Fehlerberichte sowie weitere Softwareartefakte, z.B. Anforderungen und der Quellcode selbst, werden in Software Repositories abgelegt. Diese erlauben die Artefakte mit trace links zur Nachvollziehbarkeit (traceability) zu verknüpfen. Oftmals ist die Erstellung der trace links im Entwicklungsprozess vorgeschrieben. Dazu zählen u.a. die Luftfahrt- und Automobilindustrie, sowie die Entwicklung von medizinischen Geräten. Das Auffinden von Softwarefehlern in großen Systemen mit tausenden Artefakten ist eine anspruchsvolle, zeitintensive und fehleranfällige Aufgabe, welche eine umfangreiche Projektkenntnis erfordert. Deswegen wird seit Jahren aktiv an der Automatisierung dieses Prozesses geforscht. Weiterhin wird die manuelle Erstellung und Pflege von trace links als Belastung empfunden und sollte weitgehend automatisiert werden. In dieser Arbeit wird ein neuartiger Algorithmus zum Auffinden von Softwarefehlern vorgestellt, der aktiv die erstellten trace links ausnutzt. Die Artefakte und deren Beziehungen dienen zur Erstellung eines Nachvollziehbarkeitsgraphen, welcher analysiert wird um fehlerhafte Quellcodedateien anhand eines Fehlerberichtes zu finden. Jedoch muss angenommen werden, dass nicht alle notwendigen trace links zwischen den Softwareartefakten eines Projektes erstellt wurden. Deswegen wird ein vollautomatisierter, projektunabhängiger Ansatz vorgestellt, der diese fehlenden trace links erstellt (augmentation). Die Grundlage zur Entwicklung dieses Algorithmus ist der typische Entwicklungsprozess eines Softwareprojektes. Die entwickelten Ansätze wurden mit mehr als 32.000 Fehlerberichten von 27 Open-Source Projekten evaluiert und die Ergebnisse zeigen, dass die Einbeziehung von traceability signifikant das Auffinden von Fehlern im Quellcode verbessert. Weiterhin kann der entwickelte Augmentation Algorithmus zuverlässig fehlende trace links erstellen.The development of software systems is a very complex task. Quality assurance tries to prevent defects – software bugs – in deployed systems, but it is impossible to avoid bugs all together, especially during development. Once a bug is observed, typically a bug report is written. It guides the responsible developer to locate the bug in the project's source code, and once found to fix it. The bug reports, along with other development artifacts such as requirements and the source code are stored in software repositories. The repositories also allow to create relationships – trace links – among contained artifacts. Establishing this traceability is demanded in many domains, such as safety related ones like the automotive and aviation industry, or in development of medical devices. However, in large software systems with thousands of artifacts, especially source code files, manually locating a bug is time consuming, error-prone, and requires extensive knowledge of the project. Thus, automating the bug localization process is actively researched since many years. Further, manually creating and maintaining trace links is often considered as a burden, and there is the need to automate this task as well. Multiple studies have shown, that traceability is beneficial for many software development tasks. This thesis presents a novel bug localization algorithm utilizing traceability. The project's artifacts and trace links are used to create a traceability graph. This graph is then analyzed to locate defective source code files for a given bug report. Since the existing trace link set of a project is possibly incomplete, another algorithm is prosed to augment missing links. The algorithm is fully automated, project independent, and derived from a project's development workflow. An evaluation on more than 32,000 bug reports from 27 open-source projects shows, that incorporating traceability information into bug localization significantly improves the bug localization performance compared to two state of the art algorithms. Further, the trace link augmentation approach reliably constructs missing links and therefore simplifies the required trace maintenance
    • …
    corecore