708 research outputs found

    FixMiner: Mining Relevant Fix Patterns for Automated Program Repair

    Get PDF
    Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionalities. In this context, given the recurrence of bugs across projects, the associated similar patches can be leveraged to extract generic fix actions. While the literature includes various approaches leveraging similarity among patches to guide program repair, these approaches often do not yield fix patterns that are tractable and reusable as actionable input to APR systems. In this paper, we propose a systematic and automated approach to mining relevant and actionable fix patterns based on an iterative clustering strategy applied to atomic changes within patches. The goal of FixMiner is thus to infer separate and reusable fix patterns that can be leveraged in other patch generation systems. Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree structure of the edit scripts that captures the AST-level context of the code changes. FixMiner uses different tree representations of Rich Edit Scripts for each round of clustering to identify similar changes. These are abstract syntax trees, edit actions trees, and code context trees. We have evaluated FixMiner on thousands of software patches collected from open source projects. Preliminary results show that we are able to mine accurate patterns, efficiently exploiting change information in Rich Edit Scripts. We further integrated the mined patterns to an automated program repair prototype, PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J benchmark. Beyond this quantitative performance, we show that the mined fix patterns are sufficiently relevant to produce patches with a high probability of correctness: 81% of PARFixMiner's generated plausible patches are correct.Comment: 31 pages, 11 figure

    Mining Fix Patterns for FindBugs Violations

    Get PDF
    In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in the Defects4J major benchmark for software testing and automated repair.Comment: Accepted for IEEE Transactions on Software Engineerin

    A Survey of Learning-based Automated Program Repair

    Full text link
    Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With the recent advances in deep learning (DL), an increasing number of APR techniques have been proposed to leverage neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, where buggy code snippets (i.e., source language) are translated into fixed code snippets (i.e., target language) automatically. Benefiting from the powerful capability of DL to learn hidden relationships from previous bug-fixing datasets, learning-based APR techniques have achieved remarkable performance. In this paper, we provide a systematic survey to summarize the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail the crucial components, including fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss the widely-adopted datasets and evaluation metrics and outline existing empirical studies. We discuss several critical aspects of learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. We highlight several practical guidelines on applying DL techniques for future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, our paper can help researchers gain a comprehensive understanding about the achievements of the existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at \url{https://github.com/QuanjunZhang/AwesomeLearningAPR}

    Identifying evidences of computer programming skills through automatic source code evaluation

    Get PDF
    Orientador: Roberto PereiraCoorientador: Eleandro MaschioTese (doutorado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 27/03/2020Inclui referências: p. 98-106Área de concentração: Ciência da ComputaçãoResumo: Esta tese e contextualizada no ensino de programacao de computadores em cursos de Computacao e investiga aspectos e estrategias para avaliacao automatica e continua de codigos fonte desenvolvidos pelos alunos. O estado da arte foi identificado por meio de revisao sistematica de literatura e revelou que as pesquisas anteriores tendem a realizar avaliacoes baseadas em aspectos tecnicos de codigos fonte, como a avaliacao de corretude funcional e a deteccao de erros. Avaliacoes baseadas em habilidades, por outro lado, sao pouco exploradas e possuem potencial para fornecer detalhes a respeito de habilidades representadas por conceitos de alto nivel, como desvios condicionais e estruturas de repeticao. Um metodo de identificacao automatica de evidencias de aprendizado e entao proposto como uma abordagem baseada em habilidades para a avaliacao automatica de codigos fonte de programacao. O metodo e caracterizado pela implementacao de diferentes estrategias para avaliacao de codigos fonte, identificacao de evidencias de habilidades de programacao, e representacao destas habilidades em um modelo do aluno. Experimentos realizados em ambientes controlados (bases de dados artificiais) mostraram que estrategias automaticas de avaliacao de codigo fonte sao viaveis. Experimentos conduzidos em ambientes reais (codigos fonte produzidos por alunos) produziram resultados semelhantes aos ambientes controlados, entretanto revelaram limitacoes relacionadas a implementacao das estrategias, como vulnerabilidades a sintaxes inesperadas e falhas em expressoes regulares. Um conjunto de habilidades foi selecionado para compor o modelo do aluno, representado por uma rede bayesiana dinamica. Por meio de experimentos foi demonstrado que a alimentacao do modelo com evidencias resultantes da avaliacao automatica de codigos fonte permite o acompanhamento do progresso das habilidades dos alunos. Finalmente, as estrategias automaticas em conjunto com os recursos do modelo do aluno permitiram a demonstracao da avaliacao baseada em habilidades, que se mostrou um recurso valioso para identificacao de solucoes funcionalmente corretas, porem conceitualmente incorretas; quando o programa e funcionalmente correto, retornando resultados esperados a determinadas entradas, porem foi construido com recursos e conceitos incorretos. Palavras-chave: Programacao de Computadores, Avaliacao Automatica, Avaliacao Baseada em HabilidadesAbstract: This thesis is contextualized in the teaching of computer programming in Computing courses and investigates aspects and strategies for automatic and continuous evaluation of student developed source codes. The state of the art was identified through systematic literature review and revealed previous research tends to perform evaluations based on source codes technical aspects, such as functional correctness assessment and error detection. Skills-based assessments, in turn, are less explored although having potential to provide details of skills represented by high-level concepts, such as conditionals and repetition structures. A method for automatic identification of learning evidences is then proposed as a skills-based approach to automatic evaluation of programming source codes. The method is characterized by implementing different strategies for source code evaluation, identifying evidences of programming skills, and representing these skills in a student model. Experiments conducted in controlled scenarios (testing datasets) have shown automatic source code evaluation strategies are viable. Experiments conducted in real scenarios (student-made source codes) produced results similar to controlled scenarios, however, implementation-related limitations were revealed for some strategies, such as vulnerabilities to unexpected syntax and flaws in regular expressions. A skill set was selected to compose our student model, represented by a Dynamic Bayesian Network. Experiments have shown feeding the model with evidences resulting from source codes automatic evaluation allows monitoring students' skills progress. Finally, automatic strategies coupled with student model capabilities enabled demonstrating skills-based assessment, which showed a valuable resource for identifying functionally correct source codes, but conceptually incorrect; when a program is correct functionally, returning expected results to specific inputs, but it was built with erroneous concepts and resources. Keywords: Computer Programming, Automatic Evaluation, Skills-Based Assessmen

    Hint generation in programming tutors

    Get PDF
    Programming is increasingly recognized as a useful and important skill. Online programming courses that have appeared in the past decade have proven extremely popular with a wide audience. Learning in such courses is however not as effective as working directly with a teacher, who can provide students with immediate relevant feedback. The field of intelligent tutoring systems seeks to provide such feedback automatically. Traditionally, tutors have depended on a domain model defined by the teacher in advance. Creating such a model is a difficult task that requires a lot of knowledgeengineering effort, especially in complex domains such as programming. A potential solution to this problem is to use data-driven methods. The idea is to build the domain model by observing how students have solved an exercise in the past. New students can then be given feedback that directs them along successful solution paths. Implementing this approach is particularly challenging for programming domains, since the only directly observable student actions are not easily interpretable. We present two novel approaches to creating a domain model for programming exercises in a data-driven fashion. The first approach models programming as a sequence of textual rewrites, and learns rewrite rules for transforming programs. With these rules new student-submitted programs can be automatically debugged. The second approach uses structural patterns in programs’ abstract syntax trees to learn rules for classifying submissions as correct or incorrect. These rules can be used to find erroneous parts of an incorrect program. Both models support automatic hint generation. We have implemented an online application for learning programming and used it to evaluate both approaches. Results indicate that hints generated using either approach have a positive effect on student performance
    corecore