FixMiner: Mining Relevant Fix Patterns for Automated Program Repair
Patching is a common activity in software development. It is generally
performed on a source code base to address bugs or add new functionalities. In
this context, given the recurrence of bugs across projects, the associated
similar patches can be leveraged to extract generic fix actions. While the
literature includes various approaches leveraging similarity among patches to
guide program repair, these approaches often do not yield fix patterns that are
tractable and reusable as actionable input to APR systems. In this paper, we
propose a systematic and automated approach to mining relevant and actionable
fix patterns based on an iterative clustering strategy applied to atomic
changes within patches. The goal of FixMiner is thus to infer separate and
reusable fix patterns that can be leveraged in other patch generation systems.
Our technique, FixMiner, leverages the Rich Edit Script, a specialized tree
structure of edit scripts that captures the AST-level context of code
changes. FixMiner uses a different tree representation of Rich Edit Scripts for
each round of clustering to identify similar changes: abstract syntax
trees, edit action trees, and code context trees. We have evaluated FixMiner
on thousands of software patches collected from open source projects.
Preliminary results show that we are able to mine accurate patterns,
efficiently exploiting change information in Rich Edit Scripts. We further
integrated the mined patterns into an automated program repair prototype,
PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J
benchmark. Beyond this quantitative performance, we show that the mined fix
patterns are sufficiently relevant to produce patches with a high probability
of correctness: 81% of PARFixMiner's generated plausible patches are correct.
Comment: 31 pages, 11 figures
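The round-by-round clustering described in this abstract can be illustrated with a minimal sketch. It is not FixMiner's implementation: it assumes each atomic change has already been flattened into three hashable tuples (the hypothetical fields `ast`, `actions`, and `context`), it groups changes by exact equality instead of comparing trees, and the `min_size` cutoff that discards singleton clusters is also an assumption.

```python
from collections import defaultdict


def refine(clusters, key):
    """Split each cluster into groups whose members share the same key(change)."""
    refined = []
    for cluster in clusters:
        groups = defaultdict(list)
        for change in cluster:
            groups[key(change)].append(change)
        refined.extend(groups.values())
    return refined


def iterative_clustering(changes, keys, min_size=2):
    """Run one clustering round per key, dropping clusters below min_size."""
    clusters = [list(changes)]
    for key in keys:
        clusters = [c for c in refine(clusters, key) if len(c) >= min_size]
    return clusters


# Hypothetical atomic changes; the tuples stand in for the three tree views.
changes = [
    {"id": 1, "ast": ("IfStatement", "InfixExpression"),
     "actions": ("UPD", "UPD"), "context": ("if", "!=")},
    {"id": 2, "ast": ("IfStatement", "InfixExpression"),
     "actions": ("UPD", "UPD"), "context": ("if", "!=")},
    {"id": 3, "ast": ("IfStatement", "InfixExpression"),
     "actions": ("INS", "UPD"), "context": ("if", "!=")},
    {"id": 4, "ast": ("MethodInvocation",),
     "actions": ("UPD",), "context": ("call",)},
    {"id": 5, "ast": ("IfStatement", "InfixExpression"),
     "actions": ("UPD", "UPD"), "context": ("if", "==")},
]

# One key function per round, in the order the abstract lists the tree views.
rounds = [lambda c: c["ast"], lambda c: c["actions"], lambda c: c["context"]]

patterns = iterative_clustering(changes, rounds)
cluster_ids = [[c["id"] for c in cluster] for cluster in patterns]
print(cluster_ids)  # [[1, 2]]
```

Each round only ever splits the clusters that survived the previous one, so a change ends up in a final cluster only if it agrees with its neighbours on all three views.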
Mining Fix Patterns for FindBugs Violations
In this paper, we first collect and track a large number of fixed and unfixed
violations across revisions of software.
The empirical analyses reveal that there are discrepancies in the
distributions of violations that are detected and those that are fixed, in
terms of occurrences, spread and categories, which can provide insights into
prioritizing violations.
To automatically identify patterns in violations and their fixes, we propose
an approach that utilizes convolutional neural networks to learn features and
clustering to regroup similar instances. We then evaluate the usefulness of the
identified fix patterns by applying them to unfixed violations.
The results show that developers will accept and merge a majority (69/116) of
fixes generated from the inferred fix patterns. It is also noteworthy that the
yielded patterns are applicable to four real bugs in the Defects4J major
benchmark for software testing and automated repair.
Comment: Accepted for IEEE Transactions on Software Engineering
A Survey of Learning-based Automated Program Repair
Automated program repair (APR) aims to fix software bugs automatically and
plays a crucial role in software development and maintenance. With the recent
advances in deep learning (DL), an increasing number of APR techniques have
been proposed to leverage neural networks to learn bug-fixing patterns from
massive open-source code repositories. Such learning-based techniques usually
treat APR as a neural machine translation (NMT) task, where buggy code snippets
(i.e., source language) are translated into fixed code snippets (i.e., target
language) automatically. Benefiting from the powerful capability of DL to learn
hidden relationships from previous bug-fixing datasets, learning-based APR
techniques have achieved remarkable performance. In this paper, we provide a
systematic survey to summarize the current state-of-the-art research in the
learning-based APR community. We illustrate the general workflow of
learning-based APR techniques and detail the crucial components, including
fault localization, patch generation, patch ranking, patch validation, and
patch correctness phases. We then discuss the widely-adopted datasets and
evaluation metrics and outline existing empirical studies. We discuss several
critical aspects of learning-based APR techniques, such as repair domains,
industrial deployment, and the open science issue. We highlight several
practical guidelines on applying DL techniques for future APR studies, such as
exploring explainable patch generation and utilizing code features. Overall,
our paper can help researchers gain a comprehensive understanding of the
achievements of the existing learning-based APR techniques and promote the
practical application of these techniques. Our artifacts are publicly available
at https://github.com/QuanjunZhang/AwesomeLearningAPR.
Spreadsheet Tools for Data Analysts
Spreadsheets are a natural fit for data analysis, combining a simple data storage and presentation layer with a programming language and basic debugging tools. Because spreadsheets are accessible and flexible, they are used by both novices and experts. Consequently, spreadsheets are hugely popular, with more than 750 million copies of Microsoft Excel installed worldwide. This popularity means that spreadsheets are the most popular programming language on the planet and the de facto tool for data analysis.
Nevertheless, spreadsheets do not address a number of important tasks in a typical analyst's pipeline, and their design frequently complicates them. This thesis describes three key challenges for analysts using spreadsheets. 1) Data wrangling is the process of converting or mapping data from a raw form into another form suitable for use with automated tools. 2) Data cleaning is the process of locating and correcting omitted or erroneous data. 3) Formula auditing is the process of finding and correcting spreadsheet program errors. These three tasks combined are estimated to occupy more than three quarters of a data analyst's time. Furthermore, errors not caught during these steps have led to catastrophically bad decisions resulting in billions of dollars in losses. Advances in automated techniques for these tasks may result in dramatic savings in both time and money.
Three novel programming language-based techniques were created to address these key tasks. The first, automatic layout transformation using examples, is a program synthesis-based technique that lets spreadsheet users perform data wrangling tasks automatically, at scale, and without programming. The second, data debugging, is a technique for data cleaning that combines program analysis and statistical analysis to automatically find likely data errors. The third, spatio-structural program analysis, unifies positional and dependence information and finds spreadsheet errors using a kind of anomaly analysis.
Each technique was implemented as an end-user tool (FlashRelate, CheckCell, and ExceLint, respectively) in the form of a point-and-click plugin for Microsoft Excel. Our evaluation demonstrates that these techniques substantially improve user efficiency. Finally, because these tools build on each other in a complementary fashion, data analysts can run data wrangling, cleaning, and formula auditing tasks together in a single analysis pipeline.
Identifying evidences of computer programming skills through automatic source code evaluation
Advisor: Roberto Pereira. Co-advisor: Eleandro Maschio. Thesis (doctorate), Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 27/03/2020. Includes references: p. 98-106. Area of concentration: Computer Science.
Abstract: This thesis is set in the context of teaching computer programming in Computing courses and investigates aspects and strategies for the automatic and continuous evaluation of source code developed by students. The state of the art was identified through a systematic literature review, which revealed that previous research tends to evaluate technical aspects of source code, such as functional correctness and error detection. Skills-based assessments, in turn, are less explored, although they have the potential to provide details about skills represented by high-level concepts, such as conditionals and repetition structures. A method for the automatic identification of learning evidence is therefore proposed as a skills-based approach to the automatic evaluation of programming source code. The method is characterized by implementing different strategies for source code evaluation, identifying evidence of programming skills, and representing these skills in a student model. Experiments conducted in controlled scenarios (artificial test datasets) showed that automatic source code evaluation strategies are viable. Experiments conducted in real scenarios (source code written by students) produced results similar to those of the controlled scenarios but revealed implementation-related limitations in some strategies, such as vulnerability to unexpected syntax and flaws in regular expressions. A set of skills was selected to compose the student model, which is represented by a dynamic Bayesian network. Experiments showed that feeding the model with evidence from the automatic evaluation of source code makes it possible to monitor the progress of students' skills. Finally, the automatic strategies, together with the capabilities of the student model, made it possible to demonstrate skills-based assessment, which proved to be a valuable resource for identifying solutions that are functionally correct but conceptually incorrect: programs that return the expected results for given inputs but were built with incorrect concepts and resources.
Keywords: Computer Programming, Automatic Evaluation, Skills-Based Assessment
Hint generation in programming tutors
Programming is increasingly recognized as a useful and important skill. Online programming courses that have appeared in the past decade have proven extremely popular with a wide audience. Learning in such courses is, however, not as effective as working directly with a teacher, who can provide students with immediate, relevant feedback.
The field of intelligent tutoring systems seeks to provide such feedback automatically. Traditionally, tutors have depended on a domain model defined by the teacher in advance. Creating such a model is a difficult task that requires a lot of knowledge-engineering effort, especially in complex domains such as programming.
A potential solution to this problem is to use data-driven methods. The idea is to build the domain model by observing how students have solved an exercise in the past. New students can then be given feedback that directs them along successful solution paths. Implementing this approach is particularly challenging for programming domains, since the only directly observable student actions are not easily interpretable.
We present two novel approaches to creating a domain model for programming exercises in a data-driven fashion. The first approach models programming as a sequence of textual rewrites, and learns rewrite rules for transforming programs. With these rules, new student-submitted programs can be automatically debugged. The second approach uses structural patterns in programs' abstract syntax trees to learn rules for classifying submissions as correct or incorrect. These rules can be used to find erroneous parts of an incorrect program. Both models support automatic hint generation.
We have implemented an online application for learning programming and used it to evaluate both approaches. Results indicate that hints generated using either approach have a positive effect on student performance.
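As a rough illustration of the second approach in this abstract (structural AST patterns used to flag erroneous parts of a submission), the sketch below uses Python's standard `ast` module. It is a deliberately simplified stand-in, not the thesis's rule learner: a "pattern" is assumed to be a (parent node type, child node type) pair, and a pattern is treated as suspicious when it occurs in some incorrect training submission and in no correct one. The function names and the toy exercise are hypothetical.

```python
import ast


def patterns(source):
    """Return the set of (parent type, child type) pairs in the program's AST."""
    tree = ast.parse(source)
    return {
        (type(parent).__name__, type(child).__name__)
        for parent in ast.walk(tree)
        for child in ast.iter_child_nodes(parent)
    }


def learn_buggy_patterns(correct, incorrect):
    """Patterns seen in some incorrect submission and in no correct one."""
    seen_correct = set().union(*(patterns(s) for s in correct))
    seen_incorrect = set().union(*(patterns(s) for s in incorrect))
    return seen_incorrect - seen_correct


def suspicious(source, buggy):
    """List the learned buggy patterns that occur in a new submission."""
    return sorted(patterns(source) & buggy)


# Toy exercise (an assumption for illustration): sum the items of a list.
correct = [
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
]
incorrect = [
    # Bug: the return statement sits inside the loop body.
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n        return s\n",
]

buggy = learn_buggy_patterns(correct, incorrect)
new_submission = "def total(xs):\n    for x in xs:\n        return x\n"
print(suspicious(new_submission, buggy))  # [('For', 'Return')]
```

A tutor built along these lines could turn a flagged pattern into a hint pointing at the offending construct, here a `return` nested directly under a `for` loop.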