15 research outputs found

    Recovery of Software Architecture from Code Repositories

    The goal of this work is to create an approach and tool that will a) extract architecturally significant information from code repositories, namely from resources such as dockerfiles and terraform configurations; b) use the extracted information to synthesise architectural models that are automatically kept in sync with the code repositories; and c) support mechanisms that allow a team to supply any additional details to the architectural model that cannot be inferred directly from the repositories. This approach is expected to reduce information redundancy between code and textual documentation while still allowing an integrated and machine-readable view of the overall software architecture of a system.
    Architecture can be the result of multiple, intangibly connected parts spread across source code and other development artifacts. This makes it difficult to describe the architecture without resorting to auxiliary documentation that puts this information together. Most of the time, this documentation is created manually, a costly process which over time tends to be disregarded, so the documentation becomes out of date and sometimes obsolete. Automating the recovery of the architecture using artifacts that are already present in the source code could improve the way documentation is updated and used.
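    To make step a) concrete, the sketch below shows one plausible way to pull architecturally significant facts (the base image and exposed ports) out of a dockerfile. The abstract does not prescribe an implementation, and the ServiceModel schema here is an illustrative assumption, not the thesis's tool.

        # Minimal sketch: extracting architectural facts from a Dockerfile.
        # The ServiceModel schema is a hypothetical placeholder.
        from dataclasses import dataclass, field

        @dataclass
        class ServiceModel:
            base_image: str = ""                        # upstream dependency
            exposed_ports: list = field(default_factory=list)

        def extract_from_dockerfile(path: str) -> ServiceModel:
            model = ServiceModel()
            with open(path) as f:
                for line in f:
                    tokens = line.split()
                    if not tokens:
                        continue
                    instruction = tokens[0].upper()
                    if instruction == "FROM" and len(tokens) > 1:
                        model.base_image = tokens[1]
                    elif instruction == "EXPOSE":
                        model.exposed_ports += tokens[1:]  # service interface
            return model

    A fuller extractor would handle terraform configurations the same way and merge the per-resource facts into the synchronised architectural model described above.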

    Learning Code Transformations via Neural Machine Translation

    Source code evolves – inevitably – to remain useful, secure, correct, readable, and efficient. Developers perform software evolution and maintenance activities by transforming existing source code via corrective, adaptive, perfective, and preventive changes. These code changes are usually managed and stored by a variety of tools and infrastructures such as version control, issue trackers, and code review systems. Software Evolution and Maintenance researchers have been mining these code archives to distill useful insights on the nature of developers' activities. One of the long-standing goals of Software Engineering research is to better support and automate the different types of code changes performed by developers. In this thesis we depart from classic manually crafted rule- or heuristic-based approaches and propose a novel technique to learn code transformations by leveraging the vast amount of publicly available code changes performed by developers. We rely on Deep Learning, and in particular on Neural Machine Translation (NMT), to train models able to learn code change patterns and apply them to novel, unseen source code.
    First, we tackle the problem of generating source code mutants for Mutation Testing. In contrast with classic approaches, which rely on handcrafted mutation operators, we propose to automatically learn how to mutate source code by observing real faults. We mine millions of bug-fixing commits from GitHub and process and abstract their source code. This data is used to train and evaluate an NMT model that translates fixed code into buggy code (i.e., the mutated code). In the second project, we rely on the same dataset of bug fixes to learn code transformations for the purpose of Automated Program Repair (APR). This represents one of the most challenging research problems in Software Engineering, whose goal is to automatically fix bugs without developers' intervention. We train a model to translate buggy code into fixed code (i.e., to learn patches) and, in conjunction with Beam Search, generate many different candidate patches for a given buggy method. In our empirical investigation we found that such a model is able to fix thousands of unique buggy methods in the wild.
    Finally, in our third project we push our novel technique to the limit and enlarge the scope to consider not only bug-fixing activities but any type of meaningful code change performed by developers. We focus on accepted and merged code changes that underwent a Pull Request (PR) process. We quantitatively and qualitatively investigate the code transformations learned by the model to build a taxonomy. The taxonomy shows that NMT can replicate a wide variety of meaningful code changes, especially refactorings and bug-fixing activities. In this dissertation we illustrate and evaluate the proposed techniques, which represent a significant departure from earlier approaches in the literature. The promising results corroborate the potential applicability of learning techniques, such as NMT, to a variety of Software Engineering tasks.
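    As context for the patch-generation step, the sketch below shows how an encoder-decoder model can propose several candidate patches for one buggy method via beam search. The thesis trains its own NMT model on abstracted buggy/fixed pairs; the off-the-shelf t5-small checkpoint here is only a placeholder so the example runs.

        # Beam-search patch generation with a seq2seq model (illustrative).
        # The checkpoint is a placeholder, not the dissertation's model.
        from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

        checkpoint = "t5-small"
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

        buggy = "public int max(int a, int b) { if (a < b) return a; return b; }"
        inputs = tokenizer(buggy, return_tensors="pt")

        # Beam search keeps the 10 highest-scoring partial translations at
        # each decoding step, yielding 10 candidate patches, not one guess.
        candidates = model.generate(
            **inputs,
            num_beams=10,
            num_return_sequences=10,
            max_length=128,
        )
        for patch in tokenizer.batch_decode(candidates, skip_special_tokens=True):
            print(patch)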

    Learning syntactic program transformations from examples.

    Tools such as ErrorProne, ReSharper, and PMD help programmers by automatically detecting and/or removing several suspicious code patterns, potential bugs, and instances of bad code style. These rules can be expressed as quick fixes that detect and rewrite unwanted code patterns. However, extending their catalogs of rules is complex and time-consuming. In this context, programmers may want to perform a repetitive edit on their code automatically to improve their productivity, but available tools do not support this. In addition, tool designers may want to identify rules that are worth automating. A similar phenomenon appears in intelligent tutoring systems, where instructors have to write cumbersome code transformations that describe "common faults" in order to fix similar student submissions to programming assignments. In this thesis, we present two techniques: REFAZER, a technique for automatically generating program transformations, and REVISAR, a technique for learning quick fixes from code repositories. We instantiate and evaluate REFAZER in two domains.
    First, given examples of code edits used by students to fix incorrect programming assignment submissions, we learn program transformations that can fix other students' submissions with similar faults. In our evaluation on four programming tasks performed by seven hundred and twenty students, our technique helped to fix incorrect submissions for 87% of the students. In the second domain, we use repetitive code edits applied by developers to the same project to synthesize a program transformation that applies these edits to other locations in the code. In our evaluation on 56 scenarios of repetitive edits taken from three large C# open-source projects, REFAZER learned the intended program transformation in 84% of the cases, using only 2.9 examples on average. To evaluate REVISAR, we selected 9 projects, and REVISAR learned 920 transformations across projects. Acting as tool designers, we inspected the 381 most common transformations and classified 32 of them as quick fixes. To assess the quality of the quick fixes, we conducted a survey with 164 programmers from 124 projects, showing the 10 quick fixes that appeared in the most projects. Programmers supported 9 (90%) of the quick fixes. We submitted 20 pull requests applying our quick fixes to 9 projects and, at the time of writing, programmers supported 17 (85%) and accepted 10 of them.
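    To give a flavour of example-based transformation learning, the toy sketch below generalises a single before/after edit into a reusable rewrite rule and applies it to another submission. REFAZER actually synthesizes transformations over abstract syntax trees via program synthesis; this regex-level simplification only conveys the core idea.

        # Toy example-based rewrite learning (a drastic simplification of
        # REFAZER, which operates on ASTs via program synthesis).
        import re

        def learn_rule(before: str, after: str):
            """Generalise identifiers shared by both sides into wildcards."""
            shared = set(re.findall(r"[A-Za-z_]\w*", before)) & \
                     set(re.findall(r"[A-Za-z_]\w*", after))
            names = {}

            def to_pattern(tok):
                if tok in shared:
                    if tok not in names:                  # first occurrence
                        names[tok] = f"v{len(names) + 1}"
                        return rf"(?P<{names[tok]}>[A-Za-z_]\w*)"
                    return rf"(?P={names[tok]})"          # later: backreference
                return re.escape(tok)

            tokens = re.findall(r"[A-Za-z_]\w*|\S", before)
            pattern = re.compile(r"\s*".join(to_pattern(t) for t in tokens))
            template = re.sub(r"[A-Za-z_]\w*",
                              lambda m: rf"\g<{names[m.group()]}>"
                              if m.group() in names else m.group(),
                              after)
            return pattern, template

        # Learn from one student's fix, then apply it to a similar fault.
        rule, template = learn_rule("x = x + 1", "x += 1")
        print(rule.sub(template, "counter = counter + 1"))  # counter += 1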

    Quantifying, Characterizing, and Leveraging Cross-Disciplinary Dependencies: Empirical Studies from a Video Game Development Setting

    Continuous Integration (CI) is a common practice adopted by modern software organizations. It plays an especially important role for large corporations like Ubisoft, where thousands of build jobs are submitted daily. The CI process of video games, which are developed by studios like Ubisoft, involves assembling artifacts that are produced by personnel with various types of expertise, such as source code produced by developers, graphics produced by artists, and audio produced by musicians and sound experts. To weave these artifacts into a cohesive system, the build system – a key component in CI – processes each artifact while respecting its intra- and inter-artifact dependencies. In such projects, a change produced by any team can impact artifacts from other teams and may cause defects if the transitive impact of changes is not carefully considered. Therefore, to better understand the potential challenges and opportunities presented by multidisciplinary software projects, we conduct an empirical study of a recently launched video game project, which reveals that code files make up only 2.8% of the nodes in the build dependency graph, and code-to-code dependencies make up only 4.3% of all dependencies. We also observe that the impact of 44% of the studied source code changes crosses disciplinary boundaries, highlighting the importance of analyzing inter-artifact dependencies. A comparative analysis indicates that cross-boundary changes (1) impact a median of 120,368 files, (2) have a 51% probability of causing build failures, and (3) have a 67% likelihood of introducing defects. All three measurements are larger than those of changes that do not cross boundaries to statistically significant degrees. We also find that cross-boundary changes are (4) more commonly associated with gameplay functionality and feature additions that directly impact the game experience than changes that do not cross boundaries, and (5) disproportionately produced by a single team (74% of the contributors of cross-boundary changes are associated with that team).
    Next, we set out to explore whether analysis of cross-boundary changes can be leveraged to accelerate CI. Indeed, the cadence of development progress is constrained by the pace at which CI services process build jobs. To provide faster CI feedback, recent work explores how build outcomes can be anticipated. Although early results show plenty of promise, prior work on build outcome prediction has largely focused on open-source projects that are code-intensive, while the distinct characteristics of an AAA video game project at Ubisoft present new challenges and opportunities for build outcome prediction. In the video game setting, changes that do not modify source code also incur build failures. Moreover, we find that code changes whose impact crosses the source-data boundary are more prone to build failures than code changes that do not impact data files. Since such changes are not fully characterized by the existing set of build outcome prediction features, state-of-the-art models tend to underperform. Therefore, to incorporate the data context into build outcome prediction, we propose RavenBuild, a novel approach that leverages context-, relevance-, and dependency-aware features.
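    As a concrete illustration of the transitive, cross-disciplinary impact measured above, the sketch below walks a tiny build dependency graph and flags a change whose downstream artifacts belong to other disciplines. The graph, file names, and discipline mapping are invented; the study's actual graph is internal to the project.

        # Flagging cross-boundary changes in a toy build dependency graph.
        import networkx as nx

        def discipline(node: str) -> str:
            """Hypothetical mapping from an artifact to its discipline."""
            ext = node.rsplit(".", 1)[-1]
            return {"cpp": "code", "h": "code",
                    "png": "art", "wav": "audio"}.get(ext, "data")

        # Edges point from an artifact to the artifacts built from it.
        graph = nx.DiGraph([
            ("physics.cpp", "engine.lib"),
            ("engine.lib", "level1.pak"),
            ("rock.png", "level1.pak"),
        ])

        def crosses_boundary(changed: str) -> bool:
            impacted = nx.descendants(graph, changed)  # transitive impact
            return any(discipline(n) != discipline(changed) for n in impacted)

        print(crosses_boundary("physics.cpp"))  # True: code change hits data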
    We apply the state-of-the-art BuildFast model and RavenBuild to the video game project, and observe that RavenBuild improves the F1-score of the failing class by 46%, the recall of the failing class by 76%, and AUC by 28%. To ease adoption in settings with heterogeneous project sets, we also provide a simplified alternative, RavenBuild-CR, which excludes the dependency-aware features. We apply RavenBuild-CR to 22 open-source projects and the video game project, and observe across-the-board improvements there as well. On the other hand, we find that a naive Parrot approach, which simply echoes the previous build outcome as its prediction, is surprisingly competitive with BuildFast and RavenBuild. Though Parrot fails to predict build outcomes that differ from their immediate predecessor, it serves well as an indicator of the sequential tendencies in build outcome datasets. Therefore, future studies should also consider the Parrot approach as a baseline when evaluating build outcome prediction models.
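    Because the Parrot baseline is fully described above (echo the previous outcome), it fits in a few lines; the build history below is synthetic, with 1 marking a failing build.

        # Naive Parrot baseline: predict that each build repeats the
        # outcome of the previous build. History is synthetic (1 = fail).
        def parrot(outcomes, first_guess=1):
            predictions, previous = [], first_guess
            for actual in outcomes:
                predictions.append(previous)  # only ever sees the past
                previous = actual
            return predictions

        history = [1, 1, 1, 0, 0, 1, 1, 1, 0, 1]
        preds = parrot(history)
        accuracy = sum(p == a for p, a in zip(preds, history)) / len(history)
        print(f"accuracy: {accuracy:.2f}")  # errs only where a streak flips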

    Fundamental Approaches to Software Engineering

    This open access book constitutes the proceedings of the 24th International Conference on Fundamental Approaches to Software Engineering, FASE 2021, which took place during March 27–April 1, 2021, and was held as part of the Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg but changed to an online format due to the COVID-19 pandemic. The 16 full papers presented in this volume were carefully reviewed and selected from 52 submissions. The book also contains 4 Test-Comp contributions.

    Actor-critic reinforcement learning algorithms for yaw control of an Autonomous Underwater Vehicle

    An Autonomous Underwater Vehicle (AUV) poses unique challenges that must be solved in order to achieve persistent autonomy. The requirement of persistent autonomy entails that a control solution must be capable of controlling a vehicle that is operating in an environment with complex non-linear dynamics and of adapting to changes in those dynamics. In essence, artificial intelligence is required so that the vehicle can learn from its experience of operating in the domain. In this thesis, reinforcement learning is the chosen machine learning mechanism. This learning paradigm is investigated by applying multiple actor-critic temporal difference learning algorithms to the yaw degree-of-freedom of a simulated model and the physical hardware of the Nessie VII AUV in a closed-loop feedback control problem. Additionally, results are also presented for path planning and path optimisation problems. These control problems are solved by modelling the AUV's interaction with its environment as an optimal decision-making problem using a Markov Decision Process (MDP). Two novel actor-critic temporal difference learning algorithms, called Linear True Online Continuous Learning Automation (Linear TOCLA) and Non-linear True Online Continuous Learning Automation (Non-linear TOCLA), are also presented and serve as new contributions to the reinforcement learning research community. These algorithms have been applied to the real Nessie vehicle and its simulated model, and hold theoretical and practical advantages over previous state-of-the-art temporal difference learning algorithms. A new genetic algorithm is also presented, developed specifically for the optimisation of continuous-valued reinforcement learning algorithms. This genetic algorithm is used to find the optimal hyperparameters for four actor-critic algorithms in the well-known continuous-valued mountain car reinforcement learning benchmark problem. The results of this benchmark show that the Non-linear TOCLA algorithm achieves performance similar to the state-of-the-art forward actor-critic algorithm it extends while significantly reducing the sensitivity of hyperparameter selection. This reduction in hyperparameter sensitivity is shown using the distribution of optimal hyperparameters from ten separate optimisation runs: the actor learning rate of the forward actor-critic algorithm had a standard deviation of 0.00088, while the Non-linear TOCLA algorithm demonstrated a standard deviation of 0.00186. An even greater improvement is observed in the multi-step target weight, λ, whose standard deviation increased from 0.036 for the forward actor-critic to 0.266 for the Non-linear TOCLA algorithm. All of the source code used to generate the results in this thesis has been made available as open-source software.
    ARchaeological RObot systems for the World's Seas (ARROWS) EU FP7 project under grant agreement ID 30872
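    The TOCLA algorithms themselves are not detailed in the abstract. For orientation, the sketch below implements a plain one-step actor-critic temporal difference update with linear function approximation and a Gaussian policy – the family of methods the TOCLA variants extend. The toy yaw dynamics and every constant are invented and bear no relation to the Nessie VII setup.

        # One-step actor-critic TD learning on a toy yaw-control problem.
        # Illustrative only: not the Linear/Non-linear TOCLA algorithms.
        import numpy as np

        rng = np.random.default_rng(0)

        def features(yaw_error: float) -> np.ndarray:
            return np.array([yaw_error, 1.0])   # crude linear features

        w = np.zeros(2)        # critic weights: V(s) = w . phi(s)
        theta = np.zeros(2)    # actor weights: mean action = theta . phi(s)
        alpha_w, alpha_th, gamma, sigma = 0.1, 0.01, 0.99, 0.3

        state = 1.0            # initial yaw error (radians)
        for _ in range(2000):
            phi = features(state)
            mu = theta @ phi
            action = rng.normal(mu, sigma)            # exploratory command
            next_state = 0.9 * state - 0.1 * action   # toy yaw dynamics
            reward = -next_state ** 2                 # penalise residual error
            delta = reward + gamma * (w @ features(next_state)) - w @ phi
            w += alpha_w * delta * phi                                   # critic
            theta += alpha_th * delta * (action - mu) / sigma**2 * phi  # actor
            state = next_state

        print(f"final yaw error: {state:.4f}")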

    Changing Software Development Practice: A Case Study of DevOps Adoption

    DevOps, a portmanteau of "development" and "operations", is a Software Engineering approach that emerged in industry with the goal of rapidly developing and deploying good-quality software. It has seen increased research attention in recent years, with most studies focusing exclusively on the tools used for DevOps or on attempts to define it universally. This has led to misunderstandings of DevOps alongside differing definitions, and this research therefore argues that a universal definition should not be sought. A focus group of practitioners evaluated existing definitions, with the findings further tested in a questionnaire to the wider DevOps community. The output of this informed a 14-month case study of DevOps adoption in a medium-sized UK organisation. A pragmatic approach was taken to study what DevOps meant for the organisation and its impact on employees and other business functions. This research contributes to theory by identifying the core attributes of DevOps, by using a job crafting theoretical lens to understand the organisational change required to implement DevOps, and by elucidating how individuals change their work identity as they adopt DevOps practices and processes. In particular, this research finds that Software Developers are natural Job Crafters, especially if afforded the freedom to do so. This research contributes methodologically by using multiple methods, in particular a longitudinal qualitative diary study conducted over 14 months with a very low attrition rate, achieved by using tools that participants already use in their work to record their experiences of DevOps implementation. Finally, this research makes a practical contribution by developing the building blocks of attributes that organisations should consider within their specific context and by developing an interdisciplinary framework that takes account of both the software development process and the associated management implications of adopting and implementing DevOps.

    Digital Methods and Technicity-of-the-Mediums. From Regimes of Functioning to Digital Research

    Digital methods are taken here as a research practice crucially situated in the technological environment that it explores and exploits. Through software-oriented analysis, this research practice proposes to repurpose online methods and data for social and medium research, yet it is not considered a proper type of fieldwork because these methods are new and their description is still at an early stage. These methods demand proximity to software and reflect an environment inhabited by technicity. Thus, this dissertation is concerned with a key element of the digital methods research approach: the computational (or technical) mediums as carriers of meaning (see Berry, 2011; Rieder, 2020). The central idea of this dissertation is to address the role of technical knowledge, practice and expertise (as problems and solutions) across the full range of digital methods, taking the technicity of the computational mediums and digital records as objects of study. By focusing on how the concept of technicity matters in digital research, I argue that not only do digital methods open an opportunity for further enquiry into this concept, but they also benefit from such enquiry, since the working material of this research practice is the media, their methods, mechanisms and data. The notion of technicity-of-the-mediums is thus used in two senses: pointing, on the one hand, to the effort to become acquainted with the mediums (from a conceptual, technical and empirical perspective) and, on the other, to the object of technical imagination (the capacity to consider the features and practical qualities of technical mediums as an ensemble and as a solution to methodological problems). From the standpoint of non-developer researchers and the perspective of software practice, the understanding of digital technologies starts from direct contact with, comprehension of and different uses of (research) software and the web environment. The journey of digital methods is only fulfilled through technical practice, experimentation and exploration. Two main arguments are put forward in this dissertation. The first states that we can only repurpose what we know well, which means that we need to become acquainted with the mediums from a conceptual-technical-practical perspective; the second states that the practice of digital methods is enhanced when researchers make room for, grow and establish a sensitivity to the technicity-of-the-mediums. The main contribution of this dissertation is to develop a series of conceptual and practical principles for digital research. Theoretically, it suggests a broader definition of medium in digital methods and introduces the notion of the technicity-of-the-mediums together with three distinct but related aspects to consider – namely platform grammatisation, cultures of use and software affordances – as an attempt to defuse some of the difficulties related to the use of digital methods. Practically, it presents concrete methodological approaches providing new analytical perspectives for social media research and digital network studies, while suggesting a way of carrying out digital fieldwork that is substantiated by technical practices and imagination.