3,490 research outputs found

    Software Engineers' Information Seeking Behavior in Change Impact Analysis - An Interview Study

    Get PDF
    Software engineers working in large projects must navigate complex information landscapes. Change Impact Analysis (CIA) is a task that relies on engineers' successful information seeking in databases storing, e.g., source code, requirements, design descriptions, and test case specifications. Several previous approaches to support information seeking are task-specific, thus understanding engineers' seeking behavior in specific tasks is fundamental. We present an industrial case study on how engineers seek information in CIA, with a particular focus on traceability and development artifacts that are not source code. We show that engineers have different information seeking behavior, and that some do not consider traceability particularly useful when conducting CIA. Furthermore, we observe a tendency for engineers to prefer less rigid types of support rather than formal approaches, i.e., engineers value support that allows flexibility in how to practically conduct CIA. Finally, due to diverse information seeking behavior, we argue that future CIA support should embrace individual preferences to identify change impact by empowering several seeking alternatives, including searching, browsing, and tracing.Comment: Accepted for publication in the proceedings of the 25th International Conference on Program Comprehensio

    Deep Learning Software Repositories

    Get PDF
    Bridging the abstraction gap between artifacts and concepts is the essence of software engineering (SE) research problems. SE researchers regularly use machine learning to bridge this gap, but there are three fundamental issues with traditional applications of machine learning in SE research. Traditional applications are too reliant on labeled data. They are too reliant on human intuition, and they are not capable of learning expressive yet efficient internal representations. Ultimately, SE research needs approaches that can automatically learn representations of massive, heterogeneous, datasets in situ, apply the learned features to a particular task and possibly transfer knowledge from task to task. Improvements in both computational power and the amount of memory in modern computer architectures have enabled new approaches to canonical machine learning tasks. Specifically, these architectural advances have enabled machines that are capable of learning deep, compositional representations of massive data depots. The rise of deep learning has ushered in tremendous advances in several fields. Given the complexity of software repositories, we presume deep learning has the potential to usher in new analytical frameworks and methodologies for SE research and the practical applications it reaches. This dissertation examines and enables deep learning algorithms in different SE contexts. We demonstrate that deep learners significantly outperform state-of-the-practice software language models at code suggestion on a Java corpus. Further, these deep learners for code suggestion automatically learn how to represent lexical elements. We use these representations to transmute source code into structures for detecting similar code fragments at different levels of granularity—without declaring features for how the source code is to be represented. Then we use our learning-based framework for encoding fragments to intelligently select and adapt statements in a codebase for automated program repair. In our work on code suggestion, code clone detection, and automated program repair, everything for representing lexical elements and code fragments is mined from the source code repository. Indeed, our work aims to move SE research from the art of feature engineering to the science of automated discovery

    PhenDisco: phenotype discovery system for the database of genotypes and phenotypes.

    Get PDF
    The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net

    Software Development Analytics in Practice: A Systematic Literature Review

    Full text link
    Context:Software Development Analytics is a research area concerned with providing insights to improve product deliveries and processes. Many types of studies, data sources and mining methods have been used for that purpose. Objective:This systematic literature review aims at providing an aggregate view of the relevant studies on Software Development Analytics in the past decade (2010-2019), with an emphasis on its application in practical settings. Method:Definition and execution of a search string upon several digital libraries, followed by a quality assessment criteria to identify the most relevant papers. On those, we extracted a set of characteristics (study type, data source, study perspective, development life-cycle activities covered, stakeholders, mining methods, and analytics scope) and classified their impact against a taxonomy. Results:Source code repositories, experimental case studies, and developers are the most common data sources, study types, and stakeholders, respectively. Product and project managers are also often present, but less than expected. Mining methods are evolving rapidly and that is reflected in the long list identified. Descriptive statistics are the most usual method followed by correlation analysis. Being software development an important process in every organization, it was unexpected to find that process mining was present in only one study. Most contributions to the software development life cycle were given in the quality dimension. Time management and costs control were lightly debated. The analysis of security aspects suggests it is an increasing topic of concern for practitioners. Risk management contributions are scarce. Conclusions:There is a wide improvement margin for software development analytics in practice. For instance, mining and analyzing the activities performed by software developers in their actual workbench, the IDE

    Can Refactoring be Self-Affirmed? An Exploratory Study on How Developers Document their Refactoring Activities in Commit Messages

    Get PDF
    Refactoring is a critical task in software maintenance and is usually performed to enforce best design practices, or to cope with design defects. Previous studies heavily rely on defining a set of keywords to identify refactoring commits from a list of general commits extracted from a small set of softwaresystems. All approaches thus far consider all commits without checking whether refactorings had actually happened or not. In this paper, we aim at exploring how developers document their refactoring activities during the software life cycle. We call such activity Self-Affirmed Refactoring, which is an indication ofthe developer-related refactoring events in the commit messages. Our approach relies on text mining refactoring-related change messages and identifying refactoring patterns by only consideringrefactoring commits. We found that (1) developers use a variety of patterns to purposefully target refactoring-related activities; (2) developers tend to explicitly mention the improvement of specific quality attributes and code smells; and (3) commit messages withself-affirmed refactoring patterns tend to have more significant refactoring activit

    A Changing Landscape:On Safety & Open Source in Automated and Connected Driving

    Get PDF

    Towards using fluctuations in internal quality metrics to find design intents

    Full text link
    Le contrôle de version est la pierre angulaire des processus de développement de logiciels modernes. Tout en construisant des logiciels de plus en plus complexes, les développeurs doivent comprendre des sous-systèmes de code source qui leur sont peu familier. Alors que la compréhension de la logique d'un code étranger est relativement simple, la compréhension de sa conception et de sa genèse est plus compliquée. Elle n'est souvent possible que par les descriptions des révisions et de la documentation du projet qui sont dispersées et peu fiables -- quand elles existent. Ainsi, les développeurs ont besoin d'une base de référence fiable et pertinente pour comprendre l'historique des projets logiciels. Dans cette thèse, nous faisons les premiers pas vers la compréhension des motifs de changement dans les historiques de révision. Nous étudions les changements prenant place dans les métriques logicielles durant l'évolution d'un projet. Au travers de multiples études exploratoires, nous réalisons des expériences quantitatives et qualitatives sur plusieurs jeux de données extraits à partir d'un ensemble de 13 projets. Nous extrayons les changements dans les métriques logicielles de chaque commit et construisons un jeu de donnée annoté manuellement comme vérité de base. Nous avons identifié plusieurs catégories en analysant ces changements. Un motif en particulier nommé "compromis", dans lequel certaines métriques peuvent s'améliorer au détriment d'autres, s'est avéré être un indicateur prometteur de changements liés à la conception -- dans certains cas, il laisse également entrevoir une intention de conception consciente de la part des auteurs des changements. Pour démontrer les observations de nos études exploratoires, nous construisons un modèle général pour identifier l'application d'un ensemble bien connu de principes de conception dans de nouveaux projets. Nos résultats suggèrent que les fluctuations de métriques ont le potentiel d'être des indicateurs pertinents pour gagner des aperçus macroscopiques sur l'évolution de la conception dans l'historique de développement d'un projet.Version control is the backbone of the modern software development workflow. While building more and more complex systems, developers have to understand unfamiliar subsystems of source code. Understanding the logic of unfamiliar code is relatively straightforward. However, understanding its design and its genesis is often only possible through scattered and unreliable commit messages and project documentation -- when they exist. Thus, developers need a reliable and relevant baseline to understand the history of software projects. In this thesis, we take the first steps towards understanding change patterns in commit histories. We study the changes in software metrics through the evolution of projects. Through multiple exploratory studies, we conduct quantitative and qualitative experiments on several datasets extracted from a pool of 13 projects. We mine the changes in software metrics for each commit of the respective projects and manually build oracles to represent ground truth. We identified several categories by analyzing these changes. One pattern, in particular, dubbed "tradeoffs", where some metrics may improve at the expense of others, proved to be a promising indicator of design-related changes -- in some cases, also hinting at a conscious design intent from the authors of the changes. Demonstrating the findings of our exploratory studies, we build a general model to identify the application of a well-known set of design principles in new projects. Our overall results suggest that metric fluctuations have the potential to be relevant indicators for valuable macroscopic insights about the design evolution in a project's development history
    corecore