Software Engineers' Information Seeking Behavior in Change Impact Analysis - An Interview Study
Software engineers working in large projects must navigate complex
information landscapes. Change Impact Analysis (CIA) is a task that relies on
engineers' successful information seeking in databases storing, e.g., source
code, requirements, design descriptions, and test case specifications. Several
previous approaches to support information seeking are task-specific; thus,
understanding engineers' seeking behavior in specific tasks is fundamental. We
present an industrial case study on how engineers seek information in CIA, with
a particular focus on traceability and development artifacts that are not
source code. We show that engineers have different information seeking
behavior, and that some do not consider traceability particularly useful when
conducting CIA. Furthermore, we observe a tendency for engineers to prefer less
rigid types of support rather than formal approaches, i.e., engineers value
support that allows flexibility in how to practically conduct CIA. Finally, due
to diverse information seeking behavior, we argue that future CIA support
should embrace individual preferences to identify change impact by empowering
several seeking alternatives, including searching, browsing, and tracing.
Comment: Accepted for publication in the proceedings of the 25th International
Conference on Program Comprehension
Deep Learning Software Repositories
Bridging the abstraction gap between artifacts and concepts is the essence of software engineering (SE) research problems. SE researchers regularly use machine learning to bridge this gap, but there are three fundamental issues with traditional applications of machine learning in SE research. Traditional applications are too reliant on labeled data. They are too reliant on human intuition, and they are not capable of learning expressive yet efficient internal representations. Ultimately, SE research needs approaches that can automatically learn representations of massive, heterogeneous datasets in situ, apply the learned features to a particular task, and possibly transfer knowledge from task to task. Improvements in both computational power and the amount of memory in modern computer architectures have enabled new approaches to canonical machine learning tasks. Specifically, these architectural advances have enabled machines that are capable of learning deep, compositional representations of massive data depots. The rise of deep learning has ushered in tremendous advances in several fields. Given the complexity of software repositories, we presume deep learning has the potential to usher in new analytical frameworks and methodologies for SE research and the practical applications it reaches. This dissertation examines and enables deep learning algorithms in different SE contexts. We demonstrate that deep learners significantly outperform state-of-the-practice software language models at code suggestion on a Java corpus. Further, these deep learners for code suggestion automatically learn how to represent lexical elements. We use these representations to transmute source code into structures for detecting similar code fragments at different levels of granularity, without declaring features for how the source code is to be represented.
Then we use our learning-based framework for encoding fragments to intelligently select and adapt statements in a codebase for automated program repair. In our work on code suggestion, code clone detection, and automated program repair, everything for representing lexical elements and code fragments is mined from the source code repository. Indeed, our work aims to move SE research from the art of feature engineering to the science of automated discovery.
PhenDisco: phenotype discovery system for the database of genotypes and phenotypes.
The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net
Software Development Analytics in Practice: A Systematic Literature Review
Context: Software Development Analytics is a research area concerned with
providing insights to improve product deliveries and processes. Many types of
studies, data sources and mining methods have been used for that purpose.
Objective: This systematic literature review aims at providing an aggregate view
of the relevant studies on Software Development Analytics in the past decade
(2010-2019), with an emphasis on its application in practical settings.
Method: We defined and executed a search string on several digital
libraries, then applied quality assessment criteria to identify the most
relevant papers. From those, we extracted a set of characteristics (study type,
data source, study perspective, development life-cycle activities covered,
stakeholders, mining methods, and analytics scope) and classified their impact
against a taxonomy. Results: Source code repositories, experimental case
studies, and developers are the most common data sources, study types, and
stakeholders, respectively. Product and project managers are also often
present, but less than expected. Mining methods are evolving rapidly and that
is reflected in the long list identified. Descriptive statistics are the most
common method, followed by correlation analysis. Given that software development
is an important process in every organization, it was unexpected to find that process
mining was present in only one study. Most contributions to the software
development life cycle were given in the quality dimension. Time management and
cost control were only lightly discussed. The analysis of security aspects suggests
it is an increasing topic of concern for practitioners. Risk management
contributions are scarce. Conclusions: There is considerable room for improvement in
software development analytics in practice. For instance, mining and analyzing
the activities performed by software developers in their actual workbench, the
IDE
Can Refactoring be Self-Affirmed? An Exploratory Study on How Developers Document their Refactoring Activities in Commit Messages
Refactoring is a critical task in software maintenance and is usually performed to enforce best design practices, or to cope with design defects. Previous studies rely heavily on defining a set of keywords to identify refactoring commits from a list of general commits extracted from a small set of software systems. All approaches thus far consider all commits without checking whether refactorings had actually happened or not. In this paper, we aim at exploring how developers document their refactoring activities during the software life cycle. We call such activity Self-Affirmed Refactoring, which is an indication of the developer-related refactoring events in the commit messages. Our approach relies on text mining refactoring-related change messages and identifying refactoring patterns by only considering refactoring commits. We found that (1) developers use a variety of patterns to purposefully target refactoring-related activities; (2) developers tend to explicitly mention the improvement of specific quality attributes and code smells; and (3) commit messages with self-affirmed refactoring patterns tend to have more significant refactoring activity
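As a rough illustration of what checking a commit message for self-affirmed refactoring patterns might involve, the Python sketch below matches a message against a small list of textual patterns. The pattern list and the function name `find_sar_patterns` are hypothetical and are not taken from the study, which mines its patterns from the messages of commits that actually contain refactorings.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; the study derives its own
# patterns by text mining the messages of verified refactoring commits.
SAR_PATTERNS = [
    r"\brefactor\w*",
    r"\bclean(?:ed|ing)?[ -]?up\b",
    r"\bsimplif\w*",
]
_SAR_REGEX = re.compile("|".join(SAR_PATTERNS), re.IGNORECASE)


def find_sar_patterns(message: str) -> List[str]:
    """Return the lowercased pattern matches found in a commit message."""
    return [match.group(0).lower() for match in _SAR_REGEX.finditer(message)]
```

A message such as "Refactored parser and cleaned up imports" would yield two matches, whereas "Fix typo in README" would yield none. Matching text alone cannot confirm that a refactoring took place, which is why the abstract stresses considering only refactoring commits.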
Towards using fluctuations in internal quality metrics to find design intents
Version control is the backbone of the modern software development workflow. While building more and more complex
systems, developers have to understand unfamiliar subsystems of source code. Understanding the logic of unfamiliar code
is relatively straightforward. However, understanding its design and its genesis is often only possible through
scattered and unreliable commit messages and project documentation -- when they exist.
Thus, developers need a reliable and relevant baseline to understand the history of software projects. In this thesis,
we take the first steps towards understanding change patterns in commit histories. We study the changes in software
metrics through the evolution of projects.
Through multiple exploratory studies, we conduct quantitative and qualitative experiments on several datasets extracted
from a pool of 13 projects. We mine the changes in software metrics for each commit of the respective projects and
manually build oracles to represent ground truth.
We identified several categories by analyzing these changes. One pattern, in particular, dubbed "tradeoffs", where some
metrics may improve at the expense of others, proved to be a promising indicator of design-related changes -- in some
cases, also hinting at a conscious design intent from the authors of the changes. Demonstrating the findings of our
exploratory studies, we build a general model to identify the application of a well-known set of design principles in
new projects.
Our overall results suggest that metric fluctuations have the potential to be relevant indicators for valuable
macroscopic insights about the design evolution in a project's development history.
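To make the "tradeoffs" pattern concrete, the Python sketch below labels a commit by comparing metric values before and after it: a tradeoff is flagged when at least one metric improves while at least one other worsens. The metric names, the `LOWER_IS_BETTER` convention, and the labels other than "tradeoff" are assumptions made for illustration, not the categories or metrics used in the thesis.

```python
from typing import Dict

# Assumption: for the metrics named here, a lower value is better;
# any other metric is treated as higher-is-better.
LOWER_IS_BETTER = {"coupling", "complexity", "lack_of_cohesion"}


def classify_commit(before: Dict[str, float], after: Dict[str, float]) -> str:
    """Label a commit by the direction of its metric fluctuations."""
    improved, worsened = [], []
    # Only metrics measured both before and after the commit are compared.
    for name in before.keys() & after.keys():
        delta = after[name] - before[name]
        if delta == 0:
            continue
        is_better = delta < 0 if name in LOWER_IS_BETTER else delta > 0
        (improved if is_better else worsened).append(name)
    if improved and worsened:
        return "tradeoff"  # some metrics improve at the expense of others
    if improved:
        return "improvement"
    if worsened:
        return "degradation"
    return "neutral"
```

For example, a commit that lowers coupling from 5 to 3 while raising complexity from 10 to 12 would be labeled a tradeoff under these assumptions.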