18 research outputs found
We Don't Need Another Hero? The Impact of "Heroes" on Software Development
A software project has "Hero Developers" when 80% of contributions are
delivered by 20% of the developers. Are such heroes a good idea? Are too many
heroes bad for software quality? Is it better to have more or fewer heroes for
different kinds of projects? To answer these questions, we studied 661 open
source projects from Public open source software (OSS) GitHub and 171 projects
from an Enterprise GitHub.
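As a rough illustration of the 80/20 definition above, the following Python sketch checks whether a list of per-commit author names meets it. The function name `is_hero_project`, its parameters, and the use of commit counts as the measure of contribution are assumptions made for illustration; the abstract does not say how contributions were measured.

```python
import math
from collections import Counter


def is_hero_project(commit_authors, dev_share=0.2, contribution_share=0.8):
    """Return True if the most active `dev_share` of developers account for
    at least `contribution_share` of all commits.

    `commit_authors` is a list with one author name per commit (hypothetical
    input format; commit count is an assumed proxy for "contribution").
    """
    # Commits per developer, most active first.
    counts = sorted(Counter(commit_authors).values(), reverse=True)
    if not counts:
        return False
    # Size of the top group, rounded up so it contains at least one developer.
    top_n = max(1, math.ceil(dev_share * len(counts)))
    return sum(counts[:top_n]) >= contribution_share * sum(counts)


# One developer with 90 of 100 commits among five developers.
skewed = ["ann"] * 90 + ["bob"] * 4 + ["cy"] * 3 + ["di"] * 2 + ["ed"]
# Five developers with 20 commits each.
even = ["ann", "bob", "cy", "di", "ed"] * 20

print(is_hero_project(skewed))
print(is_hero_project(even))
```

Rounding the top group up is one possible choice; with very small teams it means a single developer can count as "20%" of the project.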
We find that hero projects are very common. In fact, as projects grow in
size, nearly all projects become hero projects. These findings motivated us to
look more closely at the effects of heroes on software development. Analysis
shows that the frequency with which issues and bugs are closed is not
significantly affected by the presence of heroes or by project type (Public or
Enterprise). Similarly, the
time needed to resolve an issue/bug/enhancement is not affected by heroes or
project type. This is a surprising result since, before looking at the data, we
expected that increasing heroes on a project would slow down how fast that
project reacts to change. However, we do find a statistically significant
association between heroes, project types, and enhancement resolution rates.
Heroes do not affect enhancement resolution rates in Public projects. However,
in Enterprise projects, having more heroes increases the rate at which projects
complete enhancements.
In summary, our empirical results call for a revision of a long-held truism
in software engineering. Software heroes are far more common and valuable than
suggested by the literature, particularly for medium to large Enterprise
developments. Organizations should reflect on better ways to find and retain
more of these heroes.
Comment: 8 pages + 1 references, Accepted to International Conference on
Software Engineering - Software Engineering in Practice, 201
Role of Newcomers Supportive Strategies on Socio-Technical Performance of Open Source Projects
The success of open source software (OSS) projects has been studied in previous research. This paper focuses on the effect of newcomers’ supportive strategies in OSS projects on the success level of the projects. Our research analyzes the socio-technical commitment to the project as a proxy for success. Data about 453 OSS projects from GitHub.com is collected and analyzed to empirically test the research model. We have applied a clustering technique to explore the dataset attributes. Results show the importance of newcomers’ supportive strategies for the different socio-technical aspects of OSS projects that lead to success. We have also tested the effect of programming language diversity and project profile health on the success of projects. The outcome of this study has both managerial and practical implications.
Open Source Software Information Triangulation: A Design Science Study
Open source components are a promising way to create and deliver software to the market fast. However, challenges arise when assessing the quality of open source software. While frameworks to assess these components exist, the open source market is neither governed nor regulated, and the use of these frameworks is labor-intensive and complex. This research aims to solve this problem by selecting quality indicators for open source software on GitHub and realizing a tool that automatically supports the evaluation of information about open source software from other available sources. These sources include StackExchange.com for external support and the National Vulnerability and Exposure database for security incident history. Feedback on the developed prototype supports our view that automatic checks of open source software claims are possible and useful.
Associating Natural Language Comment and Source Code Entities
Comments are an integral part of software development; they are natural
language descriptions associated with source code elements. Understanding
explicit associations can be useful in improving code comprehensibility and
maintaining the consistency between code and comments. As an initial step
towards this larger goal, we address the task of associating entities in
Javadoc comments with elements in Java source code. We propose an approach for
automatically extracting supervised data using revision histories of open
source projects and present a manually annotated evaluation dataset for this
task. We develop a binary classifier and a sequence labeling model by crafting
a rich feature set which encompasses various aspects of code, comments, and the
relationships between them. Experiments show that our systems outperform
several baselines learning from the proposed supervision.
Comment: Accepted in AAAI 202
A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects
Background: Meeting the growing industry demand for Data Science requires
cross-disciplinary teams that can translate machine learning research into
production-ready code. Software engineering teams value adherence to coding
standards as an indication of code readability, maintainability, and developer
expertise. However, there are no large-scale empirical studies of coding
standards focused specifically on Data Science projects. Aims: This study
investigates the extent to which Data Science projects follow coding standards.
In particular, which standards are followed, which are ignored, and how does
this differ from traditional software projects? Method: We compare a corpus of
1048 Open-Source Data Science projects to a reference group of 1099 non-Data
Science projects with a similar level of quality and maturity. Results: Data
Science projects suffer from a significantly higher rate of functions that use
an excessive number of parameters and local variables. Data Science projects
also follow different variable naming conventions to non-Data Science projects.
Conclusions: The differences indicate that Data Science codebases are distinct
from traditional software codebases and do not follow traditional software
engineering conventions. Our conjecture is that this may be because traditional
software engineering conventions are inappropriate in the context of Data
Science projects.
Comment: 11 pages, 7 figures. To appear in ESEM 2020. Updated based on peer
revie
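To make the kind of check described in the Results concrete, here is a minimal Python sketch that flags functions declaring more than a given number of parameters. It is not the study's tooling: the function name `functions_with_many_params`, the threshold of 5, and the assumption that the analyzed code is Python are all illustrative choices not stated in the abstract.

```python
import ast


def functions_with_many_params(source, max_params=5):
    """Return (function name, parameter count) pairs for every function in
    `source` that declares more than `max_params` parameters.

    `max_params=5` is a hypothetical threshold chosen for illustration.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Count positional-only, regular, and keyword-only parameters.
            count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            # *args and **kwargs each count as one parameter.
            count += (args.vararg is not None) + (args.kwarg is not None)
            if count > max_params:
                flagged.append((node.name, count))
    return sorted(flagged)


example = (
    "def train(data, labels, lr, epochs, batch_size, seed):\n"
    "    pass\n"
    "\n"
    "def load(path, *args, **kwargs):\n"
    "    pass\n"
)
print(functions_with_many_params(example))
```

A count of local variables could be sketched in a similar way by collecting assigned names inside each function body.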
Deep Just-In-Time Inconsistency Detection Between Comments and Source Code
Natural language comments convey key aspects of source code such as
implementation, usage, and pre- and post-conditions. Failure to update comments
accordingly when the corresponding code is modified introduces inconsistencies,
which are known to lead to confusion and software bugs. In this paper, we aim to
detect whether a comment becomes inconsistent as a result of changes to the
corresponding body of code, in order to catch potential inconsistencies
just-in-time, i.e., before they are committed to a code base. To achieve this,
we develop a deep-learning approach that learns to correlate a comment with
code changes. By evaluating on a large corpus of comment/code pairs spanning
various comment types, we show that our model outperforms multiple baselines by
significant margins. For extrinsic evaluation, we show the usefulness of our
approach by combining it with a comment update model to build a more
comprehensive automatic comment maintenance system which can both detect and
resolve inconsistent comments based on code changes.
Comment: Accepted in AAAI 202
Project Evaluation Module for Open Code Analyzer
The aim of this diploma thesis is the design and implementation of a module for evaluating open
source projects. The introduction outlines the possibilities for evaluating the quality of projects.
Based on the data and capabilities of the GitHub service and the SonarQube application, the thesis
then proposes a way in which the quality of a project can be objectively evaluated using the
created metrics. It then describes the implementation and the methods used to build this evaluation
model. The proposed solution is then demonstrated on a set of selected projects. The work
concludes with experiments that verify the hypotheses formed during the development of
the initial solution, or offer alternatives to the chosen solution.
460 - Katedra informatiky
výborn