Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development
Mobile devices and platforms have become an established target for modern
software developers due to performant hardware and a large and growing user
base numbering in the billions. Despite their popularity, the software
development process for mobile apps comes with a set of unique, domain-specific
challenges rooted in program comprehension. Many of these challenges stem from
developer difficulties in reasoning about different representations of a
program, a phenomenon we define as a "language dichotomy". In this paper, we
reflect upon the various language dichotomies that contribute to open problems
in program comprehension and development for mobile apps. Furthermore, to help
guide the research community towards effective solutions for these problems, we
provide a roadmap of directions for future work.
Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference
on Program Comprehension (ICPC'18)
We Don't Need Another Hero? The Impact of "Heroes" on Software Development
A software project has "Hero Developers" when 80% of contributions are
delivered by 20% of the developers. Are such heroes a good idea? Are too many
heroes bad for software quality? Is it better to have more or fewer heroes for
different kinds of projects? To answer these questions, we studied 661 open
source projects from Public open source software (OSS) Github and 171 projects
from an Enterprise Github.
We find that hero projects are very common. In fact, as projects grow in
size, nearly all projects become hero projects. These findings motivated us to
look more closely at the effects of heroes on software development. Analysis
shows that the frequency with which issues and bugs are closed is not significantly
affected by the presence of heroes or by project type (Public or Enterprise). Similarly, the
time needed to resolve an issue/bug/enhancement is not affected by heroes or
project type. This is a surprising result since, before looking at the data, we
expected that increasing heroes on a project would slow down how fast that
project reacts to change. However, we do find a statistically significant
association between heroes, project types, and enhancement resolution rates.
Heroes do not affect enhancement resolution rates in Public projects. However,
in Enterprise projects, more heroes increase the rate at which projects
complete enhancements.
In summary, our empirical results call for a revision of a long-held truism
in software engineering. Software heroes are far more common and valuable than
suggested by the literature, particularly for medium to large Enterprise
developments. Organizations should reflect on better ways to find and retain
more of these heroes.
Comment: 8 pages + 1 references, Accepted to International Conference on
Software Engineering - Software Engineering in Practice, 201
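The 80/20 definition of a hero project given in this abstract can be checked mechanically. The following is a minimal sketch, not the authors' tooling: it assumes contributions are counted as commits per author, and the function name `is_hero_project` and its parameters are hypothetical.

```python
from collections import Counter
import math


def is_hero_project(commit_authors, dev_share=0.2, contribution_share=0.8):
    """Return True if the top `dev_share` of developers made at least
    `contribution_share` of all contributions.

    `commit_authors` is a list with one author name per commit
    (an assumption; the abstract does not say how contributions are counted).
    """
    counts = Counter(commit_authors)
    if not counts:
        return False
    total = sum(counts.values())
    # Size of the "top 20%" group, with at least one developer.
    top_n = max(1, math.floor(len(counts) * dev_share))
    top_total = sum(c for _, c in counts.most_common(top_n))
    return top_total / total >= contribution_share
```

For example, a project with five developers where one of them authored 90 of 100 commits would be classified as a hero project, while one where the five developers contributed equally would not.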
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to dramatically large
increases in software defect predictions. When applied in a 5*5
cross-validation study for 3,681 JAVA classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict for quality. The same kind of pattern (improvement) was observed when a
comparative analysis of SMOTE and SMOTUNED was done against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.
Comment: 10 pages + 2 references. Accepted to International Conference on
Software Engineering (ICSE), 201
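For readers unfamiliar with SMOTE, the core oversampling step can be sketched as follows. This is a plain-Python illustration of standard SMOTE interpolation, not the SMOTUNED implementation: the function name `smote` and its parameters (`n_synthetic`, `k`, `seed`) are assumptions, and the self-tuning step described in the abstract is not shown.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create `n_synthetic` new minority-class points.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    `minority` is a list of equal-length numeric tuples (at least two).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbors of `base` by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic
```

A self-tuning variant would search over settings such as `k` and `n_synthetic` for each dataset rather than fixing them in advance.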
Identifying Patch Correctness in Test-Based Program Repair
Test-based automatic program repair has attracted a lot of attention in
recent years. However, the test suites in practice are often too weak to
guarantee correctness and existing approaches often generate a large number of
incorrect patches.
To reduce the number of incorrect patches generated, we propose a novel
approach that heuristically determines the correctness of the generated
patches. The core idea is to exploit the behavior similarity of test case
executions. The passing tests on original and patched programs are likely to
behave similarly while the failing tests on original and patched programs are
likely to behave differently. Also, if two tests exhibit similar runtime
behavior, the two tests are likely to have the same test results. Based on
these observations, we generate new test inputs to enhance the test suites and
use their behavior similarity to determine patch correctness.
Our approach is evaluated on a dataset consisting of 139 patches generated
from existing program repair systems including jGenProg, Nopol, jKali, ACS and
HDRepair. Our approach successfully prevented 56.3% of the incorrect patches
from being generated, without blocking any correct patches.
Comment: ICSE 201
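The first observation in this abstract (passing tests should behave similarly before and after patching, failing tests should behave differently) can be illustrated with a small sketch. It is not the paper's implementation: representing a test execution as a list of visited statement labels, measuring similarity with `difflib.SequenceMatcher`, and the thresholds `pass_min` and `fail_max` are all assumptions made for illustration. The test-generation step is not shown.

```python
from difflib import SequenceMatcher


def similarity(trace_a, trace_b):
    """Similarity in [0, 1] between two execution traces (lists of labels)."""
    return SequenceMatcher(None, trace_a, trace_b, autojunk=False).ratio()


def patch_looks_incorrect(passing, failing, pass_min=0.9, fail_max=0.9):
    """Heuristically flag a patch as likely incorrect.

    `passing` and `failing` are lists of (trace_on_original, trace_on_patched)
    pairs for originally passing and originally failing tests.
    Both thresholds are hypothetical values.
    """
    # A passing test whose behavior changed a lot suggests a harmful patch.
    if any(similarity(o, p) < pass_min for o, p in passing):
        return True
    # A failing test whose behavior barely changed suggests no real fix.
    if any(similarity(o, p) >= fail_max for o, p in failing):
        return True
    return False
```

In practice the thresholds would need calibration against patches whose correctness is already known.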
Scrum2Kanban: Integrating Kanban and Scrum in a University Software Engineering Capstone Course
Using university capstone courses to teach agile software development
methodologies has become commonplace, as agile methods have gained support in
professional software development. This usually means students are introduced
to and work with the currently most popular agile methodology: Scrum. However,
as the agile methods employed in the industry change and are adapted to
different contexts, university courses must follow suit. A prime example of
this is the Kanban method, which has recently gathered attention in the
industry. In this paper, we describe a capstone course design, which adds the
hands-on learning of the lean principles advocated by Kanban into a capstone
project run with Scrum. This ensures both that students are aware of recent
process frameworks and ideas and that they gain a more thorough overview of how
agile methods can be employed in practice. We describe the details of the
course and analyze the participating students' perceptions as well as our
observations. We analyze the development artifacts created by students during
the course with respect to the two different development methodologies. We
further present a summary of the lessons learned as well as recommendations for
future similar courses. The survey conducted at the end of the course revealed
an overwhelmingly positive attitude of students towards the integration of
Kanban into the course.
JUGE: An Infrastructure for Benchmarking Java Unit Test Generators
Researchers and practitioners have designed and implemented various automated
test case generators to support effective software testing. Such generators
exist for various languages (e.g., Java, C#, or Python) and for various
platforms (e.g., desktop, web, or mobile applications). Such generators exhibit
varying effectiveness and efficiency, depending on the testing goals they aim
to satisfy (e.g., unit-testing of libraries vs. system-testing of entire
applications) and the underlying techniques they implement. In this context,
practitioners need to be able to compare different generators to identify the
most suited one for their requirements, while researchers seek to identify
future research directions. This can be achieved through the systematic
execution of large-scale evaluations of different generators. However, the
execution of such empirical evaluations is not trivial and requires a
substantial effort to collect benchmarks, set up the evaluation infrastructure,
and collect and analyse the results. In this paper, we present our JUnit
Generation benchmarking infrastructure (JUGE) supporting generators (e.g.,
search-based, random-based, symbolic execution, etc.) seeking to automate the
production of unit tests for various purposes (e.g., validation, regression
testing, fault localization, etc.). The primary goal is to reduce the overall
effort, ease the comparison of several generators, and enhance the knowledge
transfer between academia and industry by standardizing the evaluation and
comparison process. Since 2013, eight editions of a unit testing tool
competition, co-located with the Search-Based Software Testing Workshop, have
taken place and used and updated JUGE. As a result, an increasing number of
tools (over ten) from both academia and industry have been evaluated on JUGE,
matured over the years, and allowed the identification of future research
directions.
Opinion Mining for Software Development: A Systematic Literature Review
Opinion mining, sometimes referred to as sentiment analysis, has gained increasing attention in software engineering (SE) studies.
SE researchers have applied opinion mining techniques in various contexts, such as identifying developers’ emotions expressed in
code comments and extracting users’ criticisms of mobile apps. Given the large number of relevant studies available, it can take
considerable time for researchers and developers to figure out which approaches they can adopt in their own studies and what perils
these approaches entail.
We conducted a systematic literature review involving 185 papers. More specifically, we present 1) well-defined categories of opinion
mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in
other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4)
concerns or limitations SE researchers might need to take into account when applying/customizing these opinion mining techniques.
The results of our study serve as references to choose suitable opinion mining tools for software development activities, and provide
critical insights for the further development of opinion mining techniques in the SE domain.
Large-Scale Analysis of Framework-Specific Exceptions in Android Apps
Mobile apps have become ubiquitous. For app developers, it is a key priority
to ensure their apps' correctness and reliability. However, many apps still
suffer from occasional to frequent crashes, weakening their competitive edge.
Large-scale, deep analyses of the characteristics of real-world app crashes can
provide useful insights to guide developers, or help improve testing and
analysis tools. However, such studies do not exist -- this paper fills this
gap. Over a four-month long effort, we have collected 16,245 unique exception
traces from 2,486 open-source Android apps, and observed that
framework-specific exceptions account for the majority of these crashes. We
then extensively investigated the 8,243 framework-specific exceptions (which
took six person-months): (1) identifying their characteristics (e.g.,
manifestation locations, common fault categories), (2) evaluating their
manifestation via state-of-the-art bug detection techniques, and (3) reviewing
their fixes. Besides the insights they provide, these findings motivate and
enable follow-up research on mobile apps, such as bug detection, fault
localization and patch generation. In addition, to demonstrate the utility of
our findings, we have optimized Stoat, a dynamic testing tool, and implemented
ExLocator, an exception localization tool, for Android apps. Stoat is able to
quickly uncover three previously-unknown, confirmed/fixed crashes in Gmail and
Google+; ExLocator is capable of precisely locating the root causes of
identified exceptions in real-world apps. Our substantial dataset is made
publicly available to share with and benefit the community.
Comment: ICSE'18: the 40th International Conference on Software Engineering