Evolution of statistical analysis in empirical software engineering research: Current state and steps forward
Software engineering research is evolving and papers are increasingly based
on empirical data from a multitude of sources, using statistical tests to
determine if and to what degree empirical evidence supports their hypotheses.
To investigate the practices and trends of statistical analysis in empirical
software engineering (ESE), this paper presents a review of a large pool of
papers from top-ranked software engineering journals. First, we manually
reviewed 161 papers; in the second phase of our method, we conducted a more
extensive semi-automatic classification of 5,196 papers spanning the years
2001-2015. Results from both review steps were used to: i) identify and
analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well
as relevant trends in usage of specific statistical methods (e.g.,
nonparametric tests and effect size measures) and, ii) develop a conceptual
model for a statistical analysis workflow with suggestions on how to apply
different statistical methods as well as guidelines to avoid pitfalls. Lastly,
we confirm existing claims that current ESE practices lack a standard to report
practical significance of results. We illustrate how practical significance can
be discussed in terms of both the statistical analysis and in the
practitioner's context. Comment: journal submission, 34 pages, 8 figures
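To make the idea of an effect size measure concrete, here is a minimal sketch that is not taken from the paper. It assumes two independent samples and computes Cohen's d with a pooled standard deviation; the function name `cohens_d` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(sample_a, sample_b):
    """Standardized mean difference between two independent samples.

    Uses the pooled sample standard deviation. Assumes each sample has
    at least two values and that the pooled deviation is non-zero.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * stdev(sample_a) ** 2 + (n_b - 1) * stdev(sample_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)


# Hypothetical measurements for two treatments (illustration only).
d = cohens_d([2, 4, 6], [1, 3, 5])
print(round(d, 3))
```

A p-value from a t-test says whether a difference is likely to be noise; a standardized difference such as this one gives a scale-free magnitude that can then be judged against the practitioner's context.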
Diversity in Software Engineering Conferences and Journals
Diversity with respect to ethnicity and gender has been studied in
open-source and industrial settings for software development. Publication
avenues such as academic conferences and journals contribute to the growing
technology industry. However, there have been very few diversity-related
studies conducted in the context of academia. In this paper, we study the
ethnic, gender, and geographical diversity of the authors published in Software
Engineering conferences and journals. We provide a systematic quantitative
analysis of the diversity of publications and organizing and program committees
of three top conferences and two top journals in Software Engineering, which
indicates the existence of bias and entry barriers towards authors and
committee members belonging to certain ethnicities, gender, and/or geographical
locations in Software Engineering conferences and journal publications. For our
study, we analyse publication (accepted authors) and committee data (Program
and Organizing committee/ Journal Editorial Board) from the conferences ICSE,
FSE, and ASE and the journals IEEE TSE and ACM TOSEM from 2010 to 2022. The
analysis of the data shows that across participants and committee members,
there are some communities that are consistently significantly lower in
representation, for example, publications from countries in Africa, South
America, and Oceania. However, a correlation study between the diversity of the
committees and the participants did not yield any conclusive evidence.
Furthermore, there is no conclusive evidence that papers with White authors or
male authors were more likely to be cited. Finally, we see an improvement in
the ethnic diversity of the authors over the years 2010-2022 but not in gender
or geographical diversity. Comment: 13 pages, 10 figures, 4 tables
A Study on the Prevalence of Human Values in Software Engineering Publications, 2015-2018
Failure to account for human values in software (e.g., equality and fairness)
can result in user dissatisfaction and negative socio-economic impact.
Engineering these values in software, however, requires technical and
methodological support throughout the development life cycle. This paper
investigates to what extent software engineering (SE) research has considered
human values. We investigate the prevalence of human values in recent (2015 -
2018) publications at some of the top-tier SE conferences and journals. We
classify SE publications, based on their relevance to different values, against
a widely used value structure adopted from social sciences. Our results show
that: (a) only a small proportion of the publications directly consider values,
classified as relevant publications; (b) for the majority of the values, very
few or no relevant publications were found; and (c) the prevalence of the
relevant publications was higher in SE conferences compared to SE journals.
This paper shares these and other insights that motivate research on human
values in software engineering.
Software engineering for AI-based systems: A survey
AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image recognition, speech recognition, and autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems.
To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study.
We considered 248 studies published between January 2010 and March 2020.
SE for AI-based systems is an emerging research area, where more than 2/3 of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK areas. Studies related to software testing and software quality are very prevalent, while areas like software maintenance seem neglected. Data-related issues are the most recurrent challenges.
Our results are valuable for: researchers, to quickly understand the state of the art and learn which topics need more research; practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and educators, to bridge the gap between SE and AI in their curricula. This work has been partially funded by the “Beatriz Galindo” Spanish Program BEAGAL18/00064 and by the DOGO4ML Spanish research project (ref. PID2020-117191RB-I00).
An LTL Semantics of Business Workflows with Recovery
We describe a business workflow case study with abnormal behavior management
(i.e. recovery) and demonstrate how temporal logics and model checking can
provide a methodology to iteratively revise the design and obtain a
correct-by-construction system. To do so, we define a formal semantics by giving
a compilation of generic workflow patterns into LTL, and we use the bounded model
checker Zot to prove specific properties and the validity of requirements. The working
assumption is that such a lightweight approach would easily fit into processes
that are already in place without the need for a radical change of procedures,
tools and people's attitudes. The complexity of formalisms and the invasiveness
of methods have been shown to be among the major obstacles to the
deployment of formal engineering techniques in mundane projects.
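As a rough illustration of the kind of property such a workflow semantics might express, the sketch below checks a response pattern, G(trigger -> F response), on a finite execution trace. It is not the paper's compilation and does not use Zot: LTL is defined over infinite traces, so the finite-trace reading, the function name, and the event names are all assumptions made for this example.

```python
def responds_to_every(trace, trigger, response):
    """Check G(trigger -> F response) under a finite-trace reading.

    `trace` is a list of sets of event names, one set per step.
    The property holds if every step containing `trigger` is followed,
    at the same or a later step, by a step containing `response`.
    """
    pending = False
    for state in trace:
        if trigger in state:
            pending = True
        if response in state:
            pending = False
    return not pending


# Hypothetical workflow runs: a failure must eventually be recovered.
ok_run = [{"start"}, {"fail"}, {"recover"}, {"end"}]
bad_run = [{"start"}, {"fail"}, {"recover"}, {"fail"}]
print(responds_to_every(ok_run, "fail", "recover"))
print(responds_to_every(bad_run, "fail", "recover"))
```

A bounded model checker explores all runs of the model up to a given length rather than a single recorded trace, so this only shows what the property means, not how it is proved.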
Simplifying Deep-Learning-Based Model for Code Search
To accelerate software development, developers frequently search and reuse
existing code snippets from a large-scale codebase, e.g., GitHub. Over the
years, researchers proposed many information retrieval (IR) based models for
code search, which match keywords in the query with code text. But they fail to
bridge the semantic gap between query and code. To address this challenge, Gu
et al. proposed a deep-learning-based model named DeepCS. It jointly embeds
method code and natural language description into a shared vector space, where
methods related to a natural language query are retrieved according to their
vector similarities. However, DeepCS's working process is complicated and
time-consuming. To overcome this issue, we propose a simplified model,
CodeMatcher, that leverages the IR technique but maintains many features of
DeepCS. Generally, CodeMatcher combines query keywords in their original order,
performs a fuzzy search on the name and body strings of methods, and returns the
best-matched methods with the longest sequence of used keywords. We verified its
effectiveness on a large-scale codebase with about 41k repositories.
Experimental results showed the simplified model CodeMatcher outperforms DeepCS
by 97% in terms of MRR (a widely used accuracy measure for code search), and it
is over 66 times faster than DeepCS. Besides, compared with the
state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by
73%. We also observed that: fusing the advantages of IR-based and
deep-learning-based models is promising because they complement each other
by nature; improving the quality of method naming helps code search, since
the method name plays an important role in connecting query and code.
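For readers unfamiliar with MRR (mean reciprocal rank), the measure reported above, a minimal sketch of its standard definition follows. The function name and the sample ranks are hypothetical and are not data from the paper.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Mean reciprocal rank over a set of queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None if no relevant result was returned (contributes 0).
    Assumes the list is non-empty.
    """
    total = sum(1.0 / rank if rank else 0.0 for rank in first_relevant_ranks)
    return total / len(first_relevant_ranks)


# Hypothetical ranks for four queries; the third query found nothing relevant.
print(mean_reciprocal_rank([1, 2, None, 4]))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```

Because only the first relevant hit counts, MRR rewards tools that put a usable method at the top of the result list.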
Test case prioritization using test case diversification and fault-proneness estimations
Context: Regression testing activities greatly reduce the risk of faulty
software releases. However, the size of the test suite grows throughout the
development process, resulting in time-consuming execution of the test suite
and delayed feedback to the software development team. This has created a need
for approaches such as test case prioritization (TCP) and test-suite reduction
to achieve better results when resources are limited. In this regard, approaches
that use auxiliary sources of data, such as bug history, are worth
investigating.
Objective: Our aim is to propose an approach for TCP that takes into account
test case coverage data, bug history, and test case diversification. To
evaluate this approach we study its performance on real-world open-source
projects.
Method: The bug history is used to estimate the fault-proneness of source
code areas. The diversification of test cases is preserved by incorporating
fault-proneness into a clustering-based scheme.
Results: The proposed methods are evaluated on datasets collected from the
development history of five real-world projects including 357 versions in
total. The experiments show that the proposed methods are superior to
coverage-based TCP methods.
Conclusion: The proposed approach shows that coverage-based and
fault-proneness-based methods can be improved by combining
diversification with fault-proneness estimation.
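The general idea of combining coverage with fault-proneness can be sketched as a greedy ordering that weights each not-yet-covered code unit by its estimated fault-proneness. This is a simplified stand-in, not the clustering-based scheme the abstract describes; the function name, the data layout, and the scores are assumptions for illustration.

```python
def prioritize(coverage, fault_proneness):
    """Greedy test ordering by fault-proneness-weighted additional coverage.

    coverage: dict mapping test name -> set of covered code units.
    fault_proneness: dict mapping code unit -> estimated score (e.g. derived
    from bug history); units missing from the dict count as 0.
    Ties are broken by test name so the result is deterministic.
    """
    remaining = dict(coverage)
    covered = set()
    order = []
    while remaining:
        def gain(test):
            # Weight only the units this test would newly cover.
            new_units = remaining[test] - covered
            return sum(fault_proneness.get(unit, 0.0) for unit in new_units)

        best = max(sorted(remaining), key=gain)
        order.append(best)
        covered |= remaining.pop(best)
    return order


# Hypothetical coverage data and fault-proneness scores.
coverage = {"t1": {"a", "b"}, "t2": {"b", "c"}, "t3": {"a"}}
scores = {"a": 0.1, "b": 0.2, "c": 0.9}
print(prioritize(coverage, scores))  # ['t2', 't1', 't3']
```

Counting only newly covered units gives a basic form of diversification: a test that repeats what earlier tests already exercised drops down the order.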