1,213 research outputs found

Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

Authors: Feldt Robert; Furia Carlo A.; Gren Lucas; Huang Ziwei; Neto Francisco Gomes de Oliveira; Torkar Richard
Publication date: 01/01/2019

Software engineering research is evolving, and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers; then, in the second phase of our method, we conducted a more extensive semi-automatic classification of 5,196 papers spanning the years 2001-2015. Results from both review steps were used to: i) identify and analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well as relevant trends in the usage of specific statistical methods (e.g., nonparametric tests and effect size measures), and ii) develop a conceptual model for a statistical analysis workflow with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard to report practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context.

Comment: journal submission, 34 pages, 8 figures
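
As a rough illustration of the effect size measures mentioned in the abstract above, the following minimal sketch computes Cohen's d for two independent samples. The choice of Cohen's d, the function name `cohens_d`, and the sample data are assumptions made for illustration; none of them is taken from the paper.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical task-completion times (minutes) for two tools.
tool_a = [12.0, 14.0, 16.0, 18.0, 20.0]
tool_b = [10.0, 12.0, 14.0, 16.0, 18.0]
d = cohens_d(tool_a, tool_b)
print(f"Cohen's d = {d:.3f}")
```

Reporting a value like this next to a p-value gives readers a sense of the magnitude of a difference, which is one ingredient of a discussion about practical significance.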

arXiv.org e-Print Archive

Chalmers Research

Diversity in Software Engineering Conferences and Journals

Authors: Day Nancy A; Nagappan Meiyappan; Narayanan Aditya Shankar; Vagavolu Dheeraj
Publication date: 24/10/2023

Diversity with respect to ethnicity and gender has been studied in open-source and industrial settings for software development. Publication avenues such as academic conferences and journals contribute to the growing technology industry. However, there have been very few diversity-related studies conducted in the context of academia. In this paper, we study the ethnic, gender, and geographical diversity of the authors published in Software Engineering conferences and journals. We provide a systematic quantitative analysis of the diversity of publications and of the organizing and program committees of three top conferences and two top journals in Software Engineering, which indicates the existence of bias and entry barriers towards authors and committee members belonging to certain ethnicities, genders, and/or geographical locations in Software Engineering conferences and journal publications. For our study, we analyse publication data (accepted authors) and committee data (Program and Organizing Committee / Journal Editorial Board) from the conferences ICSE, FSE, and ASE and the journals IEEE TSE and ACM TOSEM from 2010 to 2022. The analysis of the data shows that, across participants and committee members, some communities are consistently and significantly lower in representation, for example, publications from countries in Africa, South America, and Oceania. However, a correlation study between the diversity of the committees and the participants did not yield any conclusive evidence. Furthermore, there is no conclusive evidence that papers with White authors or male authors were more likely to be cited. Finally, we see an improvement in the ethnic diversity of the authors over the years 2010-2022 but not in gender or geographical diversity.

Comment: 13 pages, 10 figures, 4 tables

arXiv.org e-Print Archive

Software engineering (Encyclopedia entry)

Author: Finkelstein A.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2000

UCL Discovery

A Study on the Prevalence of Human Values in Software Engineering Publications, 2015-2018

Authors: Hussain Waqar; Mougouei Davoud; Nurwidyantoro Arif; Oliver Gillian; Perera Harsha; Shams Rifat Ara; Whittle Jon
Publication date: 18/07/2019

Failure to account for human values in software (e.g., equality and fairness) can result in user dissatisfaction and negative socio-economic impact. Engineering these values in software, however, requires technical and methodological support throughout the development life cycle. This paper investigates to what extent software engineering (SE) research has considered human values. We investigate the prevalence of human values in recent (2015-2018) publications at some of the top-tier SE conferences and journals. We classify SE publications, based on their relevance to different values, against a widely used value structure adopted from the social sciences. Our results show that: (a) only a small proportion of the publications directly consider values (these are classified as relevant publications); (b) for the majority of the values, very few or no relevant publications were found; and (c) the prevalence of relevant publications was higher in SE conferences than in SE journals. This paper shares these and other insights that motivate research on human values in software engineering.

arXiv.org e-Print Archive

Crossref

University of Southern Queensland ePrints

Monash University Research Portal

Software engineering for AI-based systems: A survey

Authors: Bogner Justus; Franch Gutiérrez Javier; Martínez Fernández Silverio Juan; Oriol Hilari Marc; Siebert Julien; Trendowicz Adam; Vollmer Anna Maria; Wagner Stefan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/04/2022

AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image recognition, speech recognition, and autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems. To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study. We considered 248 studies published between January 2010 and March 2020. SE for AI-based systems is an emerging research area, where more than 2/3 of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK areas. Studies related to software testing and software quality are very prevalent, while areas like software maintenance seem neglected. Data-related issues are the most recurrent challenges. Our results are valuable for: researchers, to quickly understand the state of the art and learn which topics need more research; practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and educators, to bridge the gap between SE and AI in their curricula.

This work has been partially funded by the “Beatriz Galindo” Spanish Program BEAGAL18/00064 and by the DOGO4ML Spanish research project (ref. PID2020-117191RB-I00).

Peer Reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC

An LTL Semantics of Business Workflows with Recovery

Authors: Bersani Marcello M.; Ferrucci Luca; Mazzara Manuel
Publication date: 01/01/2014

We describe a business workflow case study with abnormal behavior management (i.e., recovery) and demonstrate how temporal logics and model checking can provide a methodology to iteratively revise the design and obtain a correct-by-construction system. To do so, we define a formal semantics by giving a compilation of generic workflow patterns into LTL, and we use the bounded model checker Zot to prove the validity of specific properties and requirements. The working assumption is that such a lightweight approach would easily fit into processes that are already in place without the need for a radical change of procedures, tools, and people's attitudes. The complexity of formalisms and the invasiveness of methods have been shown to be among the major drawbacks and obstacles for the deployment of formal engineering techniques into mundane projects.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Archivio della Ricerca - Università di Pisa

Simplifying Deep-Learning-Based Model for Code Search

Authors: Hassan Ahmed E.; Li Shanping; Liu Chao; Liu Zhiwei; Lo David; Xia Xin
Publication date: 28/05/2020

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers have proposed many information retrieval (IR) based models for code search, which match keywords in the query with code text. But they fail to bridge the semantic gap between query and code. To address this challenge, Gu et al. proposed a deep-learning-based model named DeepCS. It jointly embeds method code and natural language descriptions into a shared vector space, where methods related to a natural language query are retrieved according to their vector similarities. However, DeepCS' working process is complicated and time-consuming. To overcome this issue, we propose a simplified model, CodeMatcher, that leverages the IR technique but maintains many features of DeepCS. Generally, CodeMatcher combines query keywords in their original order, performs a fuzzy search on the name and body strings of methods, and returns the best-matched methods with the longest sequence of used keywords. We verified its effectiveness on a large-scale codebase with about 41k repositories. Experimental results showed that the simplified model CodeMatcher outperforms DeepCS by 97% in terms of MRR (a widely used accuracy measure for code search), and it is over 66 times faster than DeepCS. Besides, compared with the state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by 73%. We also observed that fusing the advantages of IR-based and deep-learning-based models is promising because they complement each other by nature, and that improving the quality of method naming helps code search, since the method name plays an important role in connecting query and code.
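
The abstract above reports results in terms of MRR (mean reciprocal rank). As a minimal sketch of how that measure is commonly computed, the following averages, over all queries, the reciprocal of the rank of the first relevant result, counting a query with no relevant result as 0. The function name, data layout, and example queries are hypothetical and are not taken from the paper's evaluation setup.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1/rank of the first relevant item per query.

    ranked_results: one ranked list of result identifiers per query.
    relevant: one set of relevant identifiers per query (same order).
    """
    if len(ranked_results) != len(relevant):
        raise ValueError("need one relevance set per query")
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, rel in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_results)


# Hypothetical search output for three queries.
ranked = [
    ["m1", "m2", "m3"],  # first relevant hit at rank 1
    ["m4", "m5", "m6"],  # first relevant hit at rank 2
    ["m7", "m8", "m9"],  # no relevant hit, contributes 0
]
truth = [{"m1"}, {"m5"}, {"m0"}]
mrr = mean_reciprocal_rank(ranked, truth)
print(f"MRR = {mrr:.3f}")
```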

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University

Test case prioritization using test case diversification and fault-proneness estimations

Authors: Mahdieh Mohsen; Mahdieh Mostafa; Mirian-Hosseinabadi Seyed-Hassan
Publication date: 18/07/2021

Context: Regression testing activities greatly reduce the risk of releasing faulty software. However, the size of the test suite grows throughout the development process, resulting in time-consuming execution of the test suite and delayed feedback to the software development team. This has created a need for approaches such as test case prioritization (TCP) and test-suite reduction to reach better results when resources are limited. In this regard, approaches that use auxiliary sources of data, such as bug history, are of interest. Objective: Our aim is to propose an approach for TCP that takes into account test case coverage data, bug history, and test case diversification. To evaluate this approach, we study its performance on real-world open-source projects. Method: The bug history is used to estimate the fault-proneness of source code areas. The diversification of test cases is preserved by incorporating fault-proneness into a clustering-based scheme. Results: The proposed methods are evaluated on datasets collected from the development history of five real-world projects including 357 versions in total. The experiments show that the proposed methods are superior to coverage-based TCP methods. Conclusion: The proposed approach shows that coverage-based and fault-proneness-based methods can be improved by combining diversification with fault-proneness incorporation.
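
To make the general idea in the abstract above more concrete, here is a deliberately simplified sketch of prioritization that combines fault-proneness with diversification. It is not the authors' algorithm: it assumes the clusters are already given, scores each test by the summed fault-proneness of the code units it covers, and then picks tests round-robin across clusters. The function `prioritize`, the data layout, and all values are hypothetical.

```python
def prioritize(coverage, fault_proneness, clusters):
    """Order tests by interleaving clusters, best-scored test first.

    coverage: dict mapping test name -> set of covered code units.
    fault_proneness: dict mapping code unit -> estimated fault-proneness.
    clusters: list of lists of test names (assumed precomputed).
    """
    def score(test):
        # Weight covered units by their estimated fault-proneness.
        return sum(fault_proneness.get(unit, 0.0) for unit in coverage[test])

    # Within each cluster, the most promising tests come first.
    queues = [sorted(c, key=score, reverse=True) for c in clusters if c]
    # Visit clusters in order of their best test.
    queues.sort(key=lambda q: score(q[0]), reverse=True)

    order = []
    while any(queues):
        # One test per cluster per round keeps the ordering diverse.
        for queue in queues:
            if queue:
                order.append(queue.pop(0))
    return order


# Hypothetical inputs.
coverage = {"t1": {"a", "b"}, "t2": {"a"}, "t3": {"c"}, "t4": {"c", "d"}}
fault_proneness = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.25}
clusters = [["t1", "t2"], ["t3", "t4"]]
order = prioritize(coverage, fault_proneness, clusters)
print(order)
```

The round-robin step is what prevents one cluster of similar, high-scoring tests from occupying the whole front of the schedule.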

arXiv.org e-Print Archive