Search CORE

20 research outputs found

Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports

Author: LO David
ZHANG Hongyu
ZHOU Jian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

NSF

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

Feature location in source code: a taxonomy and survey

Author: Agrawal
Antoniol
Binkley
Binkley
Blei
Bohnet
Cleary
Comon
Cornelissen
Cubranic
Deerwester
Eaddy
Edwards
Egyed
Egyed
Egyed
Eisenbarth
Furnas
Greevy
Kleinberg
Ko
Lukins
Petrenko
Poshyvanyk
Rajlich
Robillard
Robillard
Robillard
Robillard
Rohatgi
Sahner
Salton
Sartipi
Schach
Simmons
Somerville
Wilde
Wilde
Zhai
Zhao
Publication venue: 'Wiley'
Publication date
Field of study

Crossref

Amorphous slicing of extended finite state machines

Author: Androutsopoulos K
Clark D
Harman M
Hierons RM
Li Z
Tratt L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

Slicing is useful for many Software Engineering applications and has been widely studied for three decades, but there has been comparatively little work on slicing Extended Finite State Machines (EFSMs). This paper introduces a set of dependency based EFSM slicing algorithms and an accompanying tool. We demonstrate that our algorithms are suitable for dependence based slicing. We use our tool to conduct experiments on ten EFSMs, including benchmarks and industrial EFSMs. Ours is the first empirical study of dependence based program slicing for EFSMs. Compared to the only previously published dependence based algorithm, our average slice is smaller 40% of the time and larger only 10% of the time, with an average slice size of 35% for termination insensitive slicing

Crossref

UCL Discovery

Middlesex University Research Repository

King's Research Portal

Brunel University Research Archive

Reducing the Length of Field-replay Based Load Testing

Author: Xia Yuanjie
Publication venue
Publication date: 18/08/2021
Field of study

With the development of software, load testing have become more and more important. Load testing can ensure the software system can provide quality service under a certain load. Therefore, one of the common challenges of load testing is to design realistic workloads that can represent the actual workload in the field. In particular, one of the most widely adopted and intuitive approaches is to directly replay the field workloads in the load testing environment, which is resource- and time-consuming. In this work, we propose an automated approach to reduce the length of load testing that is driven by replaying the field workloads. The intuition of our approach is: if the measured performance associated with a particular system behaviour is already stable, we can skip subsequent testing of this system behaviour to reduce the length of the field workloads. In particular, our approach first clusters execution logs that are generated during the system runtime to identify similar system behaviours during the field workloads. Then, we use statistical methods to determine whether the measured performance associated with a system behaviour has been stable. We evaluate our approach on three open-source projects (i.e., OpenMRS, TeaStore, and Apache James). The results show that our approach can significantly reduce the length of field workloads while the workloads-after-reduction produced by our approach are representative of the original set of workloads. More importantly, the load testing results obtained by replaying the workloads after the reduction have high correlation and similar trend with the original set of workloads. Practitioners can leverage our approach to perform realistic field-replay based load testing while saving the needed resources and time

Concordia University Research Repository

Personalized First Issue Recommender for Newcomers in Open Source Projects

Author: He Hao
Li Jingyue
Qiu Ruiqiao
Xiao Wenxin
Zhou Minghui
Publication venue
Publication date: 17/08/2023
Field of study

Many open source projects provide good first issues (GFIs) to attract and retain newcomers. Although several automated GFI recommenders have been proposed, existing recommenders are limited to recommending generic GFIs without considering differences between individual newcomers. However, we observe mismatches between generic GFIs and the diverse background of newcomers, resulting in failed attempts, discouraged onboarding, and delayed issue resolution. To address this problem, we assume that personalized first issues (PFIs) for newcomers could help reduce the mismatches. To justify the assumption, we empirically analyze 37 newcomers and their first issues resolved across multiple projects. We find that the first issues resolved by the same newcomer share similarities in task type, programming language, and project domain. These findings underscore the need for a PFI recommender to improve over state-of-the-art approaches. For that purpose, we identify features that influence newcomers' personalized selection of first issues by analyzing the relationship between possible features of the newcomers and the characteristics of the newcomers' chosen first issues. We find that the expertise preference, OSS experience, activeness, and sentiment of newcomers drive their personalized choice of the first issues. Based on these findings, we propose a Personalized First Issue Recommender (PFIRec), which employs LamdaMART to rank candidate issues for a given newcomer by leveraging the identified influential features. We evaluate PFIRec using a dataset of 68,858 issues from 100 GitHub projects. The evaluation results show that PFIRec outperforms existing first issue recommenders, potentially doubling the probability that the top recommended issue is suitable for a specific newcomer and reducing one-third of a newcomer's unsuccessful attempts to identify suitable first issues, in the median.Comment: The 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023

arXiv.org e-Print Archive

Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks.

Author: Dit Bogdan
Publication venue: W&M ScholarWorks
Publication date: 01/01/2015
Field of study

Information Retrieval (IR) approaches are used to leverage textual or unstructured data generated during the software development process to support various software engineering (SE) tasks (e.g., concept location, traceability link recovery, change impact analysis, etc.). Two of the most important steps for applying IR techniques to support SE tasks are preprocessing the corpus and configuring the IR technique, and these steps can significantly influence the outcome and the amount of effort developers have to spend for these maintenance tasks. We present the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process to support SE tasks. The approach named IR-GA determines the (near) optimal solution to be used for each step of the IR process without requiring any training. We applied IR-GA on three different SE tasks and the results of the study indicate that IR-GA outperforms approaches previously used in the literature, and that it does not significantly differ from an ideal upper bound that could be achieved by a supervised approach and a combinatorial approach

Boise State University - ScholarWorks

College of William & Mary: W&M Publish

Characterizing and Diagnosing Architectural Degeneration of Software Systems from Defect Perspective

Author: Li Zude
Publication venue: Scholarship@Western
Publication date: 29/10/2010
Field of study

The architecture of a software system is known to degrade as the system evolves over time due to change upon change, a phenomenon that is termed architectural degeneration. Previous research has focused largely on structural deviations of an architecture from its baseline. However, another angle to observe architectural degeneration is software defects, especially those that are architecturally related. Such an angle has not been scientifically explored until now. Here, we ask two relevant questions: (1) What do defects indicate about architectural degeneration? and (2) How can architectural degeneration be diagnosed from the defect perspective? To answer question (1), we conducted an exploratory case study analyzing defect data over six releases of a large legacy system (of size approximately 20 million source lines of code and age over 20 years). The relevant defects here are those that span multiple components in the system (called multiple-component defects - MCDs). This case study found that MCDs require more changes to fix and are more persistent across development phases and releases than other types of defects. To answer question (2), we developed an approach (called Diagnosing Architectural Degeneration - DAD) from the defect perspective, and validated it in another, confirmatory, case study involving three releases of a commercial system (of size over 1.5 million source lines of code and age over 13 years). This case study found that components of the system tend to persistently have an impact on architectural degeneration over releases. Especially, such impact of a few components is substantially greater than that of other components. These results are new and they add to the current knowledge on architectural degeneration. The key conclusions from these results are: (i) analysis of MCDs is a viable approach to characterizing architectural degeneration; and (ii) a method such as DAD can be developed for diagnosing architectural degeneration

Scholarship@Western

Reverse engineering source code:Empirical studies of limitations and opportunities

Author: Landman D.
Publication venue
Publication date: 01/01/2017
Field of study

International Migration, Integration and Social Cohesion online publications

Supporting Text Retrieval Query Formulation In Software Engineering

Author: Haiduc Sonia Cristina
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2013
Field of study

The text found in software artifacts captures important information. Text Retrieval (TR) techniques have been successfully used to leverage this information. Despite their advantages, the success of TR techniques strongly depends on the textual queries given as input. When poorly chosen queries are used, developers can waste time investigating irrelevant results. The quality of a query indicates the relevance of the results returned by TR in response to the query and can give an indication if the results are worth investigating or a reformulation of the query should be sought instead. Knowing the quality of the query could lead to time saved when irrelevant results are returned. However, the only way to determine if a query led to the wanted artifacts is by manually inspecting the list of results. This dissertation introduces novel approaches to measure and predict the quality of queries automatically in the context of SE tasks, based on a set of statistical properties of the queries. The approaches are evaluated for the task of concept location in source code. The results reveal that the proposed approaches are able to accurately capture and predict the quality of queries for SE tasks supported by TR. When a query has low quality, the developer can reformulate it and improve it. However, this is just as hard as formulating the query in the first place. This dissertation presents two approaches for partial and complete automation of the query reformulation process. The semi-automatic approach relies on developer feedback about the relevance of TR results and uses this information to automatically reformulate the query. The automatic approach learns and applies the best reformulation approach for a query and relies on a set of training queries and their statistical properties to achieve this. Both approaches are evaluated for concept location and the results show that the techniques are able to improve the results of the original queries in the majority of the cases. We expect that on the long run the proposed approaches will contribute directly to the reduction of developer effort and implicitly the reduction of software evolution costs

Digital Commons@Wayne State University