20 research outputs found
Amorphous slicing of extended finite state machines
Slicing is useful for many Software Engineering applications and has been widely studied for three decades, but there has been comparatively little work on slicing Extended Finite State Machines (EFSMs). This paper introduces a set of dependency based EFSM slicing algorithms and an accompanying tool. We demonstrate that our algorithms are suitable for dependence based slicing. We use our tool to conduct experiments on ten EFSMs, including benchmarks and industrial EFSMs. Ours is the first empirical study of dependence based program slicing for EFSMs. Compared to the only previously published dependence based algorithm, our average slice is smaller 40% of the time and larger only 10% of the time, with an average slice size of 35% for termination insensitive slicing
Reducing the Length of Field-replay Based Load Testing
With the development of software, load testing have become more and more important. Load testing can ensure the software system can provide quality service under a certain load. Therefore, one of the common challenges of load testing is to design realistic workloads that can represent the actual workload in the field. In particular, one of the most widely adopted and intuitive approaches is to directly replay the field workloads in the load testing environment, which is resource- and time-consuming.
In this work, we propose an automated approach to reduce the length of load testing that is driven by replaying the field workloads. The intuition of our approach is: if the measured performance associated with a particular system behaviour is already stable, we can skip subsequent testing of this system behaviour to reduce the length of the field workloads. In particular, our approach first clusters execution logs that are generated during the system runtime to identify similar system behaviours during the field workloads. Then, we use statistical methods to determine whether the measured performance associated with a system behaviour has been stable. We evaluate our approach on three open-source projects (i.e., OpenMRS, TeaStore, and Apache James). The results show that our approach can significantly reduce the length of field workloads while the workloads-after-reduction produced by our approach are representative of the original set of workloads. More importantly, the load testing results obtained by replaying the workloads after the reduction have high correlation and similar trend with the original set of workloads. Practitioners can leverage our approach to perform realistic field-replay based load testing while saving the needed resources and time
Personalized First Issue Recommender for Newcomers in Open Source Projects
Many open source projects provide good first issues (GFIs) to attract and
retain newcomers. Although several automated GFI recommenders have been
proposed, existing recommenders are limited to recommending generic GFIs
without considering differences between individual newcomers. However, we
observe mismatches between generic GFIs and the diverse background of
newcomers, resulting in failed attempts, discouraged onboarding, and delayed
issue resolution. To address this problem, we assume that personalized first
issues (PFIs) for newcomers could help reduce the mismatches. To justify the
assumption, we empirically analyze 37 newcomers and their first issues resolved
across multiple projects. We find that the first issues resolved by the same
newcomer share similarities in task type, programming language, and project
domain. These findings underscore the need for a PFI recommender to improve
over state-of-the-art approaches. For that purpose, we identify features that
influence newcomers' personalized selection of first issues by analyzing the
relationship between possible features of the newcomers and the characteristics
of the newcomers' chosen first issues. We find that the expertise preference,
OSS experience, activeness, and sentiment of newcomers drive their personalized
choice of the first issues. Based on these findings, we propose a Personalized
First Issue Recommender (PFIRec), which employs LamdaMART to rank candidate
issues for a given newcomer by leveraging the identified influential features.
We evaluate PFIRec using a dataset of 68,858 issues from 100 GitHub projects.
The evaluation results show that PFIRec outperforms existing first issue
recommenders, potentially doubling the probability that the top recommended
issue is suitable for a specific newcomer and reducing one-third of a
newcomer's unsuccessful attempts to identify suitable first issues, in the
median.Comment: The 38th IEEE/ACM International Conference on Automated Software
Engineering (ASE 2023
Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks.
Information Retrieval (IR) approaches are used to leverage textual or unstructured data generated during the software development process to support various software engineering (SE) tasks (e.g., concept location, traceability link recovery, change impact analysis, etc.). Two of the most important steps for applying IR techniques to support SE tasks are preprocessing the corpus and configuring the IR technique, and these steps can significantly influence the outcome and the amount of effort developers have to spend for these maintenance tasks. We present the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process to support SE tasks. The approach named IR-GA determines the (near) optimal solution to be used for each step of the IR process without requiring any training. We applied IR-GA on three different SE tasks and the results of the study indicate that IR-GA outperforms approaches previously used in the literature, and that it does not significantly differ from an ideal upper bound that could be achieved by a supervised approach and a combinatorial approach
Characterizing and Diagnosing Architectural Degeneration of Software Systems from Defect Perspective
The architecture of a software system is known to degrade as the system evolves over time due to change upon change, a phenomenon that is termed architectural degeneration. Previous research has focused largely on structural deviations of an architecture from its baseline. However, another angle to observe architectural degeneration is software defects, especially those that are architecturally related. Such an angle has not been scientifically explored until now. Here, we ask two relevant questions: (1) What do defects indicate about architectural degeneration? and (2) How can architectural degeneration be diagnosed from the defect perspective? To answer question (1), we conducted an exploratory case study analyzing defect data over six releases of a large legacy system (of size approximately 20 million source lines of code and age over 20 years). The relevant defects here are those that span multiple components in the system (called multiple-component defects - MCDs). This case study found that MCDs require more changes to fix and are more persistent across development phases and releases than other types of defects. To answer question (2), we developed an approach (called Diagnosing Architectural Degeneration - DAD) from the defect perspective, and validated it in another, confirmatory, case study involving three releases of a commercial system (of size over 1.5 million source lines of code and age over 13 years). This case study found that components of the system tend to persistently have an impact on architectural degeneration over releases. Especially, such impact of a few components is substantially greater than that of other components. These results are new and they add to the current knowledge on architectural degeneration. The key conclusions from these results are: (i) analysis of MCDs is a viable approach to characterizing architectural degeneration; and (ii) a method such as DAD can be developed for diagnosing architectural degeneration
Supporting Text Retrieval Query Formulation In Software Engineering
The text found in software artifacts captures important information. Text Retrieval (TR) techniques have been successfully used to leverage this information. Despite their advantages, the success of TR techniques strongly depends on the textual queries given as input. When poorly chosen queries are used, developers can waste time investigating irrelevant results.
The quality of a query indicates the relevance of the results returned by TR in response to the query and can give an indication if the results are worth investigating or a reformulation of the query should be sought instead. Knowing the quality of the query could lead to time saved when irrelevant results are returned. However, the only way to determine if a query led to the wanted artifacts is by manually inspecting the list of results. This dissertation introduces novel approaches to measure and predict the quality of queries automatically in the context of SE tasks, based on a set of statistical properties of the queries. The approaches are evaluated for the task of concept location in source code. The results reveal that the proposed approaches are able to accurately capture and predict the quality of queries for SE tasks supported by TR.
When a query has low quality, the developer can reformulate it and improve it. However, this is just as hard as formulating the query in the first place. This dissertation presents two approaches for partial and complete automation of the query reformulation process. The semi-automatic approach relies on developer feedback about the relevance of TR results and uses this information to automatically reformulate the query. The automatic approach learns and applies the best reformulation approach for a query and relies on a set of training queries and their statistical properties to achieve this. Both approaches are evaluated for concept location and the results show that the techniques are able to improve the results of the original queries in the majority of the cases.
We expect that on the long run the proposed approaches will contribute directly to the reduction of developer effort and implicitly the reduction of software evolution costs