Predictive Analytics and Software Defect Severity: A Systematic Review and Future Directions
Software testing identifies defects in software products whose effects multiply to varying degrees depending on their severity levels and the promptness of rectification, hence the rate of research on the topic in the software engineering domain. In this paper, a systematic literature review (SLR) of machine learning-based software defect severity prediction over the last decade was conducted. The SLR aimed to detect germane areas central to efficient predictive analytics that are seldom captured in existing software defect severity prediction reviews. These areas include the analysis of techniques or approaches that significantly influence the threats to validity of proposed models, and the bias-variance tradeoff considerations in data science-based approaches. A population, intervention, and outcome model was adopted to refine the search terms during the literature selection process, and subsequent quality assessment yielded fifty-two primary studies. A thorough systematic review of the selected studies was then conducted to answer eleven main research questions, uncovering approaches that speak to the aforementioned areas of interest. The results indicate that while the machine learning approach is ubiquitous for predicting software defect severity, germane techniques central to better predictive analytics are infrequent in the literature. The study concludes by summarizing prominent study trends in a mind map to stimulate future research in the software engineering industry.
Explanatory and Causality Analysis in Software Engineering
Software fault proneness and software development effort are two key areas of software engineering. Improving them will significantly reduce costs and promote good planning and practice in developing and managing software projects. Traditionally, studies of software fault proneness and software development effort focused on analysis and prediction, which can help answer questions like "when" and "where". The focus of this dissertation is on explanatory and causality studies that address questions like "why" and "how".
First, we applied a case-control study to explain software fault proneness. We found that Bugfixes (prerelease bugs), Developers, Code Churn, and Age of a file are the main contributors to postrelease bugs in some of the open-source projects. In terms of interactions, we found that Bugfixes and Developers together reduced the risk of postrelease software faults. The explanatory models were also tested for prediction, and their performance was comparable to or better than the top-performing classifiers used in related studies. Our results indicate that software project practitioners should pay more attention to the prerelease bug fixing process and the number of Developers assigned, as well as their interaction. They should also pay more attention to new files (less than one year old), which contributed significantly more to postrelease bugs than old files.
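As a rough, hedged sketch of the kind of explanatory model described above, the following fits a logistic regression with an interaction term using statsmodels. The CSV file and column names (postrelease_bug, bugfixes, developers, code_churn, age_days) are invented for illustration; they are not the dissertation's actual dataset or variable names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

files = pd.read_csv("file_metrics.csv")  # hypothetical per-file metrics table

# Postrelease bug (0/1) explained by prerelease bug fixes, developers, churn,
# file age, and the Bugfixes x Developers interaction reported in the text.
model = smf.logit(
    "postrelease_bug ~ bugfixes * developers + code_churn + age_days",
    data=files,
).fit()
print(model.summary())

# Odds ratios make the direction of each effect easier to read.
print(np.exp(model.params).round(3))
```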
Second, we built a model that explains and predicts multiple levels of software development effort and measured the effects of several metrics and their interactions using categorical regression models. The final models for the three data sets used were statistically fit, and their performance was comparable to related studies. We found that project size, duration, the existence of any type of fault, the use of first- or second-generation programming languages, and team size significantly increased the software development effort. On the other hand, the interactions between duration and a defective project, and between duration and team size, reduced the software development effort. These results suggest that software practitioners should pay extra attention to project duration and the team size assigned to every task, because when these increased from a low to a higher level, they significantly increased the software development effort.
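A minimal sketch of a regression over ordered effort levels follows, assuming statsmodels' OrderedModel and invented column names (effort_level, size_kloc, duration_months, team_size, has_faults); the dissertation's exact categorical regression setup may differ.

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

projects = pd.read_csv("projects.csv")  # hypothetical project-level data
projects["effort_level"] = pd.Categorical(
    projects["effort_level"], categories=["low", "medium", "high"], ordered=True
)

# Interaction terms (e.g., duration x team size) enter as products of columns.
projects["duration_x_team"] = projects["duration_months"] * projects["team_size"]
exog = projects[["size_kloc", "duration_months", "team_size",
                 "has_faults", "duration_x_team"]]

fit = OrderedModel(projects["effort_level"], exog, distr="logit").fit(method="bfgs")
print(fit.summary())
```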
Third, a structural equation modeling method was applied for causality analysis of software fault proneness. The method combined statistical and regression analysis to find the direct and indirect causes of software faults using the partial least squares path modeling method. We found direct and indirect paths from the measurement models that led to software postrelease bugs. Specifically, the highest direct effect came from the change request, while changing the code had a minor impact on software faults. The highest impact on code change resulted from the change requests (either for bug fixing or refactoring). Interestingly, the indirect impact from code characteristics to software fault proneness was higher than the direct impact. We found a similar level of direct and indirect impact from code characteristics to code change.
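The dissertation uses partial least squares path modeling; as a simplified, hedged illustration of the direct-versus-indirect distinction it draws, the sketch below estimates a mediation-style decomposition with ordinary regressions. The data file and column names are placeholders, not the actual structural model.

```python
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("release_metrics.csv")  # hypothetical release-level data

# Path a: change requests -> code change; paths b and c': code change and
# change requests -> postrelease bugs.
a_fit = smf.ols("code_change ~ change_requests", data=data).fit()
b_fit = smf.ols("postrelease_bugs ~ code_change + change_requests", data=data).fit()

a = a_fit.params["change_requests"]
b = b_fit.params["code_change"]
direct = b_fit.params["change_requests"]
indirect = a * b  # product-of-coefficients estimate of the indirect path

print(f"direct effect  : {direct:.3f}")
print(f"indirect effect: {indirect:.3f}")
```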
LEVERAGING MACHINE LEARNING TO IDENTIFY QUALITY ISSUES IN THE MEDICAID CLAIM ADJUDICATION PROCESS
Medicaid is the largest health insurance program in the U.S. It provides health coverage to over 68 million individuals, costs the nation over $600 billion a year, and is subject to improper payments (fraud, waste, and abuse) or inaccurate payments (claims processed erroneously). Medicaid programs partially use Fee-For-Service (FFS) to provide coverage to beneficiaries by adjudicating claims, and they leverage traditional inferential statistics to verify the quality of adjudicated claims. These quality methods only provide an interval estimate of the quality errors and are incapable of detecting most claim adjudication errors, representing potentially millions of dollars in opportunity costs. This dissertation studied a method of applying supervised learning to detect erroneous payments in the entire population of adjudicated claims in each Medicaid Management Information System (MMIS), focusing on two specific claim types: inpatient and outpatient. A synthesized source of adjudicated claims generated by the Centers for Medicare & Medicaid Services (CMS) was used to create the original dataset. Quality reports from California FFS Medicaid were used to extract the underlying statistical pattern of claim adjudication errors in each Medicaid FFS program and to label the data using goodness-of-fit (Anderson-Darling) tests. Principal Component Analysis (PCA) and business knowledge were applied for dimensionality reduction, resulting in the selection of sixteen (16) features for the outpatient and nineteen (19) features for the inpatient claims models.

Ten (10) supervised learning algorithms were trained and tested on the labeled data: Decision Tree (Entropy and Gini configurations), Random Forest (Entropy and Gini configurations), Naïve Bayes, K-Nearest Neighbors, Logistic Regression, Neural Network, Discriminant Analysis, and Gradient Boosting. Five (5)-fold cross-validation and event-based sampling were applied during the training process (with oversampling using the SMOTE method and stratification within the oversampling). The prediction power (Gini importance) of the selected features was measured using the Mean Decrease in Impurity (MDI) method across three algorithms, and a one-way ANOVA with Tukey and Fisher LSD pairwise comparisons was conducted. Results show that the Claim Payment Amount significantly outperforms the remaining features in prediction power (highest mean F-value for Gini importance at the α = 0.05 significance level) for both claim types. Finally, the recall and F1-score of all algorithms were measured for both claim types (inpatient and outpatient), with and without oversampling, and a one-way ANOVA with Tukey and Fisher LSD pairwise comparisons was again conducted. The results show statistically significant differences in the algorithms' performance in detecting quality issues in the outpatient and inpatient claims. Gradient Boosting and Decision Tree (with various configurations and sampling strategies) outperform the other algorithms in recall and F1-measure on both datasets; Logistic Regression shows better recall on the outpatient than on the inpatient data, and Naïve Bayes performs considerably better in recall and F1-score on the outpatient data. Medicaid FFS programs and consultants, Medicaid administrators, and researchers could use this study to develop machine learning models that detect quality issues in Medicaid FFS claim datasets at scale, potentially saving millions of dollars.
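A minimal sketch of the kind of pipeline described above follows: a stratified train/test split, SMOTE oversampling of the minority (erroneous) class, a random forest, and Mean Decrease in Impurity feature importances. The file name, label column, and features are placeholders, not the CMS or California data used in the study.

```python
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

claims = pd.read_csv("outpatient_claims.csv")   # hypothetical labeled claims
X, y = claims.drop(columns=["is_error"]), claims["is_error"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Oversample only the training fold so synthetic points never leak into the test set.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))  # recall / F1 per class

# MDI (Gini) importances, the measure used to compare features such as Claim Payment Amount.
mdi = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(mdi.head(10))
```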
Towards the design of efficient error detection mechanisms
The pervasive nature of modern computer systems has led to an increase in our reliance on such systems to provide correct and timely services. Moreover, as the functionality of computer systems is increasingly defined in software, it is imperative that software be dependable. It has previously been shown that a fault-intolerant software system can be made fault-tolerant through the design and deployment of software mechanisms implementing abstract artefacts known as error detection mechanisms (EDMs) and error recovery mechanisms (ERMs); hence the design of these components is central to the design of dependable software systems. The EDM design problem, which relates to the construction of a Boolean predicate over a set of program variables, is inherently difficult, with current approaches relying on system specifications and the experience of software engineers. As this process necessarily entails the identification and incorporation of program variables by an error detection predicate, this thesis addresses the EDM design problem from a novel variable-centric perspective, with the research presented supporting the thesis that, where it exists under the assumed system model, an efficient EDM consists of a set of critical variables.

In particular, this research proposes (i) a metric suite that can be used to generate a relative ranking of the program variables in a software system with respect to their criticality, (ii) a systematic approach for the generation of highly efficient error detection predicates for EDMs, and (iii) an approach for dependability enhancement based on the protection of critical variables using software wrappers that implement error detection and correction predicates known to be efficient. This research substantiates the thesis that an efficient EDM contains a set of critical variables on the basis that (i) the proposed metric suite is able, through application of an appropriate threshold, to identify critical variables, (ii) efficient EDMs can be constructed based only on the critical variables identified by the metric suite, and (iii) the criticality of the identified variables can be shown to extend across a software module, such that an efficient EDM designed for that software module should seek to determine the correctness of the identified variables.
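To make the wrapper idea concrete, here is a small, hedged sketch of a software wrapper that evaluates a Boolean error detection predicate over a handful of "critical" variables before letting a module step proceed. The state fields, bounds, and module are illustrative assumptions only, not the thesis's generated predicates.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    setpoint: float
    measured: float
    integral: float


def detection_predicate(s: ControllerState) -> bool:
    """Return True when the critical variables look erroneous (hypothetical bounds)."""
    return not (
        0.0 <= s.setpoint <= 100.0
        and abs(s.measured - s.setpoint) <= 50.0
        and abs(s.integral) <= 1e3
    )


def guarded_step(step, s: ControllerState) -> ControllerState:
    """Wrapper: run one module step, then signal an error if the predicate fires."""
    new_state = step(s)
    if detection_predicate(new_state):
        raise RuntimeError("EDM triggered: critical-variable predicate violated")
    return new_state
```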
Learning to cope with small noisy data in software effort estimation
Though investigated for decades, Software Effort Estimation (SEE) remains a challenging problem in software project management, and several factors hinder the practical use of SEE models. One major factor is the scarcity of software projects available for constructing SEE models, due to the long process of software development. Even given a large number of projects, the collected effort values are usually corrupted by noise because of human involvement. Furthermore, even with enough noise-free software projects, SEE models may have sensitive parameters to tune, possibly causing a model sensitivity problem.
The thesis focuses on tackling these three issues. It proposes a synthetic data generator to tackle the data scarcity problem, introduces and constructs uncertain effort estimators to tackle the data noise problem, and analyses the sensitivity of popular SEE models to parameter settings. The main contributions of the thesis include:
1. Proposing a synthetic project generator and providing an understanding of when, why, and for which baseline models it improves prediction performance.
2. Introducing the relevance vector machine for uncertain effort estimation (a brief sketch of uncertainty-aware estimation follows this list).
3. Proposing a better uncertain estimation method based on an ensemble strategy.
4. Providing a better understanding of the impact of parameter tuning for SEE methods.
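The thesis uses a relevance vector machine; scikit-learn ships no RVM, so the hedged sketch below uses BayesianRidge purely as a stand-in to illustrate effort predictions that come with an uncertainty estimate. The CSV file and feature columns are invented placeholders.

```python
import pandas as pd
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split

data = pd.read_csv("see_projects.csv")          # hypothetical project data
X = data[["size_kloc", "team_size", "duration_months"]]
y = data["effort_person_hours"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

model = BayesianRidge().fit(X_tr, y_tr)
mean, std = model.predict(X_te, return_std=True)   # point estimate + uncertainty

for m, s in zip(mean[:5], std[:5]):
    print(f"estimated effort: {m:,.0f} ± {1.96 * s:,.0f} person-hours")
```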
Big Code Applications and Approaches
The availability of a huge amount of source code from code archives and open-source projects opens up the possibility of merging the machine learning, programming languages, and software engineering research fields. This area is often referred to as Big Code, where programming languages are treated in place of natural languages and different features and patterns of code can be exploited to perform many useful tasks and build supportive tools. Among all the possible applications that can be developed within the area of Big Code, the work presented in this research thesis mainly focuses on two particular tasks: Programming Language Identification (PLI) and Software Defect Prediction (SDP) for source code.
Programming language identification is commonly needed in program comprehension and is usually performed directly by developers. However, at large scale, such as in widely used archives (GitHub, Software Heritage), automation of this task is desirable. To accomplish this aim, the problem is analyzed from different points of view (text- and image-based learning approaches) and different models are created, paying particular attention to their scalability.
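As a hedged illustration of the text-based view of this task, the sketch below trains a character n-gram TF-IDF model with a linear classifier on a few toy snippets. The snippets, labels, and feature choices are placeholders; the thesis's actual models and scalability considerations differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

snippets = [
    "def add(a, b):\n    return a + b",
    "public static int add(int a, int b) { return a + b; }",
    "int add(int a, int b) { return a + b; }",
    "fn add(a: i32, b: i32) -> i32 { a + b }",
]
labels = ["Python", "Java", "C", "Rust"]

# Character n-grams capture language-specific syntax (keywords, braces, arrows).
pli = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
pli.fit(snippets, labels)
print(pli.predict(['fn main() { println!("hi"); }']))  # expected: 'Rust'
```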
Software defect prediction is a fundamental step in software development for improving quality and assuring the reliability of software products. In the past, defects were found through manual inspection or by using automatic static and dynamic analyzers. Now, the automation of this task can be tackled with learning approaches that speed up and improve the related procedures. Here, two models have been built and analyzed to detect some of the most common bugs and errors at different code granularity levels (file and method levels).
The data exploited and the models' architectures are analyzed and described in detail. Quantitative and qualitative results are reported for both the PLI and SDP tasks, while differences and similarities with respect to other related works are discussed.
Text Similarity Between Concepts Extracted from Source Code and Documentation
Context: Constant evolution in software systems often results in documentation losing sync with the content of the source code. The traceability research field has often helped in the past, aiming to recover links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to a set of key terms, each set containing the concepts of one of the sampled systems. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we discovered that the cosine distance has excellent comparative power, depending on the pre-training of the machine learning model. In particular, the SpaCy and FastText embeddings offer up to 80% and 90% similarity scores. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
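A short, hedged sketch of the two comparisons described above follows: Jaccard overlap of key-term sets and cosine similarity of the embedded term sets via spaCy. The model name "en_core_web_md" and the example terms are assumptions for illustration, not the paper's actual extraction pipeline.

```python
import spacy

code_terms = {"parser", "token", "buffer", "stream", "lexer"}
doc_terms = {"parser", "token", "grammar", "stream", "syntax"}

# Set-based comparison: Jaccard index over the raw key terms.
jaccard = len(code_terms & doc_terms) / len(code_terms | doc_terms)

# Embedding-based comparison: cosine similarity of averaged word vectors.
nlp = spacy.load("en_core_web_md")                 # medium model ships word vectors
code_doc = nlp(" ".join(sorted(code_terms)))
doc_doc = nlp(" ".join(sorted(doc_terms)))
cosine = code_doc.similarity(doc_doc)

print(f"Jaccard index    : {jaccard:.2f}")
print(f"Cosine similarity: {cosine:.2f}")
```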
Detect and Repair Errors for DNN-based Software
Nowadays, deep neural network (DNN)-based software has been widely applied in many areas, including safety-critical areas such as traffic control, medical diagnosis, and malware detection. However, the software engineering techniques that are supposed to guarantee its functionality, safety, and fairness are not well studied. For example, some serious crashes of DNN-based autonomous cars have been reported; these crashes could have been avoided if the DNN-based software had been well tested. Traditional software testing, debugging, or repairing techniques do not work well on DNN-based software because there is no control flow, data flow, or AST (Abstract Syntax Tree) in deep neural networks. Proposing software engineering techniques targeted at DNN-based software is therefore imperative. In this thesis, we first introduce the development of the SE (Software Engineering) for AI (Artificial Intelligence) area and how our work has influenced the advancement of this new area. We then summarize related work and some important concepts in the SE for AI area. Finally, we discuss four of our important projects.
Our first project, DeepTest, is one of the first few papers proposing systematic software testing techniques for DNN-based software. We proposed neuron coverage-guided image synthesis techniques for DNN-based autonomous cars and leveraged domain-specific metamorphic relations to generate oracles for newly generated test cases, allowing DNN-based software to be tested automatically. We applied DeepTest to three top-performing self-driving car models from the Udacity self-driving car challenge, and our tool identified thousands of erroneous behaviors that may lead to potentially fatal crashes.
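To make the guidance signal concrete, here is a small, hedged sketch of neuron coverage: the fraction of neurons whose activation exceeds a threshold on a given input, recorded with PyTorch forward hooks. The toy model, input, and threshold are placeholders, not DeepTest's exact setup.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8), nn.ReLU())
activations = []

def record(_module, _inp, out):
    # Store post-activation outputs so coverage can be computed afterwards.
    activations.append(out.detach())

for layer in model:
    if isinstance(layer, nn.ReLU):
        layer.register_forward_hook(record)

x = torch.randn(1, 16)                      # stand-in for a test input/image
model(x)

threshold = 0.0
covered = sum((a > threshold).sum().item() for a in activations)
total = sum(a.numel() for a in activations)
print(f"neuron coverage: {covered / total:.2%}")
```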
In the DeepTest project, we found that natural variations such as spatial transformations or rain/fog effects lead to problematic corner cases for DNN-based self-driving cars. In the follow-up project, DeepRobust, we studied the per-point robustness of deep neural networks under natural variation. We found that, for a given DNN model, some specific weak points are more likely than others to cause erroneous outputs under natural variation. We proposed a white-box approach and a black-box approach to identify these weak data points. We implemented and evaluated our approaches on 9 DNN-based image classifiers and 3 DNN-based self-driving car models. Our approaches successfully detect weak points with good precision and recall for both the image classifiers and the self-driving cars.
Most existing works in the SE for AI area, including our DeepTest and DeepRobust, focus on instance-wise errors: single inputs that result in a DNN model's erroneous outputs. Different from instance-wise errors, group-level errors reflect a DNN model's weak performance in differentiating among certain classes or its inconsistent performance across classes. This type of error is very concerning, since it has been found to be related to many notorious real-world errors that involve no malicious attacker. In our third project, DeepInspect, we first introduced group-level errors for DNN-based software and categorized them into confusion errors and bias errors based on real-world reports. We then proposed a neuron coverage-based distance metric to detect group-level errors for DNN-based software without requiring labels. We applied DeepInspect to 8 pretrained DNN models trained on 6 popular image classification datasets, including three adversarially trained models. We showed that DeepInspect can successfully detect group-level violations for both single-label and multi-label classification models with high precision.
As a follow-up and more challenging research project, we proposed five WR (weighted regularization) techniques to repair group-level errors in DNN-based software. These five weighted regularization techniques function at different stages of DNN retraining or inference, namely the input, layer, loss, and output phases. We compared and evaluated the five WR techniques on both single-label and multi-label classification, covering five combinations of four DNN architectures and four datasets. We showed that WR can effectively fix confusion and bias errors, and that each method has its own pros, cons, and applicable scenarios.
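A hedged sketch of one generic loss-phase form of this idea follows: up-weighting the classes involved in a confusion or bias error when computing the retraining loss. The class indices, weights, and toy batch are illustrative assumptions, not the paper's exact WR techniques.

```python
import torch
import torch.nn as nn

num_classes = 10
weights = torch.ones(num_classes)
weights[[3, 5]] = 2.0                    # hypothetical frequently-confused class pair

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4, num_classes, requires_grad=True)   # stand-in model outputs
targets = torch.tensor([3, 5, 1, 7])
loss = criterion(logits, targets)
loss.backward()                          # gradients now emphasize classes 3 and 5
print(loss.item())
```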
All four projects discussed in this thesis solve important problems in ensuring the functionality, safety, and fairness of DNN-based software and have significantly influenced the advancement of the SE for AI area.
Combining SOA and BPM Technologies for Cross-System Process Automation
This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, WSDL). Besides discussing major weaknesses of the existing, custom-built solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed one. This includes a general approach consisting of four distinct steps, as well as specific action items to be performed at every step. The discussion also covers language and tool support and the challenges arising from the transformation.