Search CORE

149 research outputs found

Software Quality Assessment using Ensemble Models

Author: Aljamaan Hamoud
Publication venue
Publication date: 30/06/2009
Field of study

Data Mining and Machine Learning for Software Engineering

Author: Kiyak Elife Ozturk
Publication venue: 'IntechOpen'
Publication date: 05/03/2020
Field of study

Software engineering is one of the most utilizable research areas for data mining. Developers have attempted to improve software quality by mining and analyzing software data. In any phase of software development life cycle (SDLC), while huge amount of data is produced, some design, security, or software problems may occur. In the early phases of software development, analyzing software data helps to handle these problems and lead to more accurate and timely delivery of software projects. Various data mining and machine learning studies have been conducted to deal with software engineering tasks such as defect prediction, effort estimation, etc. This study shows the open issues and presents related solutions and recommendations in software engineering, applying data mining and machine learning techniques

IntechOpen

Crossref

A Theoretical Analysis of Why Hybrid Ensembles Work

Author: Kuo-Wei Hsu
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017
Field of study

Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use the mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting using different algorithms to accuracy gain. We also conduct experiments on classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles

Crossref

Directory of Open Access Journals

PubMed Central

Improving Defect Prediction Models by Combining Classifiers Predicting Different Defects

Author: Petric Jean
Publication venue
Publication date: 02/10/2018
Field of study

Background: The software industry spends a lot of money on finding and fixing defects. It utilises software defect prediction models to identify code that is likely to be defective. Prediction models have, however, reached a performance bottleneck. Any improvements to prediction models would likely yield less defects-reducing costs for companies. Aim: In this dissertation I demonstrate that different families of classifiers find distinct subsets of defects. I show how this finding can be utilised to design ensemble models which outperform other state-of-the-art software defect prediction models. Method: This dissertation is supported by published work. In the first paper I explore the quality of data which is a prerequisite for building reliable software defect prediction models. The second and third papers explore the ability of different software defect prediction models to find distinct subsets of defects. The fourth paper explores how software defect prediction models can be improved by combining a collection of classifiers that predict different defective components into ensembles. An additional, non-published work, presents a visual technique for the analysis of predictions made by individual classifiers and discusses some possible constraints for classifiers used in software defect prediction. Result: Software defect prediction models created by classifiers of different families predict distinct subsets of defects. Ensembles composed of classifiers belonging to different families outperform other ensemble and standalone models. Only a few highly diverse and accurate base models are needed to compose an effective ensemble. This ensemble can consistently predict a greater number of defects compared to the increase in incorrect predictions. Conclusion: Ensembles should not use the majority-voting techniques to combine decisions of classifiers in software defect prediction as this will miss correct predictions of classifiers which uniquely identify defects. Some classifiers could be less successful for software defect prediction due to complex decision boundaries of defect data. Stacking based ensembles can outperform other ensemble and stand-alone techniques. I propose new possible avenues of research that could further improve the modelling of ensembles in software defect prediction. Data quality should be explicitly considered prior to experiments for researchers to establish reliable results

University of Hertfordshire Research Archive

A Framework for Prognostics Reasoning

Author: Clutz Thomas C.
Publication venue: AFIT Scholar
Publication date: 01/12/2002
Field of study

The use of system data to make predictions about the future system state commonly known as prognostics is a rapidly developing field. Prognostics seeks to build on current diagnostic equipment capabilities for its predictive capability. Many military systems including the Joint Strike Fighter (JSF) are planning to include on-board prognostics systems to enhance system supportability and affordability. Current research efforts supporting these developments tend to focus on developing a prognostic tool for one specific system component. This dissertation research presents a comprehensive literature review of these developing research efforts. It also develops presents a mathematical model for the optimum allocation of prognostics sensors and their associated classifiers on a given system and all of its components. The model assumptions about system criticality are consistent with current industrial philosophies. This research also develops methodologies for combine sensor classifiers to allow for the selection of the best sensor ensemble

AFTI Scholar (Air Force Institute of Technology)

Software defect prediction: do different classifiers find the same defects?

Author: AT Mısırlı
B Turhan
C Catal
C Seiffert
C Soares
D Gray
D Gray
David Bowes
DH Wolpert
E Arisholm
H Chen
I Witten
IH Laradji
Jean Petrić
K Elish
L Briand
L Madeyski
M D’Ambros
M Shepperd
M Shepperd
M Shepperd
MA Hall
N Fenton
NV Chawla
R Malhotra
S Lessmann
T Hall
T Khoshgoftaar
T Menzies
Tracy Hall
U Fayyad
W Chen
Y Zhou
Z Sun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes is captured in a confusion matrix and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.Peer reviewedFinal Published versio

Crossref

Springer - Publisher Connector

Lancaster E-Prints

University of Hertfordshire Research Archive

Recommended from our members

A Systematic Literature Review and Meta-analysis on Cross Project Defect Prediction

Author: Gunarathna D
Hosseini R
Turhan B
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Background: Cross project defect prediction (CPDP) recently gained considerable attention, yet there are no systematic efforts to analyse existing empirical evidence. Objective: To synthesise literature to understand the state-of-the-art in CPDP with respect to metrics, models, data approaches, datasets and associated performances. Further, we aim to assess the performance of CPDP vs. within project DP models. Method: We conducted a systematic literature review. Results from primary studies are synthesised (thematic, meta-analysis) to answer research questions. Results: We identified 30 primary studies passing quality assessment. Performance measures, except precision, vary with the choice of metrics. Recall, precision, f-measure, and AUC are the most common measures. Models based on Nearest-Neighbour and Decision Tree tend to perform well in CPDP, whereas the popular na¨ıve Bayes yield average performance. Performance of ensembles varies greatly across f-measure and AUC. Data approaches address CPDP challenges using row/column processing, which improve CPDP in terms of recall at the cost of precision. This is observed in multiple occasions including the meta-analysis of CPDP vs. WPDP. NASA and Jureczko datasets seem to favour CPDP over WPDP more frequently. Conclusion: CPDP is still a challenge and requires more research before trustworthy applications can take place. We provide guidelines for further research

Brunel University Research Archive

Full Hierarchic Versus Non-Hierarchic Classification Approaches for Mapping Sealed Surfaces at the Rural-Urban Fringe Using High-Resolution Satellite Data

Author: Baatz
Baatz
Barnsley
Barr
Bauer
Ben-Dor
Benediktsson
Benz
Carleer
Carpenter
Chan
Chen
Chen
Chormanski
Dare
Dietterich
Flanders
Foody
Frank Canters
Friedl
Gigandet
Gong
Gopal
Gurney
Hansen
Haralick
He
Herold
Hodgson
Hu
Huang
Ito
Jensen
Ji
Laha
Lawrence
Lee
Liu
Lu
Myint
Paola
Pesaresi
Quinlan
Rollet
Running
Schueler
Seto
Thomas
Tim De Roeck
Tim Van de Voorde
Van de Voorde
Wang
Weng
Xu
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2009
Field of study

Since 2008 more than half of the world population is living in cities and urban sprawl is continuing. Because of these developments, the mapping and monitoring of urban environments and their surroundings is becoming increasingly important. In this study two object-oriented approaches for high-resolution mapping of sealed surfaces are compared: a standard non-hierarchic approach and a full hierarchic approach using both multi-layer perceptrons and decision trees as learning algorithms. Both methods outperform the standard nearest neighbour classifier, which is used as a benchmark scenario. For the multi-layer perceptron approach, applying a hierarchic classification strategy substantially increases the accuracy of the classification. For the decision tree approach a one-against-all hierarchic classification strategy does not lead to an improvement of classification accuracy compared to the standard all-against-all approach. Best results are obtained with the hierarchic multi-layer perceptron classification strategy, producing a kappa value of 0.77. A simple shadow reclassification procedure based on characteristics of neighbouring objects further increases the kappa value to 0.84

Multidisciplinary Digital Publishing Institute

CiteSeerX

Crossref

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central