
    A Comparative Study of Threshold-based Feature Selection Techniques

    Abstract: Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques and compares their performance by building classification models with five commonly used classifiers. To evaluate the effectiveness of the different feature selection techniques, the models are assessed with eight performance metrics separately, since a given performance metric usually captures only one aspect of classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The experiments demonstrate that the choice of a performance metric may significantly influence the results. In this study, we found four distinct patterns when using the eight performance metrics to order the 11 threshold-based feature selection techniques. Moreover, the performance of the software quality models either improves or remains unchanged even after over 96% of the software metrics (attributes) are removed.
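    The abstract does not reproduce the ranking formulas, but the general shape of threshold-based feature (metric) selection can be sketched as follows. This is only an illustration: the relevance score (absolute Pearson correlation with the fault label) and the cutoff value are assumptions for demonstration, not any of the paper's eleven actual techniques.

        # Illustrative sketch of threshold-based feature (metric) selection.
        # The relevance score and the cutoff are assumed for demonstration only.
        import numpy as np

        def threshold_select(X, y, threshold=0.1):
            """Return indices of metrics whose |Pearson correlation| with the
            binary fault label exceeds the given threshold."""
            scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
            return np.where(scores > threshold)[0]

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            X = rng.normal(size=(200, 20))   # 200 modules, 20 software metrics (synthetic)
            y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)  # fault label
            print("retained metrics:", threshold_select(X, y, threshold=0.2))

    Varying the threshold trades off how aggressively metrics are discarded against how much relevant signal is kept, which is the knob the eight performance metrics in the study are used to judge.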

    An Empirical Investigation of Filter Attribute Selection Techniques for Software Quality Classification

    Attribute selection is an important activity in data preprocessing for software quality modeling and other data mining problems. Software quality models have been used to improve the fault detection process. Finding faulty components in a software system during the early stages of the software development process can lead to a more reliable final product and can reduce development and maintenance costs. Some studies have shown that the prediction accuracy of the models improves when irrelevant and redundant features are removed from the original data set. In this study, we investigated four filter attribute selection techniques: Automatic Hybrid Search (AHS), Rough Sets (RS), Kolmogorov-Smirnov (KS), and Probabilistic Search (PS), applying them to a very large telecommunications software system. To evaluate their classification performance on the smaller subsets of attributes selected by the different approaches, we built several classification models using five different classifiers. The empirical results demonstrated that by applying an attribute selection approach we can build classification models with accuracy comparable to that of models built with the complete set of attributes, while the smaller subset contains less than 15 percent of the attributes in the complete set. Therefore, the metrics collection, model calibration, model validation, and model evaluation times of future software development efforts on similar systems can be significantly reduced. In addition, we demonstrated that our recently proposed attribute selection technique, KS, outperformed the other three attribute selection techniques.
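    As an illustration of a KS-style filter ranker, the sketch below scores each attribute with scipy's two-sample Kolmogorov-Smirnov statistic between fault-prone and not-fault-prone modules. The paper's KS technique may be formulated differently, so this is an assumption-laden sketch of the general idea rather than a reproduction of it.

        # Sketch of a Kolmogorov-Smirnov filter for attribute ranking.
        # The scoring choice (scipy's two-sample KS statistic) is an assumption;
        # the paper's KS technique may differ in its exact formulation.
        import numpy as np
        from scipy.stats import ks_2samp

        def ks_rank(X, y, top_k=5):
            """Rank attributes by the KS distance between their class-conditional
            distributions and return the indices of the top_k attributes."""
            scores = []
            for j in range(X.shape[1]):
                result = ks_2samp(X[y == 1, j], X[y == 0, j])
                scores.append((result.statistic, j))
            scores.sort(reverse=True)
            return [j for _, j in scores[:top_k]]

    Because the ranking only looks at each attribute's distribution, it is a filter: it runs once, independently of whichever of the five classifiers is trained afterwards.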

    Credit Risk Management Using Automatic Machine Learning

    The article presents the basic data mining techniques implemented in typical commercial software and applies them to assess the risk of credit card debt repayment. The article evaluates the quality of the classification models derived from these data mining techniques and compares their results with the traditional approach of using a logit model to assess credit risk. It turns out that the data mining models provide classification accuracy similar to that of the logit model, while requiring much less work and facilitating the automation of the process of building scoring models.
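    As a rough illustration of the comparison described above, the sketch below fits a traditional logit (logistic regression) scoring model and an automatically trained model side by side. The synthetic data set and the use of gradient boosting as a stand-in for a commercial automatic machine learning tool are assumptions made purely for demonstration.

        # Compare a traditional logit scoring model against an automatically trained
        # model; gradient boosting stands in for a commercial AutoML tool and the
        # imbalanced data set is synthetic.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        auto = GradientBoostingClassifier().fit(X_tr, y_tr)

        print("logit accuracy:", accuracy_score(y_te, logit.predict(X_te)))
        print("automatic model accuracy:", accuracy_score(y_te, auto.predict(X_te)))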

    Choosing software metrics for defect prediction: an investigation on feature selection techniques

    The selection of software metrics for building software quality prediction models is a search-based software engineering problem. An exhaustive search for such metrics is usually not feasible due to limited project resources, especially when the number of available metrics is large. Defect prediction models are necessary to help project managers make better use of valuable project resources for software quality improvement. The efficacy and usefulness of a fault-proneness prediction model is only as good as the quality of the software measurement data. This study focuses on the problem of attribute selection in the context of software quality estimation. A comparative investigation is presented for evaluating our proposed hybrid attribute selection approach, in which feature ranking is first used to reduce the search space, followed by feature subset selection. A total of seven different feature ranking techniques are evaluated, while four different feature subset selection approaches are considered. The models are trained using five commonly used classification algorithms. The case study is based on software metrics and defect data collected from multiple releases of a large real-world software system. The results demonstrate that while some feature ranking techniques performed similarly, the automatic hybrid search algorithm performed the best among the feature subset selection methods. Moreover, the performance of the defect prediction models either improved or remained unchanged when over 85% of the software metrics were eliminated. Copyright © 2011 John Wiley & Sons, Ltd. http://deepblue.lib.umich.edu/bitstream/2027.42/83475/1/1043_ftp.pd
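    A minimal sketch of the hybrid idea (a cheap ranking step that prunes the search space, followed by a subset-selection search over the survivors) is given below. Mutual information and scikit-learn's sequential forward selection are stand-ins for the paper's seven ranking techniques and its automatic hybrid search, so the sketch illustrates the pipeline shape rather than the published method.

        # Sketch of a hybrid attribute selection pipeline: a filter ranking first
        # shrinks the search space, then a subset-selection search runs only over
        # the surviving attributes. The ranking score (mutual information) and the
        # search (sequential forward selection with naive Bayes) are assumptions.
        import numpy as np
        from sklearn.feature_selection import SequentialFeatureSelector, mutual_info_classif
        from sklearn.naive_bayes import GaussianNB

        def hybrid_select(X, y, keep_ranked=10, final_k=4):
            # Step 1: feature ranking to reduce the search space.
            ranked = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1][:keep_ranked]
            # Step 2: feature subset selection restricted to the top-ranked attributes.
            sfs = SequentialFeatureSelector(GaussianNB(), n_features_to_select=final_k,
                                            direction="forward")
            sfs.fit(X[:, ranked], y)
            return ranked[sfs.get_support()]

    The point of the two stages is cost: the subset search, which is the expensive part, only ever sees the handful of attributes that survived the ranking step.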