Choosing software metrics for defect prediction: an investigation on feature selection techniques

Aha; Aha; Berenson; Cameron; Chen; Coppin; Dash; Domingos; Fawcett; Fogarty; Forman; Furlanello; Guyon; Hall; Harman; Haykin; Hudepohl; Khoshgoftaar; Khoshgoftaar; Khoshgoftaar; Le Cessie; Lessmann; Liu; Menzies; Pfleeger; Platt; Shawe-Taylor; Tan; Votta; Witten; Wohlin

Publication date: 25 April 2011
Publisher: Wiley

Abstract

The selection of software metrics for building software quality prediction models is a search-based software engineering problem. An exhaustive search for such metrics is usually not feasible due to limited project resources, especially if the number of available metrics is large. Defect prediction models help project managers make better use of valuable project resources for software quality improvement. The efficacy and usefulness of a fault-proneness prediction model are only as good as the quality of the software measurement data. This study focuses on the problem of attribute selection in the context of software quality estimation. A comparative investigation is presented for evaluating our proposed hybrid attribute selection approach, in which feature ranking is first used to reduce the search space, followed by feature subset selection. Seven feature ranking techniques and four feature subset selection approaches are evaluated. The models are trained using five commonly used classification algorithms. The case study is based on software metrics and defect data collected from multiple releases of a large real-world software system. The results demonstrate that while some feature ranking techniques performed similarly, the automatic hybrid search algorithm performed the best among the feature subset selection methods. Moreover, the performance of the defect prediction models either improved or remained unchanged when over 85% of the software metrics were eliminated.

Copyright © 2011 John Wiley & Sons, Ltd.

Peer Reviewed

http://deepblue.lib.umich.edu/bitstream/2027.42/83475/1/1043_ftp.pd
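
The two-stage idea in the abstract (rank the metrics first to shrink the search space, then search for a subset among the survivors) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the ranking score (absolute Pearson correlation with the defect label), the subset search (greedy forward selection), the wrapper classifier (nearest centroid with leave-one-out accuracy), the synthetic data, and all function names. The abstract does not name the seven rankers, the four subset selection methods, or the five classifiers, so this is not the authors' algorithm.

```python
import math
import random


def abs_correlation(column, labels):
    """Filter score (assumed): |Pearson correlation| of one metric with the 0/1 label."""
    n = len(column)
    mean_x = sum(column) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(column, labels))
    var_x = sum((x - mean_x) ** 2 for x in column)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no ranking information
    return abs(cov) / math.sqrt(var_x * var_y)


def rank_features(rows, labels, keep):
    """Stage 1: rank every metric by the filter score, keep the top `keep` indices."""
    n_features = len(rows[0])
    scores = [abs_correlation([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:keep]


def loo_accuracy(rows, labels, subset):
    """Wrapper score (assumed): leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(rows)):
        sums = {0: [0.0] * len(subset), 1: [0.0] * len(subset)}
        counts = {0: 0, 1: 0}
        for k, (row, label) in enumerate(zip(rows, labels)):
            if k == i:
                continue  # hold out module i
            counts[label] += 1
            for pos, j in enumerate(subset):
                sums[label][pos] += row[j]
        best_label, best_dist = None, float("inf")
        for label in (0, 1):
            if counts[label] == 0:
                continue
            dist = sum(
                (rows[i][j] - sums[label][pos] / counts[label]) ** 2
                for pos, j in enumerate(subset)
            )
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, candidates):
    """Stage 2: greedy forward search restricted to the candidates kept by stage 1."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [j]), j) for j in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break  # stop when no remaining metric improves the wrapper score
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


def hybrid_select(rows, labels, keep):
    """Ranking followed by subset selection on the reduced candidate set."""
    return forward_selection(rows, labels, rank_features(rows, labels, keep))


def make_demo_data(n=40, n_metrics=10, seed=0):
    """Synthetic modules: metric 0 separates the classes, the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        label = i % 2
        rows.append([10.0 * label + rng.random()] + [rng.random() for _ in range(n_metrics - 1)])
        labels.append(label)
    return rows, labels


rows, labels = make_demo_data()
selected, accuracy = hybrid_select(rows, labels, keep=3)
print("selected metric indices:", selected, "leave-one-out accuracy:", accuracy)
```

The ranking stage needs one score per metric, while the wrapper stage retrains a classifier for every candidate subset it tries. Restricting the wrapper to the top-ranked metrics is what keeps the search affordable when the number of available metrics is large.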
