Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

Author: Albrecht
Austin
Baird
Batista
Boehm
Breiman
Briand
Brockmeier
Cartwright
Cheung
Clark
Feelders
Finnie
Gama
Gray
Holte
Jain
Jeffery
Jun Liu
Jönsson
Kemerer
Khotanzad
Kibler
Kim
Kitchenham
Kohavi
Little
Martin Shepperd
Miranda
Myrtveit
Pickard
Putnam
Qinbao Song
Quinlan
Robins
Rubin
Samson
Selby
Shao
Shepperd
Siedelecki
Song
Srinivasan
Strike
Tabachnick
Tay
Walkerden
Walston
Xiangru Chen
Publication venue: Elsevier BV
Publication date: 01/12/2008

Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see whether it is better to tolerate missing data or to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. We also found that both C4.5 and k-NN are little affected by the missingness mechanism, but that the missing data pattern and the missing data percentage have a strong negative impact upon prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
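To make the idea of k-NN imputation concrete, here is a minimal sketch in plain Python. It is not the procedure used in the paper: the function name `knn_impute`, the distance (Euclidean over the features observed in both rows, averaged over the number of shared features), and the toy project data are all assumptions chosen for illustration.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption: distance is Euclidean over features observed in both rows,
    averaged over the number of shared features before taking the root.
    """
    completed = [list(r) for r in rows]  # work on a copy, leave input intact
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor row must be a different row with feature j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no donor row exists
                completed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return completed

# Hypothetical project records: (size, team size, cost); the last cost is missing.
projects = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 100.0],
    [1.05, 2.05, None],
]
print(knn_impute(projects, k=2)[3])  # prints [1.05, 2.05, 11.0]
```

The completed rows could then be passed to a decision tree learner, which is the "impute, then apply C4.5" arm of the comparison the abstract describes; the alternative arm hands the incomplete rows to the learner directly.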

Crossref

Brunel University Research Archive

Augmenting Naive Bayes Classifiers with Statistical Language Models

Author: Peng Fuchun
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2003

We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier which allows for a local Markov dependence among observations, a model we refer to as the Chain Augmented Naive Bayes (CAN) classifier. CAN models have two advantages over standard naive Bayes classifiers. First, they relax some of the independence assumptions of naive Bayes, allowing a local Markov chain dependence in the observed variables, while still permitting efficient inference and learning. Second, they permit straightforward application of sophisticated smoothing techniques from statistical language modeling, which allows one to obtain better parameter estimates than the standard Laplace smoothing used in naive Bayes classification. In this paper, we introduce CAN models and apply them to various text classification problems. To demonstrate the language-independent and task-independent nature of these classifiers, we present experimental results on several text classification problems (authorship attribution, text genre classification, and topic detection) in several languages (Greek, English, Japanese and Chinese). We then systematically study the key factors in the CAN model that can influence the classification performance, and analyze the strengths and weaknesses of the model.
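The chain-augmented idea can be sketched as a naive Bayes classifier whose per-class likelihood is a bigram language model rather than a bag of independent tokens. The sketch below is a simplified illustration, not the authors' implementation: the class name `ChainAugmentedNB` is hypothetical, the chain order is fixed at bigrams, and it uses plain add-alpha smoothing as a stand-in, whereas the abstract says CAN models permit more sophisticated language-model smoothing.

```python
import math
from collections import Counter, defaultdict

class ChainAugmentedNB:
    """Naive Bayes with a class-conditional bigram model (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # add-alpha smoothing (an assumption)
        self.bigrams = defaultdict(Counter)   # class -> counts of (prev, cur)
        self.contexts = defaultdict(Counter)  # class -> counts of prev
        self.class_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.vocab.update(tokens)
            padded = ["<s>"] + list(tokens)   # start symbol gives the first token a context
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[label][(prev, cur)] += 1
                self.contexts[label][prev] += 1
        return self

    def log_score(self, tokens, label):
        # log P(c) + sum_i log P(w_i | w_{i-1}, c)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        v = len(self.vocab) + 1               # +1 reserves mass for unseen tokens
        padded = ["<s>"] + list(tokens)
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[label][(prev, cur)] + self.alpha
            den = self.contexts[label][prev] + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))

# Toy training data, invented for this example.
model = ChainAugmentedNB().fit(
    ["the cat sat".split(), "the dog sat".split(),
     "le chat dort".split(), "le chien dort".split()],
    ["en", "en", "fr", "fr"],
)
print(model.predict("the cat".split()))  # prints en
```

Conditioning each token on its predecessor is what relaxes the independence assumption; setting the context to nothing at all would recover the standard naive Bayes likelihood.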

ScholarWorks@UMass Amherst

Behavioral Hierarchy: Exploration and Representation

Author: A. G. Barto
A. Jonsson
A. McGovern
A. Newell
B. Bakker
B. C. da Silva
B. Digney
B. Hengst
C. Boutilier
C. Guestrin
D. A. Waterman
D. Heckerman
D. W. Schneider
E. D. Sacerdoti
G. A. Miller
G. J. Tesauro
G. Konidaris
H. A. Simon
H. van Seijen
H. Steck
I. Menache
J. Gibson
J. Mugan
J. Pearl
J. R. Anderson
J. Schmidhuber
K. Murphy
K. S. Lashley
L. Torrey
M. E. Taylor
M. Huber
M. M. Botvinick
M. Pickett
N. Friedman
N. Mehta
P. Langley
R. Alur
R. E. Bellman
R. E. Fikes
R. E. Korf
R. M. Ryan
R. Parr
R. R. Burridge
R. S. Sutton
R. Tedrake
R. W. White
S. B. Thrun
S. Hart
S. Mahadevan
S. Mannor
S. Singh
S. Tong
T. G. Dietterich
T. L. Dean
W. Buntine
W. Callebaut
Y. Liu
Ö. Şimşek
Publication venue: Springer Science and Business Media LLC

Crossref

Deep Learning in Visual Computing and Signal Processing

Author: Danfeng Xie
Lei Zhang
Li Bai
Publication venue: Hindawi Limited
Publication date: 01/01/2017

Crossref

Closed-Loop Learning of Visual Control Policies

Author: Jodogne S. R.
Piater J. H.
Publication venue: AI Access Foundation
Publication date: 01/01/2007

In this paper we present a general, flexible framework for learning mappings from images to actions by interacting with the environment. The basic idea is to introduce a feature-based image classifier in front of a reinforcement learning algorithm. The classifier partitions the visual space according to the presence or absence of a few highly informative local descriptors that are incrementally selected in a sequence of attempts to remove perceptual aliasing. We also address the problem of fighting overfitting in such a greedy algorithm. Finally, we show how high-level visual features can be generated when the power of local descriptors is insufficient for completely disambiguating the aliased states. This is done by building a hierarchy of composite features that consist of recursive spatial combinations of visual features. We demonstrate the efficacy of our algorithms by solving three visual navigation tasks and a visual version of the classical Car on the Hill control problem.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Repository and Bibliography - Liège

DIAL UCLouvain

A concept drift-tolerant case-base editing technique

Author: Lopez De Mantaras R
Lu J
Lu N
Zhang G
Publication venue: Elsevier BV
Publication date: 01/01/2016

The evolving nature and accumulating volume of real-world data inevitably give rise to the so-called "concept drift" issue, causing many deployed Case-Based Reasoning (CBR) systems to require additional maintenance procedures. In Case-Base Maintenance (CBM), case-base editing strategies to revise the case-base have proven to be effective instance selection approaches for handling concept drift. Motivated by current issues related to CBR techniques in handling concept drift, we present a two-stage case-base editing technique. In Stage 1, we propose a Noise-Enhanced Fast Context Switch (NEFCS) algorithm, which targets the removal of noise in a dynamic environment, and in Stage 2, we develop an innovative Stepwise Redundancy Removal (SRR) algorithm, which reduces the size of the case-base by eliminating redundancies while preserving the case-base coverage. Experimental evaluations on several public real-world datasets show that our case-base editing technique significantly improves accuracy compared to other case-base editing approaches on concept drift tasks, while preserving its effectiveness on static tasks.

OPUS - University of Technology Sydney

Digital.CSIC

Workshop on Rich Representations for Reinforcement Learning: Held in conjunction with the 22nd International Conference on Machine Learning, August 7, 2005, Bonn, Germany

Author: Driessens Kurt
Fern Alan
van Otterlo Martijn
Publication venue: University of Bonn
Publication date: 01/01/2005

University of Twente Research Information