THE EFFECT OF THE USED RESAMPLING TECHNIQUE AND NUMBER OF SAMPLES IN CONSOLIDATED TREES’ CONSTRUCTION ALGORITHM

Ibai Gurrutxaga; Javier Muguerza; Jesús M. Pérez; José I. Martín; Olatz Arbelaitz

Abstract

In many pattern recognition problems, explaining the classification that has been made becomes as important as the classifier's discriminating capacity. For this kind of problem we can use the Consolidated Trees' Construction (CTC) algorithm, which uses several subsamples to build a single tree. This paper presents an extensive analysis of the behavior of the CTC algorithm on 20 databases. We analyze the effect of two parameters of the algorithm: the number of subsamples and the way the subsamples are built. The results obtained with Consolidated Trees have been compared to those of C4.5 trees using 5 runs of 10-fold cross-validation. The comparison considers two points of view: error rate (accuracy) and complexity (explanation). Results show that, for subsamples containing 75% of the training sample, Consolidated Trees built with 10 or more subsamples achieve, on average, lower error rates than C4.5 trees while having similar complexity, so they are better situated on the learning curve. On the other hand, the method used to build the subsamples clearly affects the quality of the results achieved with Consolidated Trees. If bootstrap samples are used to build the trees, the results are worse than those obtained with 75% subsamples from both points of view: error and complexity.
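The abstract contrasts two ways of drawing the subsamples from which a single Consolidated Tree is built (75% subsamples versus bootstrap samples) and evaluates the trees with 5 runs of 10-fold cross-validation. The Python sketch that follows is illustrative only and is not the authors' implementation. It assumes plain (non-stratified) random sampling without replacement for the 75% subsamples, and all helper names (`build_subsamples`, `repeated_kfold`, etc.) are hypothetical. It only generates index sets; the CTC tree-induction step itself is not shown.

```python
import random


def subsample_without_replacement(n, fraction, rng):
    """Indices of a subsample holding `fraction` of the n training instances, no repeats.

    Assumption: simple random sampling; the paper may stratify by class instead.
    """
    size = int(round(fraction * n))
    return rng.sample(range(n), size)


def bootstrap_sample(n, rng):
    """Indices of a bootstrap sample: n draws with replacement from the training set."""
    return [rng.randrange(n) for _ in range(n)]


def build_subsamples(n, num_samples, method, rng, fraction=0.75):
    """Build `num_samples` index sets with the chosen resampling method (hypothetical helper)."""
    if method == "subsample":
        return [subsample_without_replacement(n, fraction, rng) for _ in range(num_samples)]
    if method == "bootstrap":
        return [bootstrap_sample(n, rng) for _ in range(num_samples)]
    raise ValueError(f"unknown method: {method}")


def repeated_kfold(n, k=10, repeats=5, seed=0):
    """Yield (train, test) index lists for `repeats` runs of k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # Deal the shuffled indices into k folds of (almost) equal size.
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


if __name__ == "__main__":
    rng = random.Random(42)
    n = 1000
    for method in ("subsample", "bootstrap"):
        samples = build_subsamples(n, 10, method, rng)
        # Average fraction of distinct training instances per sample.
        unique = sum(len(set(s)) for s in samples) / (len(samples) * n)
        print(method, len(samples[0]), round(unique, 3))
    # 5 repeats x 10 folds = 50 train/test splits
    print(sum(1 for _ in repeated_kfold(n)))
```

One visible difference between the two schemes: every 75% subsample contains exactly 75% distinct training instances, whereas a bootstrap sample has the full training-set size but, on average, only about 63.2% distinct instances (the limit 1 − 1/e as n grows), the rest being repeats.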

Available version: CiteSeerX, oai:CiteSeerX.psu:10.1.1.98.76...
