5 research outputs found

    A comparison of pruning criteria for probability trees

    No full text
    Probability trees are decision trees that predict class probabilities rather than the most likely class. The pruning criterion used to learn a probability tree strongly influences the size of the tree and thereby also the quality of its probability estimates. While the effect of pruning criteria on classification accuracy is well studied, only recently has there been more interest in their effect on probability estimates. Hence, it is currently unclear which pruning criteria for probability trees are preferable under which circumstances. In this paper we survey six of the most important pruning criteria for probability trees and discuss their theoretical advantages and disadvantages. We also perform an extensive experimental study of the relative performance of these pruning criteria. The main conclusion is that, overall, a pruning criterion based on randomization tests performs best because it is most robust to extreme data characteristics (such as class skew or a high number of classes). We also identify and explain several shortcomings of the other pruning criteria.
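    For readers unfamiliar with the distinction drawn in the first sentence, the snippet below is a minimal sketch of a probability tree in the above sense, using scikit-learn's DecisionTreeClassifier purely as an illustrative stand-in (the paper does not prescribe this library): predict returns the most likely class, while predict_proba returns the per-class probability estimate stored in the leaf that the example falls into.

        from sklearn.datasets import load_iris
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_iris(return_X_y=True)
        # min_samples_leaf keeps leaves large enough to give non-trivial
        # probability estimates instead of a fully grown, overfitted tree
        tree = DecisionTreeClassifier(min_samples_leaf=10, random_state=0).fit(X, y)

        x = X[:1]                      # a single example
        print(tree.predict(x))         # the most likely class only
        print(tree.predict_proba(x))   # the class-probability estimate from the leaf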

    A comparison of pruning criteria for probability trees

    No full text
    Probability trees (or probability estimation trees, PETs) are decision trees with probability distributions in the leaves. Decision trees are usually learned in a top-down manner with pre- or post-pruning. The pruning criterion used strongly influences the size of the resulting tree and the quality of the probability estimates. While the effect of pruning criteria on classification accuracy is well studied, only recently has there been more interest in their effect on probability estimates or probability-based rankings. Hence, there is currently no clear view on the relative performance of the different pruning criteria for probability trees, and it is unclear which criteria are preferable under which circumstances. In this paper we survey six of the most important pruning criteria for probability trees. We discuss their theoretical advantages and disadvantages, and we perform an extensive experimental study of their relative performance. The main conclusion is that a pruning criterion based on randomization tests usually performs best and learns trees that are relatively small. We identify several scenarios in which other pruning criteria achieve results comparable to those of randomization tests.
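    To make the size/quality trade-off discussed in both abstracts concrete, the sketch below contrasts an unpruned tree with a pruned one on a skewed synthetic problem and reports tree size alongside the quality of the probability estimates. Note that scikit-learn's cost-complexity pruning (ccp_alpha) and the metrics shown (log loss, AUC) are assumptions made here for illustration; they are stand-ins for, not reproductions of, the six pruning criteria and the experimental setup studied in the paper.

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import log_loss, roc_auc_score

        # Skewed two-class problem, echoing the class-skew scenario mentioned above
        X, y = make_classification(n_samples=2000, n_features=20,
                                   weights=[0.9, 0.1], random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        for name, alpha in [("unpruned", 0.0), ("pruned", 0.005)]:
            tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
            tree.fit(X_train, y_train)
            # Positive-class probability estimates taken from the leaves
            proba = tree.predict_proba(X_test)[:, 1]
            print(f"{name:9s} leaves={tree.get_n_leaves():4d} "
                  f"log-loss={log_loss(y_test, proba):.3f} "
                  f"AUC={roc_auc_score(y_test, proba):.3f}")

    Typically the pruned tree is much smaller while giving probability estimates of comparable or better quality, which is the effect the surveyed pruning criteria aim to control.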