
Instance and feature weighted k-nearest-neighbors algorithm

Authors: Belanche Muñoz, Luis Antonio; Prat, Gabriel
Publication venue: I6doc.com
Publication date: 01/01/2016

We present a novel method that aims to provide a more stable selection of feature subsets when the training process varies. This is accomplished by using an instance-weighting process (assigning different importances to instances) as a preprocessing step to a learner-independent feature-weighting method, and then using both sets of computed weights in a standard Nearest-Neighbours classifier. We report extensive experiments on well-known benchmark datasets as well as some challenging microarray gene-expression problems. Our results show increases in stability for most subset sizes and most problems, without compromising prediction accuracy.

Peer Reviewed. Postprint (published version).
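The sketch below shows one way two precomputed weight vectors could enter a k-nearest-neighbours classifier. It assumes that feature weights scale each squared coordinate difference in a Euclidean distance and that instance weights scale each neighbour's vote. The function name `weighted_knn_predict` and both weighting rules are illustrative assumptions. The abstract does not say how the authors combine the two weight sets, and the step that computes the weights is not shown here.

```python
import math
from collections import defaultdict


def weighted_knn_predict(X_train, y_train, x, k, feature_w, instance_w):
    """Predict a label for x with a feature- and instance-weighted k-NN.

    Hypothetical helper (not the authors' code):
    - feature_w[j] scales the squared difference on feature j in the distance;
    - instance_w[i] scales the vote of training instance i.
    Both weight vectors are assumed precomputed and non-negative.
    """
    scored = []
    for xi, yi, wi in zip(X_train, y_train, instance_w):
        # Feature-weighted Euclidean distance between xi and the query x.
        d = math.sqrt(sum(fw * (a - b) ** 2 for fw, a, b in zip(feature_w, xi, x)))
        scored.append((d, yi, wi))
    scored.sort(key=lambda t: t[0])
    # Each of the k nearest neighbours votes with its instance weight.
    votes = defaultdict(float)
    for _, yi, wi in scored[:k]:
        votes[yi] += wi
    return max(votes, key=votes.get)


# Usage: a zero feature weight drops that feature from the distance entirely.
X = [[0.0, 10.0], [10.0, 0.0]]
y = ["a", "b"]
print(weighted_knn_predict(X, y, [0.0, 0.0], 1, [1.0, 0.0], [1.0, 1.0]))  # a
print(weighted_knn_predict(X, y, [0.0, 0.0], 1, [0.0, 1.0], [1.0, 1.0]))  # b
```

Setting a feature weight to zero acts as a hard feature deselection, while intermediate weights act as a soft one. Instance weights can let a few highly weighted neighbours outvote several lightly weighted ones.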

UPCommons. Portal del coneixement obert de la UPC

Generating Compact Tree Ensembles via Annealing

Authors: Barbu, Adrian; Dawer, Gitesh; Guo, Yangzi
Publication date: 19/02/2020

Tree ensembles are flexible predictive models that can capture relevant variables, and to some extent their interactions, in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior, selecting the same tree again and again because of the way the loss is optimized. Random Forest, on the other hand, is not based on loss optimization and produces a more complex and less interpretable model. In this paper we present a novel method for obtaining compact tree ensembles. It grows a large pool of trees in parallel with many independent boosting threads, then selects a small subset and updates their leaf weights by loss optimization. We allow the trees in the initial pool to have different depths, which further helps with generalization. Experiments on real datasets show that the obtained model usually has a smaller loss than boosting, which is also reflected in a lower misclassification error on the test set.

Comment: Comparison with Random Forest included in the results section
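The toy sketch below illustrates the "select a small subset from a pool, then refit leaf weights" idea. It assumes squared loss, and it represents each pooled tree only by the leaf index it assigns to each training sample. The helper names `fit_leaf_weights` and `select_compact_ensemble` are hypothetical. The greedy forward selection and backfitting used here are stand-ins chosen for brevity. The paper's title indicates an annealing-based selection, and neither that procedure nor the paper's loss is reproduced.

```python
from collections import defaultdict


def fit_leaf_weights(trees, y, n_sweeps=20):
    """Fit one additive weight per leaf of each tree under squared loss.

    Hypothetical helper. Each tree is a list mapping sample index -> leaf id.
    Backfitting: for each tree in turn, set every leaf weight to the mean
    residual of the samples in that leaf, holding the other trees fixed.
    """
    n = len(y)
    weights = [defaultdict(float) for _ in trees]
    pred = [0.0] * n
    for _ in range(n_sweeps):
        for t, leaves in enumerate(trees):
            # Remove this tree's current contribution from the prediction.
            for i in range(n):
                pred[i] -= weights[t][leaves[i]]
            # The mean residual per leaf minimizes squared loss for this tree.
            sums, counts = defaultdict(float), defaultdict(int)
            for i in range(n):
                sums[leaves[i]] += y[i] - pred[i]
                counts[leaves[i]] += 1
            weights[t] = defaultdict(float, {leaf: sums[leaf] / counts[leaf] for leaf in sums})
            # Add the updated contribution back.
            for i in range(n):
                pred[i] += weights[t][leaves[i]]
    loss = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n
    return weights, loss


def select_compact_ensemble(pool, y, size):
    """Greedily pick `size` trees from `pool`, refitting all leaf weights each step."""
    chosen, weights, loss = [], [], float("inf")
    remaining = list(range(len(pool)))
    for _ in range(min(size, len(pool))):
        best = None
        for j in remaining:
            w, l = fit_leaf_weights([pool[k] for k in chosen + [j]], y)
            if best is None or l < best[1]:
                best = (j, l, w)
        chosen.append(best[0])
        remaining.remove(best[0])
        weights, loss = best[2], best[1]
    return chosen, weights, loss


# Usage: y is additive in two splits; tree 2 is a useless constant tree.
y = [3.0, 1.0, -1.0, -3.0]
pool = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]]
chosen, weights, loss = select_compact_ensemble(pool, y, size=2)
print(chosen, loss)  # [0, 1] 0.0
```

Refitting every selected tree's leaf weights after each addition loosely mirrors the abstract's "updating their leaf weights by loss optimization". The exhaustive greedy search costs one full refit per candidate per step, which is only practical for small pools like this one.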

arXiv.org e-Print Archive

Crossref