Generating Compact Tree Ensembles via Annealing
Tree ensembles are flexible predictive models that can capture relevant
variables and to some extent their interactions in a compact and interpretable
manner. Most algorithms for obtaining tree ensembles are based on versions of
boosting or Random Forest. Previous work showed that boosting algorithms
exhibit cyclic behavior, repeatedly selecting the same tree because of the way
the loss is optimized. Random Forest, by contrast, is not based on loss
optimization and yields a more complex and less interpretable model. In this
paper we present a novel method for obtaining compact tree ensembles by growing
a large pool of trees in parallel with many independent boosting threads and
then selecting a small subset and updating their leaf weights by loss
optimization. We allow the trees in the initial pool to have different depths,
which further helps with generalization. Experiments on real datasets
show that the obtained model usually has a smaller loss than boosting, which is
also reflected in a lower misclassification error on the test set. Comment: Comparison with Random Forest included in the results section.
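To make the pool-then-select procedure described above concrete, here is a minimal sketch. It assumes binary labels, a logistic loss, and scikit-learn gradient boosting as the source of the tree pool; the fixed shrinkage factor and greedy selection are simplified stand-ins for the paper's per-leaf weight optimization, not the authors' implementation.

```python
# Hedged sketch of the pool-then-select idea (not the authors' code).
# Assumptions: binary labels in {0, 1}, logistic loss, scikit-learn available;
# a fixed shrinkage of 0.1 and greedy selection stand in for the paper's
# per-leaf weight optimization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# 1) Grow a large pool of trees from several independent boosting runs
#    with different maximum depths.
pool = []
for seed, depth in enumerate([1, 2, 3, 4]):
    gbm = GradientBoostingClassifier(n_estimators=50, max_depth=depth,
                                     random_state=seed).fit(X, y)
    pool.extend(t[0] for t in gbm.estimators_)   # regression trees on the logit scale

def log_loss(margin):
    """Logistic loss of the additive model's margin on the training set."""
    return np.mean(np.log1p(np.exp(-(2 * y - 1) * margin)))

# 2) Greedily pick a small subset of the pool that most reduces the loss.
margin, chosen = np.zeros(len(y)), []
for _ in range(10):                              # target ensemble size
    best = min(pool, key=lambda t: log_loss(margin + 0.1 * t.predict(X)))
    margin += 0.1 * best.predict(X)
    chosen.append(best)

print("pool size:", len(pool), "selected:", len(chosen),
      "train loss:", round(log_loss(margin), 3))
```

Growing the pool at several depths (1 to 4 here) mirrors the abstract's point that mixing tree depths in the initial pool helps generalization.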
Ensemble learning of linear perceptron; Online learning theory
Within the framework of on-line learning, we study the generalization error
of an ensemble learning machine that learns from a linear teacher perceptron.
The generalization error achieved by an ensemble of linear perceptrons with
homogeneous or inhomogeneous initial weight vectors is calculated exactly in
the thermodynamic limit of a large number of input elements and shows rich
behavior. Our main findings are as follows. For learning with homogeneous
initial weight vectors, the generalization error with an infinite number of
linear student perceptrons is only half that of a single linear perceptron,
and for a finite number K of perceptrons it converges to the infinite-ensemble
value as O(1/K). For learning with inhomogeneous initial
weight vectors, it is advantageous to take a weighted average of the outputs
of the linear perceptrons, and we show the conditions under
which the optimal weights are constant during the learning process. The optimal
weights depend only on the correlation of the initial weight vectors. Comment: 14 pages, 3 figures, submitted to Physical Review
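A minimal numerical illustration of this setting follows, assuming Gaussian inputs, a noise-free linear teacher, an LMS-style online update, and a plain (unweighted) average of the student outputs; it is not the paper's analytic calculation, only a simulation showing the ensemble error falling below that of a single student.

```python
# Hedged numerical illustration (not the paper's analytic calculation).
# Assumptions: N-dimensional Gaussian inputs, a noise-free linear teacher,
# K linear students trained online with an LMS-style gradient rule, and the
# ensemble output taken as the plain average of the student outputs.
import numpy as np

rng = np.random.default_rng(0)
N, K, steps, eta = 200, 10, 2000, 0.05

B = rng.normal(size=N)
B /= np.linalg.norm(B)                         # teacher weight vector
J = rng.normal(size=(K, N)) / np.sqrt(N)       # inhomogeneous student initializations

for _ in range(steps):
    x = rng.normal(size=N) / np.sqrt(N)        # one example per step (online learning)
    teacher = B @ x
    J += eta * np.outer(teacher - J @ x, x)    # every student follows the same teacher

# The generalization error of a linear student grows with its squared distance
# from the teacher; compare a single student with the K-student average.
single = np.mean(np.sum((J - B) ** 2, axis=1))
ensemble = np.sum((J.mean(axis=0) - B) ** 2)
print("single student:", round(float(single), 4), " ensemble:", round(float(ensemble), 4))
```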
Neural network ensembles: Evaluation of aggregation algorithms
Ensembles of artificial neural networks show improved generalization
capabilities that outperform those of single networks. However, for aggregation
to be effective, the individual networks must be as accurate and diverse as
possible. An important problem is, then, how to tune the aggregate members in
order to have an optimal compromise between these two conflicting conditions.
We present here an extensive evaluation of several algorithms for ensemble
construction, including new proposals, and compare them with standard methods
from the literature. We also discuss a potential problem with sequential
aggregation algorithms: the infrequent but damaging selection, through their
heuristics, of particularly bad ensemble members. We introduce modified
algorithms that cope with this problem by allowing individual weighting of
aggregate members. Our algorithms and their weighted modifications compare
favorably with other methods from the literature, producing a noticeable
improvement in performance on most of the standard statistical databases used
as benchmarks. Comment: 35 pages, 2 figures, in press, AI Journal
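A hedged sketch of one such modified sequential algorithm is given below, assuming a regression task, scikit-learn MLPs trained on bootstrap resamples as the candidate networks, and member weights refit by least squares on a validation set at each greedy step; these specific choices are illustrative, not the algorithms evaluated in the paper.

```python
# Hedged sketch of greedy (sequential) aggregation with individually weighted
# members (illustrative choices, not the algorithms evaluated in the paper).
# Assumptions: regression, scikit-learn MLPs trained on bootstrap resamples as
# candidates, and member weights refit by least squares on a validation set.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
pool = []
for seed in range(12):                          # candidate networks (diversity via bootstrap)
    idx = rng.integers(0, len(X_tr), len(X_tr))
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500,
                       random_state=seed).fit(X_tr[idx], y_tr[idx])
    pool.append(net)

preds = np.column_stack([net.predict(X_val) for net in pool])   # (n_val, n_candidates)

chosen, mse = [], []
for _ in range(6):                              # greedy forward selection
    best, best_err = None, np.inf
    for j in range(len(pool)):
        if j in chosen:
            continue
        cols = chosen + [j]
        # Individual member weights by least squares instead of a plain average;
        # a particularly bad member can then simply receive a small weight.
        w, *_ = np.linalg.lstsq(preds[:, cols], y_val, rcond=None)
        err = np.mean((preds[:, cols] @ w - y_val) ** 2)
        if err < best_err:
            best, best_err = j, err
    chosen.append(best)
    mse.append(best_err)

print("selected members:", chosen)
print("validation MSE after each addition:", [round(float(e), 1) for e in mse])
```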
Perceptron learning with random coordinate descent
A perceptron is a linear threshold classifier that separates examples with a hyperplane. It is perhaps the simplest learning model that is used standalone. In this paper, we propose a family of random coordinate descent algorithms for perceptron learning on binary classification problems. Unlike most perceptron learning algorithms, which require smooth cost functions, our algorithms directly minimize the training error and usually achieve the lowest training error compared with other algorithms. The algorithms are also computationally efficient. Such advantages make them favorable for both standalone use and ensemble learning, on problems that are not linearly separable. Experiments show that our algorithms work very well with AdaBoost, and achieve the lowest test errors on half of the datasets.
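A minimal sketch of the random-coordinate-descent idea follows, assuming labels in {-1, +1} and that along a randomly chosen coordinate the 0/1 training error is minimized exactly by scanning the points where an example's prediction flips; it is an illustration of the approach, not the authors' exact algorithm.

```python
# Hedged sketch of random coordinate descent on the 0/1 training error
# (an illustration of the idea, not the authors' exact algorithm).
# Assumptions: labels in {-1, +1}; the bias is folded in as a constant feature;
# along a random coordinate the best step is found by scanning the flip points,
# since the 0/1 error is piecewise constant in the step size.
import numpy as np

def rcd_perceptron(X, y, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # fold the bias into the weights
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        d = rng.integers(Xb.shape[1])           # pick a random coordinate
        xd = Xb[:, d]
        margins = Xb @ w
        mask = xd != 0
        flips = np.sort(-margins[mask] / xd[mask])          # where predictions change sign
        cands = np.concatenate(([0.0], flips + 1e-9, flips - 1e-9))
        errs = [np.mean(np.sign(margins + a * xd) != y) for a in cands]
        w[d] += cands[int(np.argmin(errs))]     # take the step with the lowest training error
    return w

# Toy usage on a noisy, nonseparable problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=300))
w = rcd_perceptron(X, y)
Xb = np.hstack([X, np.ones((300, 1))])
print("training error:", np.mean(np.sign(Xb @ w) != y))
```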
Optimizing 0/1 Loss for Perceptrons by Random Coordinate Descent
The 0/1 loss is an important cost function for perceptrons. Nevertheless, it cannot be easily minimized by most existing perceptron learning algorithms. In this paper, we propose a family of random coordinate descent algorithms to directly minimize the 0/1 loss for perceptrons, and prove their convergence. Our algorithms are computationally efficient, and usually achieve the lowest 0/1 loss compared with other algorithms. Such advantages make them favorable for nonseparable real-world problems. Experiments show that our algorithms are especially useful for ensemble learning, and could achieve the lowest test error for many complex data sets when coupled with AdaBoost.
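As a rough illustration of the coupling with AdaBoost mentioned above, the sketch below plugs a weighted-0/1-loss weak learner into a standard AdaBoost loop; for brevity the weak learner is a one-coordinate threshold rule fit by exhaustive scan rather than the full random-coordinate-descent perceptron, so this is a stand-in, not the paper's procedure.

```python
# Hedged sketch of coupling a weighted-0/1-loss weak learner with AdaBoost
# (a stand-in, not the paper's procedure). Assumptions: labels in {-1, +1};
# the weak learner is a one-coordinate threshold rule fit by exhaustive scan
# of the weighted 0/1 error, in place of the full random-coordinate-descent
# perceptron.
import numpy as np

def weak_fit(X, y, D):
    """Return the (feature, threshold, sign) with the lowest weighted 0/1 error."""
    best = (np.inf, None)
    for d in range(X.shape[1]):
        for thr in np.unique(X[:, d]):
            for s in (1, -1):
                h = s * np.sign(X[:, d] - thr + 1e-12)
                err = D @ (h != y)
                if err < best[0]:
                    best = (err, (d, thr, s))
    return best[1]

def weak_predict(model, X):
    d, thr, s = model
    return s * np.sign(X[:, d] - thr + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=300))   # noisy target

D = np.full(len(y), 1 / len(y))               # AdaBoost example weights
F = np.zeros(len(y))                          # ensemble margin
for _ in range(30):                           # boosting rounds
    model = weak_fit(X, y, D)
    h = weak_predict(model, X)
    err = max(float(D @ (h != y)), 1e-12)
    alpha = 0.5 * np.log((1 - err) / err)
    F += alpha * h
    D *= np.exp(-alpha * y * h)               # reweight toward the current mistakes
    D /= D.sum()

print("ensemble training error:", np.mean(np.sign(F) != y))
```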
Totally Corrective Multiclass Boosting with Binary Weak Learners
In this work, we propose a new optimization framework for multiclass boosting.
AdaBoost.MO and AdaBoost.ECC are two successful multiclass boosting algorithms
in the literature that can use binary weak learners.
We explicitly derive these two algorithms' Lagrange dual problems based on
their regularized loss functions. We show that the Lagrange dual formulations
enable us to design totally-corrective multiclass algorithms by using the
primal-dual optimization technique. Experiments on benchmark data sets suggest
that our multiclass boosting achieves generalization capability comparable to
the state of the art, while converging much faster than stage-wise
gradient-descent boosting. In other words, the new totally corrective
algorithms maximize the margin more aggressively. Comment: 11 pages
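A hedged sketch of the contrast between stage-wise and totally corrective updates follows, restricted to the binary case for brevity (the paper derives the analogous multiclass updates from the Lagrange duals of AdaBoost.MO/ECC); decision stumps, the exponential loss, and a scipy re-optimization of all weights after every round are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch contrasting stage-wise and totally corrective updates, binary
# case only (the paper derives the analogous multiclass updates from the
# Lagrange duals of AdaBoost.MO/ECC). Assumptions: labels in {-1, +1}, decision
# stumps as binary weak learners, exponential loss, and a scipy re-optimization
# of all weights after every round.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=300))

def fit_stump(X, y, D):
    """Return the stump predictions with the lowest weighted 0/1 error."""
    best = (np.inf, None)
    for d in range(X.shape[1]):
        for thr in np.unique(X[:, d]):
            for s in (1, -1):
                h = s * np.sign(X[:, d] - thr + 1e-12)
                err = D @ (h != y)
                if err < best[0]:
                    best = (err, h)
    return best[1]

def exp_loss(w, H):
    """Exponential loss of the ensemble margin H @ w."""
    return np.mean(np.exp(-y * (H @ w)))

H = np.empty((len(y), 0))                     # columns: weak-learner outputs picked so far
w = np.zeros(0)
for _ in range(15):
    margins = H @ w if w.size else np.zeros(len(y))
    D = np.exp(-y * margins)
    D /= D.sum()                              # boosting weights from the current margins
    H = np.column_stack([H, fit_stump(X, y, D)])
    # Totally corrective step: re-optimize every coefficient, not just the newest one.
    res = minimize(exp_loss, x0=np.append(w, 0.1), args=(H,),
                   bounds=[(0, None)] * H.shape[1], method="L-BFGS-B")
    w = res.x

print("training error:", np.mean(np.sign(H @ w) != y))
```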