Linear Time Nonparametric Classification and Feature Selection with Polynomial MPMC Cascades for Large Datasets ; CU-CS-977-04
Quickly Boosting Decision Trees - Pruning Underachieving Features Early
Boosted decision trees are one of the most popular and successful learning techniques used today. While exhibiting fast speeds at test time, relatively slow training makes them impractical for applications with real-time learning requirements. We propose a principled approach to overcome this drawback. We prove a bound on the error of a decision stump given its preliminary error on a subset of the training data; the bound may be used to prune unpromising features early on in the training process. We propose a fast training algorithm that exploits this bound, yielding speedups of an order of magnitude at no cost in the final performance of the classifier. Our method is not a new variant of Boosting; rather, it may be used in conjunction with existing Boosting algorithms and other sampling heuristics to achieve even greater speedups.
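The pruning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the bound, so the `margin` parameter below is a hypothetical stand-in for it, and the function names are illustrative assumptions. Each feature's best stump is scored on a random subset first, and only features whose preliminary error is within `margin` of the best are evaluated on the full data.

```python
import random

def stump_error(xs, ys, ws, threshold):
    """Weighted error of the stump 'x > threshold' with the better polarity.

    Labels are assumed to be in {-1, +1}.
    """
    total = sum(ws)
    wrong = sum(w for x, y, w in zip(xs, ys, ws)
                if (1 if x > threshold else -1) != y)
    err = wrong / total
    return min(err, 1.0 - err)  # flipping polarity turns error e into 1 - e

def best_stump_error(xs, ys, ws):
    """Lowest weighted error over thresholds placed at each observed value."""
    return min(stump_error(xs, ys, ws, t) for t in set(xs))

def prune_then_train(features, ys, ws, subset_size, margin, seed=0):
    """Pick the best feature, skipping full evaluation of unpromising ones.

    features: list of columns, one list of values per feature.
    margin:   hypothetical slack standing in for the paper's bound.
    Returns (best_feature_index, full_data_error, n_fully_evaluated).
    """
    n = len(ys)
    rng = random.Random(seed)
    idx = rng.sample(range(n), min(subset_size, n))
    sub_y = [ys[i] for i in idx]
    sub_w = [ws[i] for i in idx]

    # Cheap preliminary error of every feature on the subset only.
    prelim = [best_stump_error([col[i] for i in idx], sub_y, sub_w)
              for col in features]

    # Keep only features whose preliminary error is close to the best one.
    cutoff = min(prelim) + margin
    survivors = [j for j, e in enumerate(prelim) if e <= cutoff]

    # Expensive full evaluation for the survivors only.
    full = {j: best_stump_error(features[j], ys, ws) for j in survivors}
    best = min(full, key=full.get)
    return best, full[best], len(survivors)
```

A fixed `margin` gives no guarantee that the best feature survives; the bound proved in the paper is what would justify the cutoff in a real implementation.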
Faster Boosting with Smaller Memory
State-of-the-art implementations of boosting, such as XGBoost and LightGBM,
can process large training sets extremely fast. However, this performance
requires that the memory size is sufficient to hold two to three times the
training set size. This paper presents an alternative approach to implementing
boosted trees, which achieves a significant speedup over XGBoost and
LightGBM, especially when the memory size is small. This is achieved using a
combination of three techniques: early stopping, effective sample size, and
stratified sampling. Our experiments demonstrate a 10-100x speedup over XGBoost
when the training data is too large to fit in memory.
Comment: NeurIPS 201
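For readers unfamiliar with the "effective sample size" named in this abstract, here is a minimal sketch using the common Kish definition, n_eff = (Σw)² / Σw². That the paper uses exactly this form is an assumption, and `needs_resample` with its `min_ratio` threshold is a hypothetical helper, not part of XGBoost, LightGBM, or the authors' code.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    s = sum(weights)
    s2 = sum(w * w for w in weights)
    return (s * s) / s2 if s2 > 0 else 0.0

def needs_resample(weights, min_ratio=0.5):
    """Hypothetical trigger: resample when n_eff drops below a fraction of n."""
    return effective_sample_size(weights) < min_ratio * len(weights)
```

With uniform weights, n_eff equals the number of examples; as boosting concentrates weight on a few hard examples, n_eff shrinks, which signals that the in-memory sample has become less informative.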