TreeGrad: Transferring Tree Ensembles to Neural Networks
Gradient Boosting Decision Trees (GBDTs) are popular machine learning
algorithms, with implementations such as LightGBM and in popular machine
learning toolkits like Scikit-Learn. Many implementations can only produce
trees in an offline and greedy manner. We explore ways to convert
existing GBDT implementations to known neural network architectures with
minimal performance loss in order to allow decision splits to be updated in an
online manner, and provide extensions that allow split points to be altered as a
neural architecture search problem. We provide learning bounds for our neural
network. Comment: Technical Report on Implementation of Deep Neural Decision Forests
Algorithm. To accompany implementation here:
https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019).
"Transferring Tree Ensembles to Neural Networks". International Conference on
Neural Information Processing. Springer, 2019. arXiv admin note: text overlap
with arXiv:1909.1179
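The idea of making decision splits updatable by gradient descent can be illustrated with a minimal sketch. It assumes a single split on one feature and replaces the hard threshold test with a sigmoid, so the routing probability becomes differentiable in the threshold. The function names and the `temperature` parameter are hypothetical and chosen for illustration; this is not the TreeGrad implementation.

```python
import numpy as np

def hard_split(x, feature, threshold):
    """Greedy-tree routing: 1.0 if the sample goes left, else 0.0."""
    return (x[:, feature] <= threshold).astype(float)

def soft_split(x, feature, threshold, temperature=1.0):
    """Differentiable routing: probability that the sample goes left.

    Assumption: a sigmoid relaxation of the hard test; as temperature
    shrinks, the output approaches the hard routing above.
    """
    z = (threshold - x[:, feature]) / temperature
    return 1.0 / (1.0 + np.exp(-z))

def soft_split_grad_threshold(x, feature, threshold, temperature=1.0):
    """Derivative of soft_split with respect to the threshold.

    An online update could use this to move the split point.
    """
    p = soft_split(x, feature, threshold, temperature)
    return p * (1.0 - p) / temperature

x = np.array([[0.0], [1.0], [3.0]])
print(hard_split(x, 0, 2.0))
print(soft_split(x, 0, 2.0, temperature=0.5))
```

With a small temperature the soft routing matches the hard routing; with a larger one, samples near the threshold receive intermediate probabilities and a non-zero gradient.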
Weka: A machine learning workbench for data mining
The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, convenient interactive graphical user interfaces are provided for data exploration, for setting up large-scale experiments on distributed computing platforms, and for designing configurations for streamed data processing. These interfaces constitute an advanced environment for experimental data mining. The system is written in Java and distributed under the terms of the GNU General Public License
Incremental Training of a Detector Using Online Sparse Eigen-decomposition
The ability to efficiently and accurately detect objects plays a crucial
role in many computer vision tasks. Recently, offline object detectors have
shown tremendous success. However, one major drawback of offline techniques
is that a complete set of training data has to be collected beforehand. In
addition, once learned, an offline detector cannot make use of newly arriving
data. To alleviate these drawbacks, online learning has been adopted with the
following objectives: (1) the technique should be computationally and storage
efficient; (2) the updated classifier must maintain its high classification
accuracy. In this paper, we propose an effective and efficient framework for
learning an adaptive online greedy sparse linear discriminant analysis (GSLDA)
model. Unlike many existing online boosting detectors, which usually apply
exponential or logistic loss, our online algorithm makes use of LDA's learning
criterion that not only aims to maximize the class-separation criterion but
also incorporates the asymmetrical property of training data distributions. We
provide a better alternative for online boosting algorithms in the context of
training a visual object detector. We demonstrate the robustness and efficiency
of our methods on handwritten digit and face data sets. Our results confirm
that object detection tasks benefit significantly when trained in an online
manner. Comment: 14 pages
Boosting the Anatomy of Volatility
Risk and, thus, the volatility of financial asset prices plays a major role in financial decision making and financial regulation. Therefore, understanding and predicting the volatility of financial instruments, asset classes or financial markets in general is of utmost importance for individual and institutional investors as well as for central bankers and financial regulators.
In this paper we investigate new strategies for understanding and predicting financial risk. Specifically, we use componentwise gradient boosting techniques to identify factors that drive financial-market risk and to assess the specific nature with which these factors affect future volatility. Componentwise boosting is a sequential learning method that can handle a large number of predictors and that, in contrast to other machine-learning techniques, preserves interpretation.
Adopting an EGARCH framework and employing a wide range of potential risk drivers, we derive monthly volatility predictions for stock, bond, commodity, and foreign exchange markets. Comparisons with alternative benchmark models show that boosting techniques improve out-of-sample volatility forecasts, especially for medium- and long-run horizons. Another finding is that a number of risk drivers affect volatility in a nonlinear fashion
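To make the sequential, one-predictor-at-a-time nature of componentwise boosting concrete, here is a minimal sketch under simplifying assumptions: squared-error loss and linear base learners, not the EGARCH setting of the paper. The function name `componentwise_l2_boost` and its parameters are hypothetical.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=100, learning_rate=0.1):
    """Componentwise boosting sketch (assumes squared-error loss).

    Each step fits every predictor separately to the current residual,
    keeps only the single best one, and adds a shrunken copy of its fit.
    Returns (intercept, coef) of the resulting additive linear model.
    """
    n, p = X.shape
    y_mean = y.mean()
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centered columns
    col_ss = (Xc ** 2).sum(axis=0)       # per-column sum of squares
    coef = np.zeros(p)
    residual = y - y_mean
    for _ in range(n_steps):
        # Univariate least-squares slope for each predictor
        # (constant columns get slope 0 instead of dividing by zero).
        slopes = np.divide(Xc.T @ residual, col_ss,
                           out=np.zeros(p), where=col_ss > 0)
        # Reduction in squared error achieved by each univariate fit.
        gain = slopes ** 2 * col_ss
        j = int(np.argmax(gain))
        coef[j] += learning_rate * slopes[j]
        residual = residual - learning_rate * slopes[j] * Xc[:, j]
    return y_mean - x_mean @ coef, coef

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] + 0.5
intercept, coef = componentwise_l2_boost(X, y, n_steps=200)
print(intercept, coef)
```

Because only one coefficient changes per step, predictors that are never selected keep a coefficient of exactly zero, which is what makes the fitted model easy to read.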
Boosting as a Product of Experts
In this paper, we derive a novel probabilistic model of boosting as a Product
of Experts. We re-derive the boosting algorithm as a greedy incremental model
selection procedure which ensures that addition of new experts to the ensemble
does not decrease the likelihood of the data. These learning rules lead to a
generic boosting algorithm, POEBoost, which turns out to be similar to the
AdaBoost algorithm under certain assumptions on the expert probabilities. The
paper then extends the POEBoost algorithm to POEBoost.CS, which handles
hypotheses that produce probabilistic predictions. This new algorithm is shown
to have better generalization performance compared to other state-of-the-art
algorithms