Joint learning of interpretation and distillation
The extra trust brought by model interpretation has made it an
indispensable part of machine learning systems. But to explain a distilled
model's prediction, one may either work with the student model itself or turn
to its teacher model. This raises a more fundamental question: should a
distilled model give a similar prediction, for a similar reason, as its teacher
model on the same input? The question becomes even more crucial when the two
models have dramatically different structures, as in GBDT2NN. This paper
conducts an empirical study on a new approach to explaining each prediction of
GBDT2NN, and on how imitating the explanation can further improve the
distillation process as an auxiliary learning task. Experiments on several
benchmarks show that the proposed methods achieve better performance on both
explanations and predictions.
Dynamic Connected Neural Decision Classifier and Regressor with Dynamic Soft Pruning
To deal with datasets of varying complexity, this paper presents a
self-adaptive learning model that combines the proposed Dynamic Connected
Neural Decision Networks (DNDN) and a new pruning method, Dynamic Soft Pruning
(DSP). DNDN is a combination of random forests and deep neural networks that
enjoys both the strong classification capability of tree-like structures and
the representation learning capability of network structures. Building on
Deep Neural Decision Forests (DNDF), this paper adopts an end-to-end training
approach that represents the classification distribution with multiple randomly
initialized softmax layers, which further allows an ensemble of multiple random
forests attached to network layers of different depths. We also propose a soft
pruning method, DSP, which adaptively removes redundant connections of the
network to avoid over-fitting on simple datasets. The model shows no
performance loss compared with unpruned models and even higher robustness
across different data and feature distributions. Extensive experiments on
different datasets demonstrate the superiority of the proposed model over other
popular algorithms in solving classification tasks.
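
Two of the abstract's ingredients can be pictured concretely: softmax heads attached at several network depths whose class distributions are averaged, and a pruning step that shrinks small weights gradually rather than cutting them outright. The sketch below is a rough illustration under assumed layer sizes, decay factor, and threshold; it is not the paper's DNDN/DSP implementation.

```python
# Sketch: per-depth softmax heads ensembled by averaging, plus a "soft"
# pruning pass. All hyper-parameters here are illustrative assumptions.
import torch
import torch.nn as nn

class MultiDepthNet(nn.Module):
    def __init__(self, d_in=16, hidden=64, n_classes=3, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList()
        self.heads = nn.ModuleList()
        for i in range(n_blocks):
            self.blocks.append(nn.Sequential(
                nn.Linear(d_in if i == 0 else hidden, hidden), nn.ReLU()))
            # One randomly initialized softmax head per depth.
            self.heads.append(nn.Linear(hidden, n_classes))

    def forward(self, x):
        probs = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            probs.append(torch.softmax(head(x), dim=-1))
        # Ensemble the per-depth class distributions by averaging.
        return torch.stack(probs).mean(0)

def soft_prune(model, threshold=1e-2, decay=0.9):
    # Shrink near-zero weights toward zero on each call instead of
    # hard-masking them, so mistakenly pruned connections can recover.
    with torch.no_grad():
        for p in model.parameters():
            small = p.abs() < threshold
            p[small] *= decay

net = MultiDepthNet()
out = net(torch.randn(8, 16))   # (8, 3) averaged class distribution
soft_prune(net)
```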
Deep Forest
Current deep learning models are mostly built upon neural networks, i.e.,
multiple layers of parameterized differentiable nonlinear modules that can be
trained by backpropagation. In this paper, we explore the possibility of
building deep models based on non-differentiable modules. We conjecture that
the mystery behind the success of deep neural networks owes much to three
characteristics: layer-by-layer processing, in-model feature transformation,
and sufficient model complexity. We propose the gcForest approach, which
generates a deep forest holding these characteristics. It is a decision tree
ensemble approach with far fewer hyper-parameters than deep neural networks,
and its model complexity can be automatically determined in a data-dependent
way. Experiments show that its performance is quite robust to hyper-parameter
settings, such that in most cases, even across data from different domains, it
is able to achieve excellent performance with the same default setting. This
study opens the door to deep learning based on non-differentiable modules and
exhibits the possibility of constructing deep models without backpropagation.
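
The cascade structure behind gcForest can be sketched compactly: each level is an ensemble of forests, each level's class-probability vectors are concatenated with the original features as input to the next level, and the cascade stops growing once validation accuracy stops improving, which is how model complexity becomes data-dependent. The sketch below omits multi-grained scanning, and the forest counts and sizes are illustrative assumptions.

```python
# Sketch of a cascade forest: grow levels of forests on probability-
# augmented features until validation accuracy stops improving.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score

def grow_cascade(X_tr, y_tr, X_va, y_va, max_levels=5):
    levels, best = [], 0.0
    aug_tr, aug_va = X_tr, X_va
    for _ in range(max_levels):
        forests = [RandomForestClassifier(n_estimators=50).fit(aug_tr, y_tr),
                   ExtraTreesClassifier(n_estimators=50).fit(aug_tr, y_tr)]
        levels.append(forests)
        p_tr = np.hstack([f.predict_proba(aug_tr) for f in forests])
        p_va = np.hstack([f.predict_proba(aug_va) for f in forests])
        # Augment the raw features with this level's class probabilities.
        aug_tr = np.hstack([X_tr, p_tr])
        aug_va = np.hstack([X_va, p_va])
        # Average the forests' distributions to score this level.
        acc = accuracy_score(
            y_va,
            p_va.reshape(len(X_va), len(forests), -1).mean(1).argmax(1))
        if acc <= best:          # depth is determined by the data
            levels.pop()
            break
        best = acc
    return levels
```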
Neural Random Forests
Given an ensemble of randomized regression trees, it is possible to
restructure them as a collection of multilayered neural networks with
particular connection weights. Following this principle, we reformulate the
random forest method of Breiman (2001) in a neural network setting and in
turn propose two new hybrid procedures that we call neural random forests.
Both predictors exploit prior knowledge of regression trees for their
architecture, have fewer parameters to tune than standard networks, and fewer
restrictions on the geometry of the decision boundaries than trees.
Consistency results are proved, and substantial numerical evidence is provided
on both synthetic and real data sets to assess the excellent performance of
our methods in a large variety of prediction problems.
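
The underlying tree-to-network correspondence is constructive: one hidden layer holds a unit per internal node evaluating its split, a second holds a unit per leaf that fires when every split on its root-to-leaf path agrees, and the output layer reads off leaf values. Below is a minimal sketch of that encoding with hard threshold activations; the paper's procedures relax these to smooth, trainable units, so this is only the starting point, not the proposed method.

```python
# Sketch: evaluate a fitted sklearn regression tree as a two-hidden-layer
# threshold network and check it matches the tree's own predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tree_to_network_predict(tree, X):
    t = tree.tree_
    internal = np.where(t.children_left != -1)[0]
    idx = {n: j for j, n in enumerate(internal)}  # node id -> layer-1 unit

    # Layer 1: one unit per internal node, +1 if x goes right, else -1.
    h1 = np.where(X[:, t.feature[internal]] > t.threshold[internal], 1.0, -1.0)

    # Enumerate root-to-leaf paths to wire layer 2.
    paths, stack = [], [(0, [])]
    while stack:
        node, path = stack.pop()
        if t.children_left[node] == -1:
            paths.append((node, path))
        else:
            stack.append((t.children_left[node], path + [(node, -1.0)]))
            stack.append((t.children_right[node], path + [(node, +1.0)]))

    # Layer 2: a leaf unit fires only when all splits on its path agree.
    W = np.zeros((len(paths), len(internal)))
    b = np.zeros(len(paths))
    values = np.zeros(len(paths))
    for i, (leaf, path) in enumerate(paths):
        for node, direction in path:
            W[i, idx[node]] = direction
        b[i] = 0.5 - len(path)          # fires iff every sign matches
        values[i] = t.value[leaf].ravel()[0]
    h2 = (h1 @ W.T + b > 0).astype(float)
    return h2 @ values                  # exactly one leaf fires per sample

X = np.random.rand(100, 4)
y = X[:, 0] + np.random.randn(100) * 0.1
reg = DecisionTreeRegressor(max_depth=4).fit(X, y)
assert np.allclose(tree_to_network_predict(reg, X), reg.predict(X))
```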