Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of Regularization
While regularization is widely used in training gradient boosted
trees, popular individualized feature attribution methods for trees such as
Saabas and TreeSHAP overlook the training procedure. We propose Prediction
Decomposition Attribution (PreDecomp), a novel individualized feature
attribution method for gradient boosted trees trained with
regularization. Theoretical analysis shows that the inner product between
PreDecomp and labels on in-sample data is essentially the total gain of a tree,
and that it can faithfully recover additive models in the population case when
features are independent. Inspired by the connection between PreDecomp and
total gain, we also propose TreeInner, a family of debiased global feature
attributions defined in terms of the inner product between any individualized
feature attribution and labels on out-sample data for each tree. Numerical
experiments on a simulated dataset and a genomic ChIP dataset show that
TreeInner has state-of-the-art feature selection performance. Code reproducing
experiments is available at https://github.com/nalzok/TreeInner .
Comment: 43 pages, 29 figures
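As a minimal sketch of the idea behind TreeInner (not the paper's exact estimator): given precomputed individualized attributions for held-out samples, the global score of each feature is the inner product between that feature's attribution column and the out-of-sample labels. The attribution matrix `phi` and labels `y` below are hypothetical placeholders.

```python
import numpy as np

def tree_inner_global(phi, y):
    """Global feature attribution as the inner product between
    individualized attributions and labels on out-of-sample data.

    phi : (n_samples, n_features) individualized attributions
          (e.g. SHAP values) computed on held-out data
    y   : (n_samples,) held-out labels
    Returns a (n_features,) array of global scores.
    """
    return phi.T @ y

# Hypothetical attributions for 4 held-out samples and 3 features
phi = np.array([[ 0.5,  0.0, -0.1],
                [ 0.3,  0.1,  0.0],
                [-0.2,  0.0,  0.2],
                [ 0.4, -0.1,  0.1]])
y = np.array([1.0, 1.0, -1.0, 1.0])

print(tree_inner_global(phi, y))  # [ 1.4  0.  -0.2]
```

Using held-out rather than in-sample data is what debiases the score: a feature whose attributions do not track the labels out of sample scores near zero.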
On marginal feature attributions of tree-based models
Due to their power and ease of use, tree-based machine learning models have
become very popular. To interpret these models, local feature attributions
based on marginal expectations, e.g. marginal (interventional) Shapley, Owen, or
Banzhaf values, may be employed. Such feature attribution methods are true to
the model and implementation invariant, i.e. they depend only on the input-output
function of the model. By taking advantage of the internal structure of
tree-based models, we prove that their marginal Shapley values, or more
generally marginal feature attributions obtained from a linear game value, are
simple (piecewise-constant) functions with respect to a certain finite
partition of the input space determined by the trained model. The same is true
for feature attributions obtained from the famous TreeSHAP algorithm.
Nevertheless, we show that the "path-dependent" TreeSHAP is not implementation
invariant by presenting two (statistically similar) decision trees computing
the exact same function for which the algorithm yields different rankings of
features, whereas the marginal Shapley values coincide. Furthermore, we discuss
how the fact that marginal feature attributions are simple functions can
potentially be utilized to compute them. An important observation, showcased by
experiments with XGBoost, LightGBM and CatBoost libraries, is that only a
portion of all features appears in a tree from the ensemble; thus the
complexity of computing marginal Shapley (or Owen or Banzhaf) feature
attributions may be reduced. In particular, in the case of CatBoost models, the
trees are oblivious (symmetric) and the number of features in each of them is
no larger than the depth. We exploit the symmetry to derive an explicit formula
with improved complexity for marginal Shapley (and Banzhaf and Owen) values
which is only in terms of the internal parameters of the CatBoost model.
Comment: 48 pages, 7 figures
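The piecewise-constant property can be checked directly with a brute-force computation of marginal Shapley values, where the value function v(S) averages the model over a background sample with the features in S fixed to x. This is a sketch, not the paper's improved algorithm; the depth-2 tree, its thresholds, and the background data are hypothetical.

```python
import numpy as np
from itertools import combinations
from math import factorial

def tree(x):
    # Hypothetical depth-2 decision tree: split on x0, then on x1
    if x[0] <= 0.5:
        return 1.0 if x[1] <= 0.5 else 2.0
    return 3.0 if x[1] <= 0.5 else 4.0

def marginal_shapley(f, x, background):
    """Brute-force marginal (interventional) Shapley values:
    v(S) averages f over the background with features in S fixed to x."""
    n = len(x)
    def v(S):
        z = background.copy()
        z[:, list(S)] = x[list(S)]
        return np.mean([f(row) for row in z])
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi

background = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
# Two points inside the same cell of the tree-induced partition
phi_a = marginal_shapley(tree, np.array([0.1, 0.9]), background)
phi_b = marginal_shapley(tree, np.array([0.4, 0.7]), background)
print(phi_a, phi_b)  # identical: the attribution is constant on the cell
```

Because the tree's output depends on x only through which cell of the partition x falls into, v(S), and hence each Shapley value, is constant on every cell, which is exactly the simple-function structure the paper exploits.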
- …