Sparse tree-based initialization for neural networks
Dedicated neural network (NN) architectures have been designed to handle
specific data types (such as CNNs for images or RNNs for text), placing them
among the state-of-the-art methods for these data. Unfortunately, no such
architecture has yet been found for tabular data, on which tree ensemble
methods (tree boosting, random forests) usually show the best predictive
performance. In this work, we propose a new sparse initialization technique
for (potentially deep) multilayer perceptrons (MLPs): we first train a
tree-based procedure to detect feature interactions and use the resulting
information to initialize the network, which is subsequently trained via
standard stochastic gradient strategies. Numerical experiments on several
tabular data sets show that this new, simple, and easy-to-use method is a
solid competitor, both in generalization capacity and computation time, to
default MLP initialization and even to existing complex deep learning
solutions. In fact, this informed MLP initialization raises the resulting NN
methods to the level of a valid competitor to gradient boosting on tabular
data. Moreover, these initializations preserve, throughout training, the
weight sparsity introduced in the first layers of the network. This suggests
that the new initializer performs an implicit regularization during NN
training, and emphasizes that the first layers act as a sparse feature
extractor (much like convolutional layers in a CNN).
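The abstract only outlines the initialization pipeline, so a minimal toy
sketch may help fix ideas. It is an assumption-laden illustration, not the
paper's exact algorithm: it fits a scikit-learn gradient-boosting model,
treats the set of features each tree splits on as one interaction group, and
uses those groups to zero out most first-layer MLP weights. The helper names
`tree_feature_groups` and `masked_first_layer` are invented for this example.

```python
# Toy sketch of tree-based sparse MLP initialization (illustrative only;
# the interaction heuristic and helper names are assumptions, not the
# paper's exact procedure).
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

def tree_feature_groups(gbt):
    """For each fitted tree, collect the features it splits on: a crude
    proxy for a group of interacting features."""
    groups = []
    for est in gbt.estimators_.ravel():
        feats = est.tree_.feature             # -2 marks leaf nodes
        groups.append(sorted({f for f in feats if f >= 0}))
    return [g for g in groups if g]

def masked_first_layer(n_features, groups, hidden):
    """First MLP layer whose units each connect only to one interaction
    group; all other incoming weights start at zero."""
    layer = nn.Linear(n_features, hidden)
    mask = torch.zeros(hidden, n_features)
    for unit in range(hidden):
        for f in groups[unit % len(groups)]:  # cycle through the groups
            mask[unit, f] = 1.0
    with torch.no_grad():
        layer.weight.mul_(mask)               # zero out non-group weights
    return layer, mask

# Fit the tree ensemble, then build the sparsely initialized layer.
X = np.random.randn(256, 10).astype(np.float32)
y = (X[:, 0] * X[:, 1] > 0).astype(int)       # synthetic feature interaction
gbt = GradientBoostingClassifier(n_estimators=20, max_depth=3).fit(X, y)
layer, mask = masked_first_layer(10, tree_feature_groups(gbt), hidden=32)
```

Re-applying `mask` to `layer.weight` after every optimizer step is one simple
way to realize the sparsity-preservation property the abstract mentions.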
Learning Binary Decision Trees by Argmin Differentiation
We address the problem of learning binary decision trees that partition data
for some downstream task. We propose to learn discrete parameters (i.e., for
tree traversals and node pruning) and continuous parameters (i.e., for tree
split functions and prediction functions) simultaneously using argmin
differentiation. We do so by sparsely relaxing a mixed-integer program for the
discrete parameters, to allow gradients to pass through the program to
continuous parameters. We derive customized algorithms to efficiently compute
the forward and backward passes. This means that our tree learning procedure
can be used as an (implicit) layer in arbitrary deep networks, and can be
optimized with arbitrary loss functions. We demonstrate that our approach
produces binary trees that are competitive with existing single tree and
ensemble approaches, in both supervised and unsupervised settings. Further,
apart from greedy approaches (which do not achieve competitive accuracy), our
method is faster to train than all other tree-learning baselines we compare
with. The code for reproducing the results is available at
https://github.com/vzantedeschi/LatentTrees
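As a companion illustration of the "tree as a differentiable layer" idea,
here is a minimal soft binary tree module. To be clear about what it is not:
the paper sparsely relaxes a mixed-integer program via argmin
differentiation, whereas this sketch substitutes a generic sigmoid
(soft-routing) relaxation as a much simpler stand-in; see the linked
repository for the actual method.

```python
# Generic soft-routing stand-in (NOT the paper's sparse argmin relaxation):
# a binary tree whose routing decisions are relaxed with sigmoids so the
# whole structure is differentiable and usable as a layer.
import torch
import torch.nn as nn

class SoftBinaryTree(nn.Module):
    def __init__(self, in_dim, depth, out_dim):
        super().__init__()
        self.depth = depth
        self.n_leaves = 2 ** depth
        # One split hyperplane per internal node (2**depth - 1 of them).
        self.splits = nn.Linear(in_dim, 2 ** depth - 1)
        # One learned prediction vector per leaf.
        self.leaves = nn.Parameter(torch.randn(self.n_leaves, out_dim))

    def forward(self, x):
        gate = torch.sigmoid(self.splits(x))       # (batch, n_internal): P(go right)
        leaf_probs = []
        for leaf in range(self.n_leaves):
            prob = torch.ones(x.shape[0], device=x.device)
            node = 0
            for d in reversed(range(self.depth)):  # read the leaf's path bits
                right = (leaf >> d) & 1
                prob = prob * (gate[:, node] if right else 1 - gate[:, node])
                node = 2 * node + 1 + right        # heap-style child index
            leaf_probs.append(prob)
        leaf_probs = torch.stack(leaf_probs, dim=1)  # (batch, n_leaves), rows sum to 1
        return leaf_probs @ self.leaves              # soft mixture of leaf predictions

# Usage: the module behaves like any other layer.
tree = SoftBinaryTree(in_dim=16, depth=3, out_dim=4)
out = tree(torch.randn(8, 16))                     # shape (8, 4)
out.sum().backward()                               # gradients reach splits and leaves
```

Because gradients flow to both the split hyperplanes and the leaf values,
such a module can sit anywhere in a larger network and be trained with an
arbitrary loss, which is the property the abstract emphasizes.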