Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming
We treat projective dependency trees as latent variables in our probabilistic
model and induce them in such a way as to be beneficial for a downstream task,
without relying on any direct tree supervision. Our approach relies on Gumbel
perturbations and differentiable dynamic programming. Unlike previous
approaches to latent tree learning, we stochastically sample global structures
and our parser is fully differentiable. We illustrate its effectiveness on
sentiment analysis and natural language inference tasks. We also study its
properties on a synthetic structure induction task. Ablation studies emphasize
the importance of both stochasticity and constraining latent structures to be
projective trees.

Comment: Accepted at ACL 2019
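The mechanism the abstract describes can be sketched compactly: add Gumbel noise to arc scores, run Eisner's projective parsing algorithm with max replaced by logsumexp so the whole chart is differentiable, and take the gradient of the log-partition with respect to the perturbed scores as a relaxed sampled tree. The sketch below is illustrative, not the authors' code; all names, the temperature scheme, and the lack of a single-root constraint are assumptions.

```python
# A minimal sketch of perturb-and-parse with a differentiable dynamic
# program. NOT the authors' implementation: names, shapes, and the
# simplified root handling are assumptions.
import torch

def soft_eisner_log_partition(arc_scores: torch.Tensor) -> torch.Tensor:
    """Eisner's projective parsing DP with max replaced by logsumexp.

    arc_scores[h, m] scores the arc from head h to modifier m; index 0
    is the artificial root. Returns the log-partition over projective trees.
    """
    n = arc_scores.size(0) - 1
    neg = torch.finfo(arc_scores.dtype).min / 2
    # C: complete spans, I: incomplete spans; the leading dim is the head
    # direction (0 = head at the left endpoint, 1 = head at the right).
    C = arc_scores.new_full((2, n + 1, n + 1), neg)
    I = arc_scores.new_full((2, n + 1, n + 1), neg)
    idx = torch.arange(n + 1, device=arc_scores.device)
    C[:, idx, idx] = 0.0  # single-word spans
    for width in range(1, n + 1):
        for i in range(n + 1 - width):
            j = i + width
            # Incomplete spans: sum over split points k, then add the arc.
            inner = torch.logsumexp(C[0, i, i:j] + C[1, i + 1:j + 1, j], dim=0)
            I[0, i, j] = inner + arc_scores[i, j]  # arc i -> j
            I[1, i, j] = inner + arc_scores[j, i]  # arc j -> i
            # Complete spans: an incomplete span plus a complete one.
            C[0, i, j] = torch.logsumexp(I[0, i, i + 1:j + 1] + C[0, i + 1:j + 1, j], dim=0)
            C[1, i, j] = torch.logsumexp(C[1, i, i:j] + I[1, i:j, j], dim=0)
    return C[0, 0, n]

def sample_relaxed_tree(arc_scores: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Gumbel-perturb the scores, then differentiate through the DP.

    The gradient of the log-partition w.r.t. the perturbed scores is a
    matrix of relaxed arc indicators: a stochastic, fully differentiable
    stand-in for a sampled projective tree.
    """
    gumbel = -torch.log(-torch.log(torch.rand_like(arc_scores)))
    perturbed = (arc_scores + gumbel) / temperature
    if not perturbed.requires_grad:
        perturbed.requires_grad_(True)
    log_z = soft_eisner_log_partition(perturbed)
    soft_arcs, = torch.autograd.grad(log_z, perturbed, create_graph=True)
    return soft_arcs
```

A downstream network can consume the returned soft arc matrix wherever it would use a hard adjacency matrix; lowering the temperature pushes the relaxation toward discrete sampled trees.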
Learning Binary Decision Trees by Argmin Differentiation
We address the problem of learning binary decision trees that partition data
for some downstream task. We propose to learn discrete parameters (i.e., for
tree traversals and node pruning) and continuous parameters (i.e., for tree
split functions and prediction functions) simultaneously using argmin
differentiation. We do so by sparsely relaxing a mixed-integer program for the
discrete parameters, to allow gradients to pass through the program to
continuous parameters. We derive customized algorithms to efficiently compute
the forward and backward passes. This means that our tree learning procedure
can be used as an (implicit) layer in arbitrary deep networks, and can be
optimized with arbitrary loss functions. We demonstrate that our approach
produces binary trees that are competitive with existing single tree and
ensemble approaches, in both supervised and unsupervised settings. Further,
apart from greedy approaches (which do not have competitive accuracies), our
method is faster to train than all other tree-learning baselines we compare
with. The code for reproducing the results is available at
https://github.com/vzantedeschi/LatentTrees
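The core trick can be illustrated with the simplest sparse argmin relaxation: replace each binary routing decision q in {0, 1} with the argmin over 0 <= q <= 1 of (q - s)^2, whose solution clip(s, 0, 1) is exactly 0 or 1 outside the unit interval (so most decisions stay discrete) while still passing gradients inside it. The sketch below is a hypothetical, node-wise simplification; the paper's actual relaxation is over whole-tree traversals and pruning, solved with the customized algorithms mentioned above.

```python
# A minimal sketch of argmin differentiation for tree routing. NOT the
# paper's algorithm: the node-wise quadratic relaxation and all names
# here are illustrative assumptions.
import torch
import torch.nn as nn

class SoftBinaryTree(nn.Module):
    """A depth-d binary tree with learned split hyperplanes and leaf values."""

    def __init__(self, in_dim: int, depth: int, out_dim: int = 1):
        super().__init__()
        self.depth = depth
        n_inner, n_leaves = 2 ** depth - 1, 2 ** depth
        self.splits = nn.Linear(in_dim, n_inner)  # one hyperplane per inner node
        self.leaves = nn.Parameter(torch.zeros(n_leaves, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.splits(x)  # (batch, n_inner) raw split scores
        # argmin_{0 <= q <= 1} (q - s)^2 has the closed form clip(s, 0, 1):
        # a sparse relaxation that is exactly 0 or 1 outside (0, 1) yet
        # lets gradients pass through where 0 < s < 1.
        q = s.clamp(0.0, 1.0)
        # Probability of reaching each leaf: the product of routing
        # decisions along its root-to-leaf path, computed level by level.
        reach = x.new_ones(x.size(0), 1)
        for d in range(self.depth):
            lo, hi = 2 ** d - 1, 2 ** (d + 1) - 1  # inner-node indices at level d
            right = q[:, lo:hi]                    # (batch, 2**d)
            reach = torch.stack((reach * (1 - right), reach * right), dim=-1).flatten(1)
        return reach @ self.leaves                 # (batch, out_dim) predictions
```

Because the module is an ordinary differentiable layer, it can be dropped into any network and trained with any loss, which is the implicit-layer property the abstract refers to.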