Tree Transformer: Integrating Tree Structures into Self-Attention
Pre-training Transformer from large-scale raw texts and fine-tuning on the
desired task have achieved state-of-the-art results on diverse NLP tasks.
However, it is unclear what the learned attention captures; the attention
computed by the heads often fails to match human intuitions about
hierarchical structures. This paper proposes Tree Transformer, which adds an
extra constraint to attention heads of the bidirectional Transformer encoder in
order to encourage the attention heads to follow tree structures. The tree
structures can be automatically induced from raw texts by our proposed
"Constituent Attention" module, which is simply implemented by self-attention
between adjacent words. With a training procedure identical to BERT's,
the experiments demonstrate the effectiveness of Tree Transformer in terms of
inducing tree structures, better language modeling, and further learning more
explainable attention scores. (Comment: accepted at EMNLP 2019)
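
Below is a minimal sketch of the "Constituent Attention" idea described in
the abstract, assuming PyTorch: self-attention restricted to adjacent words
yields a link probability per neighboring pair, the running product of links
gives a soft prior that two positions share a constituent, and that prior
gates ordinary self-attention. This is an illustrative single-layer
simplification, not the paper's exact implementation (which also enforces
hierarchical constraints across layers); all tensor names are assumptions.

```python
import torch
import torch.nn.functional as F

def constituent_attention(x, wq, wk, wv):
    """Sketch: gate self-attention with a soft constituent prior.
    x: (B, L, D) token embeddings; wq/wk/wv: (D, D) projections."""
    B, L, D = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    # Link probability between each pair of adjacent words i and i+1,
    # from self-attention restricted to neighbors.
    link = torch.sigmoid((q[:, :-1] * k[:, 1:]).sum(-1) / D ** 0.5)  # (B, L-1)
    # Prior C[i, j]: product of link probabilities on the path from i to j,
    # computed via cumulative sums of log-probabilities.
    cum = F.pad(torch.cumsum(torch.log(link + 1e-9), dim=-1), (1, 0))  # (B, L)
    prior = torch.exp(-(cum.unsqueeze(2) - cum.unsqueeze(1)).abs())    # (B, L, L)
    # Ordinary scaled dot-product attention, elementwise-gated by the prior.
    attn = torch.softmax(q @ k.transpose(1, 2) / D ** 0.5, dim=-1) * prior
    attn = attn / attn.sum(-1, keepdim=True)  # renormalize after gating
    return attn @ v
```

Words joined by strong links then attend mostly within their own span, which
is what nudges the heads toward tree-shaped attention patterns.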
Learning Binary Decision Trees by Argmin Differentiation
We address the problem of learning binary decision trees that partition data
for some downstream task. We propose to learn discrete parameters (i.e., for
tree traversals and node pruning) and continuous parameters (i.e., for tree
split functions and prediction functions) simultaneously using argmin
differentiation. We do so by sparsely relaxing a mixed-integer program for the
discrete parameters, to allow gradients to pass through the program to
continuous parameters. We derive customized algorithms to efficiently compute
the forward and backward passes. This means that our tree learning procedure
can be used as an (implicit) layer in arbitrary deep networks, and can be
optimized with arbitrary loss functions. We demonstrate that our approach
produces binary trees that are competitive with existing single tree and
ensemble approaches, in both supervised and unsupervised settings. Further,
apart from greedy approaches (which do not have competitive accuracies), our
method is faster to train than all other tree-learning baselines we compare
with. The code for reproducing the results is available at
https://github.com/vzantedeschi/LatentTrees
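
The argmin-differentiation machinery itself is involved; as a hedged
stand-in, the sketch below shows the simplest way a binary decision tree can
act as a differentiable layer, relaxing hard left/right routing to sigmoid
probabilities. It substitutes a plain soft relaxation for the paper's sparse
mixed-integer relaxation, so it illustrates the "tree as an implicit layer"
usage rather than the method itself.

```python
import torch
import torch.nn as nn

class SoftBinaryTree(nn.Module):
    """Illustrative relaxation of a depth-d binary decision tree usable as a
    layer in a deep network; NOT the paper's argmin-differentiation method."""

    def __init__(self, in_dim, out_dim, depth=3):
        super().__init__()
        self.depth = depth
        self.splits = nn.Linear(in_dim, 2 ** depth - 1)  # one split per node
        self.leaves = nn.Parameter(torch.randn(2 ** depth, out_dim))

    def forward(self, x):
        B = x.size(0)
        p_right = torch.sigmoid(self.splits(x))     # (B, 2**depth - 1)
        reach = torch.ones(B, 1, device=x.device)   # prob. of reaching the root
        idx = 0                                     # first node of current level
        for level in range(self.depth):
            n = 2 ** level
            p = p_right[:, idx:idx + n]             # branching probs, this level
            # Each node splits its reach mass between its two children.
            reach = torch.stack([reach * (1 - p), reach * p], -1).reshape(B, 2 * n)
            idx += n
        return reach @ self.leaves                  # (B, out_dim) leaf mixture

# Usage: the tree composes and backpropagates like any other module.
tree = SoftBinaryTree(in_dim=8, out_dim=3, depth=3)
out = tree(torch.randn(16, 8))                      # (16, 3), differentiable
```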
Montague Grammar Induction
We propose a computational model for inducing full-fledged combinatory
categorial grammars from behavioral data. This model contrasts with prior
computational models of selection by representing syntactic and semantic
types as structured (rather than atomic) objects, enabling direct
interpretation of the modeling results relative to standard formal
frameworks. We investigate the grammar our model induces when fit to a
lexicon-scale acceptability judgment dataset, Mega Acceptability, focusing
in particular on the types our model assigns to clausal complements and the
predicates that select them.
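
To make "structured (rather than atomic) types" concrete, here is a hedged
sketch in standard Montague notation; the type names and the
clause-embedding example are illustrative conventions, not drawn from the
paper's implementation. Types are built recursively from the primitives e
and t, so a predicate's selection behavior falls out of its type structure
rather than being stipulated per atomic label.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Prim:
    name: str               # primitive types: "e" (entities), "t" (truth values)

@dataclass(frozen=True)
class Func:
    dom: "Type"             # argument type
    cod: "Type"             # result type

Type = Union[Prim, Func]

E, T = Prim("e"), Prim("t")
PRED = Func(E, T)           # <e,t>: an intransitive verb or common noun
QUANT = Func(PRED, T)       # <<e,t>,t>: a generalized quantifier
BELIEVE = Func(T, PRED)     # <t,<e,t>>: a predicate selecting a clausal complement
```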
Zero-Shot 3D Drug Design by Sketching and Generating
Drug design is a crucial step in the drug discovery cycle. Recently, various
deep learning-based methods design drugs by generating novel molecules from
scratch, avoiding traversing large-scale drug libraries. However, they depend
on scarce experimental data or time-consuming docking simulation, leading to
overfitting issues with limited training data and slow generation speed. In
this study, we propose the zero-shot drug design method DESERT (Drug dEsign by
SkEtching and geneRaTing). Specifically, DESERT splits the design process into
two stages: sketching and generating, and bridges them with the molecular
shape. This two-stage design enables our method to exploit a large-scale
molecular database to reduce the need for experimental data and docking
simulation. Experiments show that DESERT achieves new state-of-the-art
results at fast generation speed. (Comment: NeurIPS 2022 camera-ready)
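
A minimal sketch of the two-stage flow described above; the function and
parameter names (shape_sampler, decoder, pocket_voxels) are our placeholders,
not the authors' API.

```python
def design_candidates(pocket_voxels, shape_sampler, decoder, n=100):
    """Stage 1 ("sketch"): propose 3D shapes complementary to the binding
    pocket. Stage 2 ("generate"): a decoder pretrained on a large molecule
    database maps each shape to a molecule, so no affinity labels or docking
    runs are needed at design time (hence zero-shot)."""
    molecules = []
    for _ in range(n):
        shape = shape_sampler(pocket_voxels)   # sketch a plausible ligand shape
        molecules.append(decoder(shape))       # fill the shape with a molecule
    return molecules
```

The molecular shape is the bridge between the stages: only the decoder needs
large-scale pretraining, and neither stage consumes experimental binding data.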