
19 research outputs found

Tree Transformer: Integrating Tree Structures into Self-Attention

Author: Chen Yun-Nung
Lee Hung-Yi
Wang Yau-Shian
Publication date: 01/01/2019

Pre-training a Transformer on large-scale raw text and fine-tuning it on the target task has achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures: the attention computed by the attention heads does not seem to match human intuitions about hierarchical structure. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder to encourage them to follow tree structures. The tree structures are induced automatically from raw text by our proposed "Constituent Attention" module, which is implemented simply as self-attention between pairs of adjacent words. Using a training procedure identical to BERT's, the experiments show that Tree Transformer is effective at inducing tree structures, improves language modeling, and learns more explainable attention scores. Comment: accepted by EMNLP 201
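To make "self-attention between two adjacent words" concrete, here is a minimal pure-Python sketch of one layer of constituent attention, written from a reading of the idea above rather than from the authors' code. Each word softmax-normalizes its scores toward its left and right neighbours; the link between words i and i+1 is the geometric mean of their mutual preferences; the prior for a span is the product of the links it crosses; and that prior is multiplied elementwise into ordinary softmax attention. The function names, the scaling of scores by the vector size, and the omission of any layer-to-layer hierarchical update are all assumptions of this sketch.

```python
import math
from typing import List

Vector = List[float]


def _score(u: Vector, v: Vector) -> float:
    # Dot product scaled by the vector size (this scaling choice is an assumption).
    return sum(a * b for a, b in zip(u, v)) / len(u)


def neighbor_link_probs(queries: List[Vector], keys: List[Vector]) -> List[float]:
    """Return links[i]: how strongly words i and i+1 appear to share a constituent."""
    n = len(queries)
    if n < 2:
        return []
    p_right = [0.0] * n  # word i's preference for its right neighbour
    p_left = [0.0] * n   # word i's preference for its left neighbour
    for i in range(n):
        scores = {}
        if i + 1 < n:
            scores["right"] = _score(queries[i], keys[i + 1])
        if i > 0:
            scores["left"] = _score(queries[i], keys[i - 1])
        # Softmax over the (one or two) available neighbours.
        top = max(scores.values())
        total = sum(math.exp(s - top) for s in scores.values())
        if "right" in scores:
            p_right[i] = math.exp(scores["right"] - top) / total
        if "left" in scores:
            p_left[i] = math.exp(scores["left"] - top) / total
    # A link is strong only if both words point at each other (geometric mean).
    return [math.sqrt(p_right[i] * p_left[i + 1]) for i in range(n - 1)]


def constituent_prior(links: List[float]) -> List[List[float]]:
    """prior[i][j] = product of the links between positions i and j (1 on the diagonal)."""
    n = len(links) + 1
    prior = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prior[i][j] = prior[i][j - 1] * links[j - 1]
            prior[j][i] = prior[i][j]  # the prior is symmetric
    return prior


def tree_attention(queries: List[Vector], keys: List[Vector]) -> List[List[float]]:
    """Ordinary softmax attention, multiplied elementwise by the constituent prior."""
    prior = constituent_prior(neighbor_link_probs(queries, keys))
    weights = []
    for i, q in enumerate(queries):
        scores = [_score(q, k) for k in keys]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([prior[i][j] * e / total for j, e in enumerate(exps)])
    return weights
```

Because every link lies in [0, 1], attention to distant words can only shrink as more weak links are crossed, which is what pushes heads toward attending within a constituent.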

arXiv.org e-Print Archive

Crossref

Learning binary trees by argmin differentiation

Author: Kusner M.J.
Niculae V.
Zantedeschi V.
Publication date: 01/01/2021

International Migration, Integration and Social Cohesion online publications

Learning Binary Decision Trees by Argmin Differentiation

Author: Kusner Matt J
Niculae Vlad
Zantedeschi Valentina
Publication venue: PMLR
Publication date: 24/07/2021

We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters, which allows gradients to pass through the program to the continuous parameters. We derive customized algorithms to efficiently compute the forward and backward passes. This means that our tree learning procedure can be used as an (implicit) layer in arbitrary deep networks and optimized with arbitrary loss functions. We demonstrate that our approach produces binary trees that are competitive with existing single-tree and ensemble approaches, in both supervised and unsupervised settings. Furthermore, apart from greedy approaches (which are not competitive in accuracy), our method is faster to train than all other tree-learning baselines we compare with. The code for reproducing the results is available at https://github.com/vzantedeschi/LatentTrees
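The core idea, differentiating through the solution of an optimization problem, can be illustrated with its simplest well-known instance. The sketch below is not the paper's tree-specific mixed-integer relaxation; it swaps in sparsemax, the Euclidean projection of a score vector onto the probability simplex (argmin_p ||p - z||^2 subject to p >= 0 and sum(p) = 1). Like the relaxation described above, it produces sparse outputs yet has a closed-form backward pass, so gradients reach the scores. The function names are hypothetical.

```python
from typing import List


def sparsemax(z: List[float]) -> List[float]:
    """Forward pass: argmin_p ||p - z||^2 subject to p >= 0 and sum(p) = 1."""
    tau = 0.0
    cumsum = 0.0
    # Sort scores descending; the support is the longest prefix whose
    # smallest element still exceeds the running threshold.
    for k, zk in enumerate(sorted(z, reverse=True), start=1):
        cumsum += zk
        threshold = (cumsum - 1.0) / k
        if zk > threshold:
            tau = threshold  # the k-th largest score is still in the support
    return [max(zi - tau, 0.0) for zi in z]


def sparsemax_backward(p: List[float], grad_p: List[float]) -> List[float]:
    """Backward pass: vector-Jacobian product dL/dz given dL/dp.

    The Jacobian is diag(s) - s s^T / |S|, where s marks the support
    S = {i : p_i > 0}, so the incoming gradient is centred over the
    support and zeroed everywhere else.
    """
    support = [1.0 if pi > 0.0 else 0.0 for pi in p]
    mean_grad = sum(g * s for g, s in zip(grad_p, support)) / sum(support)
    return [s * (g - mean_grad) for g, s in zip(grad_p, support)]
```

For z = [0.5, 0.3, -1.0] the forward pass returns approximately [0.6, 0.4, 0.0]; the third entry is exactly zero, and the backward pass routes no gradient to it.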

UCL Discovery

Montague Grammar Induction

Author: Kim Gene Louis
White Aaron Steven
Publication venue: Linguistic Society of America
Publication date: 02/03/2021

We propose a computational model for inducing full-fledged combinatory categorial grammars from behavioral data. This model contrasts with prior computational models of selection in representing syntactic and semantic types as structured (rather than atomic) objects, which enables direct interpretation of the modeling results relative to standard formal frameworks. We investigate the grammar our model induces when fit to Mega Acceptability, a lexicon-scale acceptability judgment dataset, focusing in particular on the types our model assigns to clausal complements and the predicates that select them.
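What it means for syntactic types to be "structured (rather than atomic)" can be shown with textbook CCG categories. The sketch below encodes a category as either an atom or a functor X/Y or X\Y and implements forward and backward application; it says nothing about how the paper's model fits types to acceptability judgments. The class and function names are hypothetical, and the category given to the clause-embedding verb is an illustrative assumption, not a result from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S or NP."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """A structured category: result/arg seeks its argument to the right,
    result\\arg seeks it to the left."""
    result: "Category"
    slash: str  # either "/" or "\\"
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atom, Functor]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    # X/Y  Y  =>  X
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    # Y  X\Y  =>  X
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None
```

For "Jo thinks Bo left", with NP for Jo and Bo, S\NP for left, and (S\NP)/S for thinks, backward application yields S for "Bo left", forward application of thinks to that S yields S\NP, and a final backward application with Jo yields S. Because the verb's type is a structured object, its selection for a clausal complement can be read directly off the category.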

Proceedings Published by the LSA (Linguistic Society of America)

Zero-Shot 3D Drug Design by Sketching and Generating

Author: Dai Xinyu
Long Siyu
Zhou Hao
Zhou Yi
Publication date: 04/10/2022

Drug design is a crucial step in the drug discovery cycle. Recently, various deep learning-based methods have designed drugs by generating novel molecules from scratch, which avoids traversing large-scale drug libraries. However, these methods depend on scarce experimental data or time-consuming docking simulations, which leads to overfitting when training data are limited and to slow generation. In this study, we propose DESERT (Drug dEsign by SkEtching and geneRaTing), a zero-shot drug design method. Specifically, DESERT splits the design process into two stages, sketching and generating, and bridges them with the molecular shape. This two-stage approach lets our method exploit large-scale molecular databases, reducing the need for experimental data and docking simulation. Experiments show that DESERT achieves a new state of the art at high speed. Comment: NeurIPS 2022 camera-ready

arXiv.org e-Print Archive

Learning Binary Decision Trees by Argmin Differentiation

Author: Kusner Matt J.
Niculae Vlad
Zantedeschi Valentina
Publication date: 07/06/2021

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

UCL Discovery