Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Natural language is hierarchically structured: smaller units (e.g., phrases)
are nested within larger units (e.g., clauses). When a larger constituent ends,
all of the smaller constituents that are nested within it must also be closed.
While the standard LSTM architecture allows different neurons to track
information at different time scales, it does not have an explicit bias towards
modeling a hierarchy of constituents. This paper proposes to add such an
inductive bias by ordering the neurons; a vector of master input and forget
gates ensures that when a given neuron is updated, all the neurons that follow
it in the ordering are also updated. Our novel recurrent architecture, ordered
neurons LSTM (ON-LSTM), achieves good performance on four different tasks:
language modeling, unsupervised parsing, targeted syntactic evaluation, and
logical inference. Comment: Published as a conference paper at ICLR 2019
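As a rough illustration of the ordering mechanism, the sketch below implements master forget and input gates with a cumulative softmax ("cumax"), so that updating a given neuron implies a consistent update of the neurons that follow it in the ordering. This is a minimal PyTorch sketch, not the authors' implementation; the module and variable names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cumax(x, dim=-1):
    # Cumulative softmax: a monotonically non-decreasing gate in [0, 1].
    return torch.cumsum(F.softmax(x, dim=dim), dim=dim)

class MasterGates(nn.Module):
    # Sketch of ordered-neuron master gates; names and sizes are illustrative.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w_f = nn.Linear(input_size + hidden_size, hidden_size)
        self.w_i = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h_prev):
        z = torch.cat([x, h_prev], dim=-1)
        master_f = cumax(self.w_f(z))        # rises from ~0 to ~1 along the neuron ordering
        master_i = 1.0 - cumax(self.w_i(z))  # falls from ~1 to ~0 along the ordering
        return master_f, master_i

# Toy usage: gates for a batch of 2 with hidden size 8.
gates = MasterGates(input_size=4, hidden_size=8)
mf, mi = gates(torch.randn(2, 4), torch.randn(2, 8))
```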
On-the-fly Operation Batching in Dynamic Computation Graphs
Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer
more flexibility for implementing models that cope with data of varying
dimensions and structure, relative to toolkits that operate on statically
declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing
toolkits - both static and dynamic - require that the developer organize the
computations into the batches necessary for exploiting high-performance
algorithms and hardware. This batching task is generally difficult, but it
becomes a major hurdle as architectures become complex. In this paper, we
present an algorithm, and its implementation in the DyNet toolkit, for
automatically batching operations. Developers simply write minibatch
computations as aggregations of single instance computations, and the batching
algorithm seamlessly executes them, on the fly, using computationally efficient
batched operations. On a variety of tasks, we obtain throughput similar to that
obtained with manual batches, as well as comparable speedups over
single-instance learning on architectures that are impractical to batch
manually.
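The sketch below illustrates the general idea of on-the-fly batching independently of any toolkit: pending single-instance operations that share a signature (operation name and operand shape) are grouped and executed as one batched call. It is a conceptual NumPy toy, not DyNet's actual batching algorithm or API.

```python
from collections import defaultdict
import numpy as np

class LazyOp:
    # A single-instance operation recorded lazily before execution.
    def __init__(self, name, operand):
        self.name, self.operand, self.result = name, operand, None

def run_batched(pending):
    # Group pending ops by (name, shape) and execute each group as one batched call.
    groups = defaultdict(list)
    for op in pending:
        groups[(op.name, op.operand.shape)].append(op)
    for (name, _), ops in groups.items():
        stacked = np.stack([op.operand for op in ops])           # one batched tensor
        batched_out = np.tanh(stacked) if name == "tanh" else np.exp(stacked)
        for op, out in zip(ops, batched_out):                    # scatter results back
            op.result = out

ops = [LazyOp("tanh", np.random.randn(3)) for _ in range(5)]
run_batched(ops)
```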
Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin using Recursive Neural Networks
Logographs (Chinese characters) have recursive structures (i.e., hierarchies of sub-units) that contain phonological and semantic information, and the developmental psychology literature suggests that native speakers leverage these structures to learn how to read. Exploiting these structures could
potentially lead to better embeddings that can benefit many downstream tasks.
We propose building hierarchical logograph (character) embeddings from
logograph recursive structures using treeLSTM, a recursive neural network.
Using a recursive neural network imposes a prior on the mapping from logographs to embeddings, since the network must read the sub-units of a logograph in the order specified by its recursive structure. Based on human behavior in language learning and reading, we hypothesize that modeling logographs' structures with a recursive neural network should be beneficial. To verify this claim, we consider two tasks: (1) predicting logographs' Cantonese
pronunciation from logographic structures and (2) language modeling. Empirical
results show that the proposed hierarchical embeddings outperform baseline
approaches. Diagnostic analysis suggests that hierarchical embeddings constructed using treeLSTM are less sensitive to distractors and thus more robust, especially on complex logographs. Comment: Accepted by IEEE Transactions on Audio, Speech and Language Processing. Copyright 2019 IEEE
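A child-sum TreeLSTM cell of the kind the abstract refers to can be sketched as below: it composes the states of a logograph's sub-units (the children) into a single parent representation. The interface and dimensions are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    # Sketch of a child-sum TreeLSTM cell for composing sub-unit embeddings.
    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim, 3 * mem_dim)
        self.iou_h = nn.Linear(mem_dim, 3 * mem_dim, bias=False)
        self.f_x = nn.Linear(in_dim, mem_dim)
        self.f_h = nn.Linear(mem_dim, mem_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) sub-unit embedding; child_h/child_c: (num_children, mem_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou(x) + self.iou_h(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x).unsqueeze(0) + self.f_h(child_h))  # one forget gate per child
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c

# Toy usage: compose two children into a parent state.
cell = ChildSumTreeLSTMCell(in_dim=32, mem_dim=64)
h, c = cell(torch.randn(32), torch.randn(2, 64), torch.randn(2, 64))
```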
Ordered Memory
Stack-augmented recurrent neural networks (RNNs) have been of interest to the
deep learning community for some time. However, the difficulty of training
memory models remains a problem obstructing the widespread use of such models.
In this paper, we propose the Ordered Memory architecture. Inspired by Ordered
Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and
use its cumulative probability to control the writing and erasing operation of
the memory. We also introduce a new Gated Recursive Cell to compose lower-level
representations into higher-level representation. We demonstrate that our model
achieves strong performance on the logical inference task (Bowman et al.,
2015) and the ListOps (Nangia and Bowman, 2018) task. We can also interpret the
model to retrieve the induced tree structure, and find that these induced
structures align with the ground truth. Finally, we evaluate our model on the
Stanford Sentiment Treebank tasks (Socher et al., 2013), and find that it performs comparably to state-of-the-art methods in the literature. Comment: Published in NeurIPS 2019
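A heavily simplified reading of the cumulative-probability idea is sketched below: an attention distribution over memory slots is turned, via a cumulative sum, into monotone keep/write masks, so writing to one slot also erases the slots that follow it in the ordering. This is an illustration under that reading, not the authors' Ordered Memory code.

```python
import torch
import torch.nn.functional as F

def gated_write(memory, query, candidate):
    # memory: (num_slots, dim); query, candidate: (dim,)
    scores = memory @ query                  # attention scores over slots
    p = F.softmax(scores, dim=0)
    erase = torch.cumsum(p, dim=0)           # mass at and after the attended slot
    keep = 1.0 - erase
    # Keep earlier slots, overwrite the attended region with the new content.
    return keep.unsqueeze(-1) * memory + p.unsqueeze(-1) * candidate

mem = torch.randn(5, 8)
new_mem = gated_write(mem, torch.randn(8), torch.randn(8))
```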
Learning to Segment Inputs for NMT Favors Character-Level Processing
Most modern neural machine translation (NMT) systems rely on presegmented
inputs. Segmentation granularity largely determines the input and output sequence lengths (and hence the modeling depth) as well as the source and target vocabularies, which in turn determine model size, the computational cost of softmax normalization, and the handling of out-of-vocabulary words. However, the current
practice is to use static, heuristic-based segmentations that are fixed before
NMT training. This raises the question of whether the chosen segmentation is optimal
for the translation task. To overcome suboptimal segmentation choices, we
present an algorithm for dynamic segmentation based on the Adaptive Computation Time algorithm (Graves, 2016), which is trainable end-to-end and
driven by the NMT objective. In an evaluation on four translation tasks we
found that, given the freedom to navigate between different segmentation
levels, the model prefers to operate on (almost) character level, providing
support for purely character-level NMT models from a novel angle. Comment: Technical report for IWSLT 2018 paper
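A minimal sketch of how ACT-style halting can mark segment boundaries over a character sequence: a halting unit accumulates probabilities and closes a segment once the accumulated mass reaches a threshold. The names, threshold, and interface below are assumptions for illustration, not the paper's model.

```python
import torch
import torch.nn as nn

class HaltingSegmenter(nn.Module):
    # Illustrative ACT-style segmenter over character encoder states.
    def __init__(self, dim, eps=0.01):
        super().__init__()
        self.halt = nn.Linear(dim, 1)
        self.eps = eps

    def forward(self, char_states):
        # char_states: (seq_len, dim) character encoder outputs
        boundaries, acc = [], 0.0
        for t, h in enumerate(char_states):
            acc += torch.sigmoid(self.halt(h)).item()
            if acc >= 1.0 - self.eps:   # enough halting mass accumulated: close the segment
                boundaries.append(t)
                acc = 0.0
        return boundaries

seg = HaltingSegmenter(dim=16)
print(seg(torch.randn(20, 16)))
```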
Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations
Syntax has been demonstrated to be highly effective in neural machine translation (NMT). Previous NMT models integrate syntax by encoding the 1-best tree outputs of a well-trained parsing system, e.g., via the representative Tree-RNN and Tree-Linearization methods, which may suffer from error propagation. In this
work, we propose a novel method to integrate source-side syntax implicitly for
NMT. The basic idea is to use the intermediate hidden representations of a
well-trained end-to-end dependency parser, which are referred to as
syntax-aware word representations (SAWRs). Then, we simply concatenate such
SAWRs with ordinary word embeddings to enhance basic NMT models. The method can
be straightforwardly integrated into the widely-used sequence-to-sequence
(Seq2Seq) NMT models. We start with a representative RNN-based Seq2Seq baseline
system, and test the effectiveness of our proposed method on two benchmark
datasets of the Chinese-English and English-Vietnamese translation tasks,
respectively. Experimental results show that the proposed approach is able to
bring significant BLEU score improvements on the two datasets compared with the
baseline: 1.74 points for Chinese-English translation and 0.80 points for English-Vietnamese translation, respectively. In addition, the approach also outperforms the explicit Tree-RNN and Tree-Linearization methods. Comment: NAACL 2019
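The core operation is simple to sketch: the parser's intermediate hidden states (the SAWRs) are concatenated with ordinary word embeddings before the NMT encoder. The dimensions and the encoder choice below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

emb_dim, sawr_dim, hidden = 256, 128, 512
word_emb = nn.Embedding(30000, emb_dim)
encoder = nn.GRU(emb_dim + sawr_dim, hidden, batch_first=True, bidirectional=True)

tokens = torch.randint(0, 30000, (4, 20))   # (batch, seq_len) source token ids
sawr = torch.randn(4, 20, sawr_dim)         # precomputed parser hidden states per token
enc_in = torch.cat([word_emb(tokens), sawr], dim=-1)  # syntax-aware encoder input
enc_out, _ = encoder(enc_in)
```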
Contextualized Non-local Neural Networks for Sequence Learning
Recently, a large number of neural mechanisms and models have been proposed
for sequence learning, of which self-attention, as exemplified by the
Transformer model, and graph neural networks (GNNs) have attracted much
attention. In this paper, we propose an approach that combines and draws on the
complementary strengths of these two methods. Specifically, we propose
contextualized non-local neural networks (CN^3), which can both
dynamically construct a task-specific structure of a sentence and leverage rich
local dependencies within a particular neighborhood.
Experimental results on ten NLP tasks in text classification, semantic
matching, and sequence labeling show that our proposed model outperforms
competitive baselines and discovers task-specific dependency structures, thus
providing better interpretability to users. Comment: Accepted by AAAI 2019
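A toy sketch of one way to combine the two ingredients the abstract mentions: attention scores induce a sentence graph on the fly, but each token aggregates only over a local neighborhood, GNN-style. This illustrates the general idea only, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def local_graph_layer(h, window=3):
    # h: (seq_len, dim) token representations
    n, d = h.shape
    scores = h @ h.t() / d ** 0.5                         # pairwise affinities
    idx = torch.arange(n)
    mask = (idx[:, None] - idx[None, :]).abs() <= window  # restrict to a local window
    scores = scores.masked_fill(~mask, float("-inf"))
    adj = F.softmax(scores, dim=-1)                       # dynamically induced local structure
    return adj @ h                                        # aggregate neighbor features

out = local_graph_layer(torch.randn(12, 32))
```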
Abstract Syntax Networks for Code Generation and Semantic Parsing
Tasks like code generation and semantic parsing require mapping unstructured
(or partially structured) inputs to well-formed, executable outputs. We
introduce abstract syntax networks, a modeling framework for these problems.
The outputs are represented as abstract syntax trees (ASTs) and constructed by
a decoder with a dynamically-determined modular structure paralleling the
structure of the output tree. On the benchmark Hearthstone dataset for code
generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy,
compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we
perform competitively on the Atis, Jobs, and Geo semantic parsing datasets with
no task-specific engineering. Comment: ACL 2017. MR and MS contributed equally.
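Stripped of the neural components, the structural idea can be sketched as a decoder that recursively dispatches to a module per AST constructor, so that the decoder's shape mirrors the tree it emits. The grammar and constructor names below are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)

def decode(kind, depth=0, max_depth=3):
    # Each constructor decides which child "modules" to invoke next,
    # so the decoding structure parallels the output AST.
    node = Node(kind)
    if depth >= max_depth:
        return node
    if kind == "If":
        node.children = [decode("Expr", depth + 1), decode("Block", depth + 1)]
    elif kind == "Block":
        node.children = [decode("Expr", depth + 1)]
    return node

tree = decode("If")
```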
Cavs: A Vertex-centric Programming Interface for Dynamic Neural Networks
Recent deep learning (DL) models have moved beyond static network
architectures to dynamic ones, handling data where the network structure changes with every example, such as sequences of variable lengths, trees, and
graphs. Existing dataflow-based programming models for DL---both static and
dynamic declaration---either cannot readily express these dynamic models, or
are inefficient due to repeated dataflow graph construction and processing, and
difficulties in batched execution. We present Cavs, a vertex-centric
programming interface and optimized system implementation for dynamic DL
models. Cavs represents dynamic network structure as a static vertex function F and a dynamic instance-specific graph G, and performs backpropagation by scheduling the execution of F following the dependencies in G. Cavs bypasses expensive graph construction and preprocessing overhead, allows for the use of static graph optimization techniques on pre-defined operations in F, and naturally exposes
batched execution opportunities over different graphs. Experiments comparing
Cavs to two state-of-the-art frameworks for dynamic NNs (TensorFlow Fold and
DyNet) demonstrate the efficacy of this approach: Cavs achieves a near one
order of magnitude speedup on training of various dynamic NN architectures, and
ablations demonstrate the contribution of our proposed batching and memory
management strategies. Comment: Short versions of this paper were presented at AISys workshop@SOSP 2017 and MLSys workshop@NIPS 2017
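A toy sketch of the vertex-centric scheduling idea: a single static vertex function (F above) is evaluated over each instance-specific graph (G) in dependency order. The real system additionally batches same-depth vertices and applies static graph optimizations; none of that is shown here.

```python
import numpy as np

def vertex_fn(child_states, leaf_input):
    # One static vertex function: combine child states, or encode a leaf input.
    return np.tanh(sum(child_states)) if child_states else np.tanh(leaf_input)

def topo_order(graph):
    # Post-order DFS so children are scheduled before their parents.
    seen, order = set(), []
    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for c in graph[v]:
            visit(c)
        order.append(v)
    for v in graph:
        visit(v)
    return order

def evaluate(graph, inputs, dim=8):
    # graph: {vertex: [children]}; vertices with no children are leaves.
    state = {}
    for v in topo_order(graph):
        kids = [state[c] for c in graph[v]]
        state[v] = vertex_fn(kids, inputs.get(v, np.zeros(dim)))
    return state

g = {"a": [], "b": [], "root": ["a", "b"]}
states = evaluate(g, {"a": np.random.randn(8), "b": np.random.randn(8)})
```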
Event Representations with Tensor-based Compositions
Robust and flexible event representations are important to many core areas in
language understanding. Scripts were proposed early on as a way of representing
sequences of events for such understanding, and have recently attracted renewed
attention. However, obtaining effective representations for modeling
script-like event sequences is challenging. It requires representations that
can capture event-level and scenario-level semantics. We propose a new
tensor-based composition method for creating event representations. The method
captures more subtle semantic interactions between an event and its entities
and yields representations that are effective at multiple event-related tasks.
With the continuous representations, we also devise a simple schema generation
method that produces better schemas than a prior discrete-representation-based method. Our analysis shows that the tensors capture
distinct usages of a predicate even when there are only subtle differences in
their surface realizations. Comment: Accepted at AAAI 2018
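A generic bilinear-tensor composition of a predicate and an argument can be sketched as below: each output dimension k is a bilinear form arg^T T[k] pred, capturing multiplicative predicate-argument interactions that a plain concatenation misses. The dimensions and exact parameterization are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class TensorComposition(nn.Module):
    # Sketch of a bilinear-tensor composition for (predicate, argument) pairs.
    def __init__(self, dim, out_dim):
        super().__init__()
        self.T = nn.Parameter(torch.randn(out_dim, dim, dim) * 0.01)  # one slice per output dim
        self.W = nn.Linear(2 * dim, out_dim)                          # standard linear term

    def forward(self, pred, arg):
        bilinear = torch.einsum("d,kde,e->k", arg, self.T, pred)      # arg^T T[k] pred for each k
        return torch.tanh(bilinear + self.W(torch.cat([pred, arg])))

comp = TensorComposition(dim=50, out_dim=64)
event = comp(torch.randn(50), torch.randn(50))
```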