A Deep Network with Visual Text Composition Behavior
While natural languages are compositional, how state-of-the-art neural models
achieve compositionality is still unclear. We propose a deep network, which not
only achieves competitive accuracy for text classification, but also exhibits
compositional behavior. That is, while creating hierarchical representations of
a piece of text, such as a sentence, the lower layers of the network distribute
their layer-specific attention weights to individual words. In contrast, the
higher layers compose meaningful phrases and clauses, whose lengths increase as
the network gets deeper, until the sentence is fully composed.
Comment: accepted to ACL201
End-to-End Multi-View Networks for Text Classification
We propose a multi-view network for text classification. Our method
automatically creates various views of its input text, each taking the form of
soft attention weights that distribute the classifier's focus among a set of
base features. For a bag-of-words representation, each view focuses on a
different subset of the text's words. Aggregating many such views results in a
more discriminative and robust representation. Through a novel architecture
that both stacks and concatenates views, we produce a network that emphasizes
both depth and width, allowing training to converge quickly. Using our
multi-view architecture, we establish new state-of-the-art accuracies on two
benchmark tasks.
Comment: 6 page
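The idea of a view as soft attention weights over base features can be illustrated with a minimal sketch. This is not the authors' architecture: the function names (`softmax`, `view`, `multi_view`) are hypothetical, the attention scores are passed in by hand where a real model would learn them, and only the concatenation of views is shown, not the stacking the abstract also mentions.

```python
import math

def softmax(scores):
    """Turn raw scores into soft attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def view(features, scores):
    """One view: an attention-weighted sum of the base feature vectors.

    `features` is a list of equal-length vectors (e.g. one per word in a
    bag-of-words input); `scores` holds one raw score per feature vector.
    In a trained model the scores would be learned; here they are given.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def multi_view(features, score_sets):
    """Aggregate several views by concatenating them (assumed aggregation)."""
    combined = []
    for scores in score_sets:
        combined.extend(view(features, scores))
    return combined

# Two toy "word" vectors and two views that focus on different words.
words = [[1.0, 0.0], [0.0, 1.0]]
representation = multi_view(words, [[5.0, 0.0], [0.0, 5.0]])
```

Because each view places most of its weight on a different word, the concatenated representation keeps information that a single averaged view would blur together.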
MixUp as Locally Linear Out-Of-Manifold Regularization
MixUp is a recently proposed data-augmentation scheme, which linearly
interpolates a random pair of training examples and correspondingly the one-hot
representations of their labels. Training deep neural networks with such
additional data has been shown to significantly improve the predictive
accuracy of current state-of-the-art models. The power of MixUp, however, is
primarily established empirically, and its workings and effectiveness have not
been explained in any depth. In this paper, we develop an understanding of MixUp as
a form of "out-of-manifold regularization", which imposes certain "local
linearity" constraints on the model's input space beyond the data manifold.
This analysis enables us to identify a limitation of MixUp, which we call
"manifold intrusion". In a nutshell, manifold intrusion in MixUp is a form of
under-fitting resulting from conflicts between the synthetic labels of the
mixed-up examples and the labels of original training data. Such a phenomenon
usually happens when the parameters controlling the generation of mixing
policies are not sufficiently fine-tuned on the training data. To address this
issue, we propose a novel adaptive version of MixUp, where the mixing policies
are automatically learned from the data using an additional network and
objective function designed to avoid manifold intrusion. The proposed
regularizer, AdaMixUp, is empirically evaluated on several benchmark datasets.
Extensive experiments demonstrate that AdaMixUp improves upon MixUp when
applied to the current art of deep classification models.
Comment: Accepted by AAAI201
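The basic MixUp step described at the start of this abstract, linearly interpolating a random pair of training examples and their one-hot labels, can be sketched as follows. This is a minimal illustration of plain MixUp, not of AdaMixUp. The function names are hypothetical, and drawing the mixing coefficient from a Beta(alpha, alpha) distribution follows the original MixUp formulation rather than anything stated in the abstract.

```python
import random

def one_hot(label, num_classes):
    """One-hot encode an integer class label."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Interpolate two examples and their label vectors with weight lam."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix

def sample_mixup(inputs, labels, num_classes, alpha=1.0, rng=random):
    """Draw a random pair and a mixing weight, then return the mixed example.

    Assumption: lam ~ Beta(alpha, alpha), as in the original MixUp scheme.
    """
    i = rng.randrange(len(inputs))
    j = rng.randrange(len(inputs))
    lam = rng.betavariate(alpha, alpha)
    return mixup_pair(
        inputs[i], one_hot(labels[i], num_classes),
        inputs[j], one_hot(labels[j], num_classes),
        lam,
    )

# Mixing a class-0 point and a class-1 point with lam = 0.25.
x_mix, y_mix = mixup_pair([0.0, 2.0], one_hot(0, 2), [4.0, 6.0], one_hot(1, 2), 0.25)
```

In this example the mixed input lies a quarter of the way from the second point to the first, and its soft label assigns 0.25 to class 0 and 0.75 to class 1. Manifold intrusion, in the abstract's terms, would occur if such a mixed input coincided with a real training example whose label conflicts with that soft label.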