Recursive Neural Language Architecture for Tag Prediction
We consider the problem of learning distributed representations for tags from
their associated content for the task of tag recommendation. Since tagging information is usually very sparse, effective learning from content and tag associations is a crucial and challenging task. Recently, various neural representation learning models such as WSABIE and its variants have shown promising performance, mainly due to the compact feature representations they learn in a semantic space. However, their capacity is limited by a linear compositional approach that represents a tag as a sum of equal parts, which hurts their performance. In this work, we propose a neural feedback relevance model for learning tag representations with weighted feature representations. Our experiments on two widely used datasets show significant improvements in recommendation quality over various baselines.
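The gap the abstract points at, between composing a tag representation as an unweighted sum of content features and learning weighted feature representations, can be sketched as below. The attention-style scorer is a hypothetical stand-in for illustration, not the paper's actual model.

```python
import torch
import torch.nn as nn

class WeightedTagComposer(nn.Module):
    """Contrast sketch: WSABIE-style models compose a tag as an equal-part
    mean of content-word embeddings; the weighted variant below learns a
    per-word relevance score (hypothetical, not the paper's model)."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)  # learned relevance per content word

    def forward(self, word_ids):
        e = self.embed(word_ids)                 # (batch, words, dim)
        uniform = e.mean(dim=1)                  # sum of equal parts
        w = torch.softmax(self.score(e), dim=1)  # learned weights
        weighted = (w * e).sum(dim=1)            # weighted composition
        return weighted, uniform
```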
Learning Distributed Representations of Sentences from Unlabelled Data
Unsupervised methods for learning distributed representations of words are
ubiquitous in today's NLP research, but far less is known about the best ways
to learn distributed phrase or sentence representations from unlabelled data.
This paper is a systematic comparison of models that learn such
representations. We find that the optimal approach depends critically on the
intended application. Deeper, more complex models are preferable for
representations to be used in supervised systems, but shallow log-linear models
work best for building representation spaces that can be decoded with simple
spatial distance metrics. We also propose two new unsupervised
representation-learning objectives designed to optimise the trade-off between
training time, domain portability and performance.
Unsupervised Visual Representation Learning by Context Prediction
This work explores the use of spatial context as a source of free and
plentiful supervisory signal for training a rich visual representation. Given
only a large, unlabeled image collection, we extract random pairs of patches
from each image and train a convolutional neural net to predict the position of
the second patch relative to the first. We argue that doing well on this task
requires the model to learn to recognize objects and their parts. We
demonstrate that the feature representation learned using this within-image
context indeed captures visual similarity across images. For example, this
representation allows us to perform unsupervised visual discovery of objects
like cats, people, and even birds from the Pascal VOC 2011 detection dataset.
Furthermore, we show that the learned ConvNet can be used in the R-CNN
framework and provides a significant boost over a randomly-initialized ConvNet,
resulting in state-of-the-art performance among algorithms which use only
Pascal-provided training set annotations.
Comment: Oral paper at ICCV 2015
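A minimal sketch of the pretext task as described above: sample a patch and one of its eight neighbours, then train a two-branch ConvNet (shared trunk, concatenated codes) to classify the relative position. Patch size, gap, and the tiny trunk are illustrative assumptions; the paper's jitter and chromatic-aberration tricks are omitted.

```python
import random
import torch
import torch.nn as nn

# Eight positions of the second patch relative to the first
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def sample_patch_pair(image, patch=96, gap=48):
    """image: (C, H, W) tensor, at least 2*(patch+gap)+patch per side.
    Returns two patches and the 8-way relative-position label."""
    _, H, W = image.shape
    step = patch + gap
    y = random.randint(step, H - step - patch)
    x = random.randint(step, W - step - patch)
    label = random.randrange(8)
    dy, dx = OFFSETS[label]
    p1 = image[:, y:y + patch, x:x + patch]
    p2 = image[:, y + dy * step:y + dy * step + patch,
               x + dx * step:x + dx * step + patch]
    return p1, p2, label

class RelPosNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(          # shared weights for both patches
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(128, 8)        # concat of two 64-d codes

    def forward(self, p1, p2):
        return self.head(torch.cat([self.trunk(p1), self.trunk(p2)], dim=1))
```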
Joint auto-encoders: a flexible multi-task learning framework
The incorporation of prior knowledge into learning is essential in achieving
good performance based on small noisy samples. Such knowledge is often
incorporated through the availability of related data arising from domains and
tasks similar to the one of current interest. Ideally one would like to allow
both the data for the current task and for previous related tasks to
self-organize the learning system in such a way that commonalities and
differences between the tasks are learned in a data-driven fashion. We develop a framework for learning multiple tasks simultaneously, based on sharing the features that are common to all tasks. This is achieved through a modular deep feedforward neural network consisting of shared branches, which handle the features common to all tasks, and private branches, which learn the aspects unique to each task. Once an appropriate weight-sharing architecture has been established, learning takes place through standard algorithms for feedforward networks, e.g., stochastic gradient descent and its variations. The method deals with domain adaptation and multi-task learning in a unified fashion, and can easily deal with data arising from different types of sources. Numerical experiments demonstrate the effectiveness of learning in domain adaptation and transfer learning setups, and provide evidence for the flexible and task-oriented representations arising in the network.
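A minimal PyTorch sketch of the shared/private branch structure described above; the single-hidden-layer branches and sizes are illustrative assumptions, and training is ordinary SGD as the abstract notes.

```python
import torch
import torch.nn as nn

class SharedPrivateNet(nn.Module):
    """One shared branch for features common to all tasks, one private
    branch per task for its unique aspects; each head reads both."""

    def __init__(self, in_dim, hidden, out_dims):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.private = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            for _ in out_dims)
        self.heads = nn.ModuleList(nn.Linear(2 * hidden, d) for d in out_dims)

    def forward(self, x, task):
        z = torch.cat([self.shared(x), self.private[task](x)], dim=-1)
        return self.heads[task](z)
```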
Cross-topic distributional semantic representations via unsupervised mappings
In traditional Distributional Semantic Models (DSMs) the multiple senses of a
polysemous word are conflated into a single vector space representation. In
this work, we propose a DSM that learns multiple distributional representations
of a word based on different topics. First, a separate DSM is trained for each
topic and then each of the topic-based DSMs is aligned to a common vector
space. Our unsupervised mapping approach is motivated by the hypothesis that
words preserving their relative distances in different topic semantic
sub-spaces constitute robust "semantic anchors" that define the mappings
between them. Aligned cross-topic representations achieve state-of-the-art
results for the task of contextual word similarity. Furthermore, evaluation on
NLP downstream tasks shows that multiple topic-based embeddings outperform
single-prototype models.
Comment: NAACL-HLT 2019
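One standard way to realize such a mapping, sketched here as an assumption about the implementation, is orthogonal Procrustes over the anchor words' embeddings in the two topic sub-spaces; the paper's exact mapping and anchor-selection procedure may differ.

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F, where rows of X and Y are
    embeddings of the same anchor words in source and target sub-spaces."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Usage sketch: map every vector of the source topic DSM into the target
# space via the anchors.
# W = procrustes_map(X_anchors, Y_anchors)   # (dim, dim), orthogonal
# X_aligned = X_all @ W
```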
Event Representations with Tensor-based Compositions
Robust and flexible event representations are important to many core areas in
language understanding. Scripts were proposed early on as a way of representing
sequences of events for such understanding, and have recently attracted renewed
attention. However, obtaining effective representations for modeling
script-like event sequences is challenging. It requires representations that
can capture event-level and scenario-level semantics. We propose a new
tensor-based composition method for creating event representations. The method
captures more subtle semantic interactions between an event and its entities
and yields representations that are effective at multiple event-related tasks.
With the continuous representations, we also devise a simple schema generation
method which produces better schemas compared to a prior discrete
representation based method. Our analysis shows that the tensors capture
distinct usages of a predicate even when there are only subtle differences in
their surface realizations.
Comment: Accepted at AAAI 2018
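As a generic illustration of tensor-based composition (not the paper's exact parameterization), a bilinear tensor layer lets a predicate and an argument interact multiplicatively rather than just additively:

```python
import torch

def tensor_compose(arg, pred, T, W, b):
    """Output unit k is arg^T T[k] pred plus a standard linear term;
    the bilinear part captures argument-predicate interactions."""
    bilinear = torch.einsum('bi,kij,bj->bk', arg, T, pred)
    linear = torch.cat([arg, pred], dim=-1) @ W + b
    return torch.tanh(bilinear + linear)

d, k = 64, 64                      # illustrative sizes
T = 0.01 * torch.randn(k, d, d)    # k bilinear forms
W = 0.01 * torch.randn(2 * d, k)
b = torch.zeros(k)
event = tensor_compose(torch.randn(8, d), torch.randn(8, d), T, W, b)
```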
Natural Language Inference by Tree-Based Convolution and Heuristic Matching
In this paper, we propose the TBCNN-pair model to recognize entailment and
contradiction between two sentences. In our model, a tree-based convolutional
neural network (TBCNN) captures sentence-level semantics; then heuristic
matching layers like concatenation, element-wise product/difference combine the
information in individual sentences. Experimental results show that our model
outperforms existing sentence encoding-based approaches by a large margin.
Comment: Accepted by ACL'16 as a short paper
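The matching layer named in the abstract can be written down directly; whether the paper uses the signed or absolute difference is not stated here, so the signed version is an assumption:

```python
import torch

def heuristic_match(h1, h2):
    """Concatenation, element-wise product, and element-wise difference
    of two sentence vectors, to be fed to a downstream classifier."""
    return torch.cat([h1, h2, h1 * h2, h1 - h2], dim=-1)
```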
Spherical Latent Spaces for Stable Variational Autoencoders
A hallmark of variational autoencoders (VAEs) for text processing is their
combination of powerful encoder-decoder models, such as LSTMs, with simple
latent distributions, typically multivariate Gaussians. These models pose a
difficult optimization problem: there is an especially bad local optimum where
the variational posterior always equals the prior and the model does not use
the latent variable at all, a kind of "collapse" which is encouraged by the KL
divergence term of the objective. In this work, we experiment with another
choice of latent distribution, namely the von Mises-Fisher (vMF) distribution,
which places mass on the surface of the unit hypersphere. With this choice of
prior and posterior, the KL divergence term now only depends on the variance of
the vMF distribution, giving us the ability to treat it as a fixed
hyperparameter. We show that doing so not only averts the KL collapse, but
consistently gives better likelihoods than Gaussians across a range of modeling
conditions, including recurrent language modeling and bag-of-words document
modeling. An analysis of the properties of our vMF representations shows that
they learn richer and more nuanced structures in their latent representations
than their Gaussian counterparts.
Comment: To appear in EMNLP 2018; 11 pages; Code release: https://github.com/jiacheng-xu/vmf_vae_nl
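The fixed-KL property can be made explicit. With the standard vMF normalization (I_v denotes the modified Bessel function of the first kind), the KL divergence from a vMF posterior to the uniform prior on the hypersphere depends on the concentration kappa and the dimension m, but not on the mean direction mu, so fixing kappa fixes the KL term:

```latex
% vMF density on S^{m-1}: f(x;\mu,\kappa) = C_m(\kappa)\,e^{\kappa \mu^\top x},
% with C_m(\kappa) = \kappa^{m/2-1} / \big((2\pi)^{m/2} I_{m/2-1}(\kappa)\big).
\mathrm{KL}\big(\mathrm{vMF}(\mu,\kappa)\,\big\|\,\mathcal{U}(S^{m-1})\big)
  = \kappa\,\frac{I_{m/2}(\kappa)}{I_{m/2-1}(\kappa)}
  + \log C_m(\kappa)
  + \log\frac{2\pi^{m/2}}{\Gamma(m/2)}
```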
Modelling Interaction of Sentence Pair with coupled-LSTMs
Recently, there has been rising interest in modelling the interactions of two sentences with deep neural networks. However, most existing methods encode the two sequences with separate encoders, in which a sentence is encoded with little or no information from the other sentence. In this paper, we propose a deep architecture to model the strong interaction of a sentence pair with two coupled LSTMs. Specifically, we introduce two ways of coupling the LSTMs to model their interdependencies, capturing the local contextualized interactions of the two sentences. We then aggregate these interactions and use dynamic pooling to select the most informative features. Experiments on two
very large datasets demonstrate the efficacy of our proposed architecture and
its superiority to state-of-the-art methods.
Comment: Submitted to IJCAI 2016
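One simple coupling consistent with this description, though not necessarily either of the paper's two variants: at each time step, each LSTM reads its own token together with the other sentence's previous hidden state.

```python
import torch
import torch.nn as nn

class CoupledLSTMs(nn.Module):
    """Cross-conditioned LSTM cells (an illustrative coupling); assumes
    the two input sequences have equal length."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.cell_a = nn.LSTMCell(dim + hidden, hidden)
        self.cell_b = nn.LSTMCell(dim + hidden, hidden)
        self.hidden = hidden

    def forward(self, xa, xb):          # both (batch, time, dim)
        B, T, _ = xa.shape
        ha = torch.zeros(B, self.hidden, device=xa.device)
        ca, hb, cb = ha.clone(), ha.clone(), ha.clone()
        outs = []
        for t in range(T):
            ha_new, ca = self.cell_a(torch.cat([xa[:, t], hb], -1), (ha, ca))
            hb, cb = self.cell_b(torch.cat([xb[:, t], ha], -1), (hb, cb))
            ha = ha_new
            outs.append(torch.cat([ha, hb], -1))
        return torch.stack(outs, 1)      # pool over time downstream
```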
Facial Emotion Detection Using Convolutional Neural Networks and Representational Autoencoder Units
Emotion is subjective, and leveraging the knowledge and science behind labeled data to extract the components that constitute emotion has been a challenging problem in the industry for many years. With the evolution of deep learning in computer vision, emotion recognition has become a widely tackled research problem. In this work, we propose two independent methods for this task. The first method uses autoencoders to construct a unique representation of each emotion, while the second is an 8-layer convolutional neural network (CNN). These methods were trained on the posed-emotion dataset (JAFFE), and to test their robustness, both models were also tested on 100 random images from the Labeled Faces in the Wild (LFW) dataset, which consists of images that are candid rather than posed. The results show that with more fine-tuning and depth, our CNN model can outperform state-of-the-art methods for emotion recognition. We also propose some exciting ideas for expanding the concept of representational autoencoders to improve their performance.
Comment: 6 pages, 8 figures, and 3 tables
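The abstract does not spell out how the autoencoder representations are turned into predictions; one common realization, sketched here purely as an assumption, trains one small autoencoder per emotion and labels a face by minimum reconstruction error.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_autoencoder(in_dim=48 * 48, code=64):
    # One small autoencoder per emotion class (sizes are illustrative)
    return nn.Sequential(
        nn.Linear(in_dim, code), nn.ReLU(),
        nn.Linear(code, in_dim), nn.Sigmoid())

def classify(x, autoencoders):
    """x: a single flattened face, shape (1, in_dim). Returns the index
    of the emotion whose autoencoder reconstructs x best."""
    errors = [F.mse_loss(ae(x), x) for ae in autoencoders]
    return int(torch.stack(errors).argmin())
```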