Recursive Neural Language Architecture for Tag Prediction
We consider the problem of learning distributed representations for tags from
their associated content for the task of tag recommendation. Since tagging information is usually very sparse, effective learning from content and tag associations is a crucial and challenging task. Recently, various neural representation learning models such as WSABIE and its variants have shown promising performance, mainly due to the compact feature representations they learn in a semantic space. However, their capacity is limited by a linear compositional approach that represents a tag as a sum of equal parts, which hurts their performance. In this work, we propose a neural feedback relevance model for learning tag representations with weighted feature representations. Our experiments on two widely used datasets show significant improvements in recommendation quality over various baselines.
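The gap the abstract points at, between composing a tag representation as an unweighted sum of content features and learning weighted feature representations, can be sketched as below. The attention-style scorer is a hypothetical stand-in for illustration, not the paper's actual model.

```python
import torch
import torch.nn as nn

class WeightedTagComposer(nn.Module):
    """Contrast sketch: WSABIE-style models compose a tag as an equal-part
    mean of content-word embeddings; the weighted variant below learns a
    per-word relevance score (hypothetical, not the paper's model)."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)  # learned relevance per content word

    def forward(self, word_ids):
        e = self.embed(word_ids)                 # (batch, words, dim)
        uniform = e.mean(dim=1)                  # sum of equal parts
        w = torch.softmax(self.score(e), dim=1)  # learned weights
        weighted = (w * e).sum(dim=1)            # weighted composition
        return weighted, uniform
```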
Learning Distributed Representations of Sentences from Unlabelled Data
Unsupervised methods for learning distributed representations of words are
ubiquitous in today's NLP research, but far less is known about the best ways
to learn distributed phrase or sentence representations from unlabelled data.
This paper is a systematic comparison of models that learn such
representations. We find that the optimal approach depends critically on the
intended application. Deeper, more complex models are preferable for
representations to be used in supervised systems, but shallow log-linear models
work best for building representation spaces that can be decoded with simple
spatial distance metrics. We also propose two new unsupervised
representation-learning objectives designed to optimise the trade-off between
training time, domain portability and performance.
Unsupervised Visual Representation Learning by Context Prediction
This work explores the use of spatial context as a source of free and
plentiful supervisory signal for training a rich visual representation. Given
only a large, unlabeled image collection, we extract random pairs of patches
from each image and train a convolutional neural net to predict the position of
the second patch relative to the first. We argue that doing well on this task
requires the model to learn to recognize objects and their parts. We
demonstrate that the feature representation learned using this within-image
context indeed captures visual similarity across images. For example, this
representation allows us to perform unsupervised visual discovery of objects
like cats, people, and even birds from the Pascal VOC 2011 detection dataset.
Furthermore, we show that the learned ConvNet can be used in the R-CNN
framework and provides a significant boost over a randomly-initialized ConvNet,
resulting in state-of-the-art performance among algorithms which use only
Pascal-provided training set annotations.
Comment: Oral paper at ICCV 2015
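A minimal sketch of the pretext task as described above: sample a patch and one of its eight neighbours, then train a two-branch ConvNet (shared trunk, concatenated codes) to classify the relative position. Patch size, gap, and the tiny trunk are illustrative assumptions; the paper's jitter and chromatic-aberration tricks are omitted.

```python
import random
import torch
import torch.nn as nn

# Eight positions of the second patch relative to the first
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def sample_patch_pair(image, patch=96, gap=48):
    """image: (C, H, W) tensor, at least 2*(patch+gap)+patch per side.
    Returns two patches and the 8-way relative-position label."""
    _, H, W = image.shape
    step = patch + gap
    y = random.randint(step, H - step - patch)
    x = random.randint(step, W - step - patch)
    label = random.randrange(8)
    dy, dx = OFFSETS[label]
    p1 = image[:, y:y + patch, x:x + patch]
    p2 = image[:, y + dy * step:y + dy * step + patch,
               x + dx * step:x + dx * step + patch]
    return p1, p2, label

class RelPosNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(          # shared weights for both patches
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(128, 8)        # concat of two 64-d codes

    def forward(self, p1, p2):
        return self.head(torch.cat([self.trunk(p1), self.trunk(p2)], dim=1))
```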
Joint auto-encoders: a flexible multi-task learning framework
The incorporation of prior knowledge into learning is essential in achieving
good performance based on small noisy samples. Such knowledge is often
incorporated through the availability of related data arising from domains and
tasks similar to the one of current interest. Ideally one would like to allow
both the data for the current task and for previous related tasks to
self-organize the learning system in such a way that commonalities and
differences between the tasks are learned in a data-driven fashion. We develop a framework for learning multiple tasks simultaneously, based on sharing the features that are common to all tasks. This is achieved through a modular deep feedforward neural network consisting of shared branches, which handle the features common to all tasks, and private branches, which learn the aspects unique to each task. Once an appropriate weight-sharing architecture has been established, learning takes place through standard algorithms for feedforward networks, e.g., stochastic gradient descent and its variations. The method deals with domain adaptation and multi-task learning in a unified fashion, and can easily deal with data arising from different types of sources. Numerical experiments demonstrate the effectiveness of learning in domain adaptation and transfer learning setups, and provide evidence for the flexible and task-oriented representations arising in the network.
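A minimal PyTorch sketch of the shared/private branch structure described above; the single-hidden-layer branches and sizes are illustrative assumptions, and training is ordinary SGD as the abstract notes.

```python
import torch
import torch.nn as nn

class SharedPrivateNet(nn.Module):
    """One shared branch for features common to all tasks, one private
    branch per task for its unique aspects; each head reads both."""

    def __init__(self, in_dim, hidden, out_dims):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.private = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            for _ in out_dims)
        self.heads = nn.ModuleList(nn.Linear(2 * hidden, d) for d in out_dims)

    def forward(self, x, task):
        z = torch.cat([self.shared(x), self.private[task](x)], dim=-1)
        return self.heads[task](z)
```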
Cross-topic distributional semantic representations via unsupervised mappings
In traditional Distributional Semantic Models (DSMs) the multiple senses of a
polysemous word are conflated into a single vector space representation. In
this work, we propose a DSM that learns multiple distributional representations
of a word based on different topics. First, a separate DSM is trained for each
topic and then each of the topic-based DSMs is aligned to a common vector
space. Our unsupervised mapping approach is motivated by the hypothesis that
words preserving their relative distances in different topic semantic
sub-spaces constitute robust "semantic anchors" that define the mappings
between them. Aligned cross-topic representations achieve state-of-the-art
results for the task of contextual word similarity. Furthermore, evaluation on
NLP downstream tasks shows that multiple topic-based embeddings outperform
single-prototype models.
Comment: NAACL-HLT 2019
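One standard way to realize such a mapping, sketched here as an assumption about the implementation, is orthogonal Procrustes over the anchor words' embeddings in the two topic sub-spaces; the paper's exact mapping and anchor-selection procedure may differ.

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F, where rows of X and Y are
    embeddings of the same anchor words in source and target sub-spaces."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Usage sketch: map every vector of the source topic DSM into the target
# space via the anchors.
# W = procrustes_map(X_anchors, Y_anchors)   # (dim, dim), orthogonal
# X_aligned = X_all @ W
```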
Event Representations with Tensor-based Compositions
Robust and flexible event representations are important to many core areas in
language understanding. Scripts were proposed early on as a way of representing
sequences of events for such understanding, and have recently attracted renewed
attention. However, obtaining effective representations for modeling
script-like event sequences is challenging. It requires representations that
can capture event-level and scenario-level semantics. We propose a new
tensor-based composition method for creating event representations. The method
captures more subtle semantic interactions between an event and its entities
and yields representations that are effective at multiple event-related tasks.
With the continuous representations, we also devise a simple schema generation
method which produces better schemas compared to a prior discrete
representation based method. Our analysis shows that the tensors capture
distinct usages of a predicate even when there are only subtle differences in
their surface realizations.
Comment: Accepted at AAAI 2018
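As a generic illustration of tensor-based composition (not the paper's exact parameterization), a bilinear tensor layer lets a predicate and an argument interact multiplicatively rather than just additively:

```python
import torch

def tensor_compose(arg, pred, T, W, b):
    """Output unit k is arg^T T[k] pred plus a standard linear term;
    the bilinear part captures argument-predicate interactions."""
    bilinear = torch.einsum('bi,kij,bj->bk', arg, T, pred)
    linear = torch.cat([arg, pred], dim=-1) @ W + b
    return torch.tanh(bilinear + linear)

d, k = 64, 64                      # illustrative sizes
T = 0.01 * torch.randn(k, d, d)    # k bilinear forms
W = 0.01 * torch.randn(2 * d, k)
b = torch.zeros(k)
event = tensor_compose(torch.randn(8, d), torch.randn(8, d), T, W, b)
```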
Natural Language Inference by Tree-Based Convolution and Heuristic Matching
In this paper, we propose the TBCNN-pair model to recognize entailment and
contradiction between two sentences. In our model, a tree-based convolutional
neural network (TBCNN) captures sentence-level semantics; then heuristic
matching layers like concatenation, element-wise product/difference combine the
information in individual sentences. Experimental results show that our model
outperforms existing sentence encoding-based approaches by a large margin.
Comment: Accepted by ACL'16 as a short paper
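The matching layer named in the abstract can be written down directly; whether the paper uses the signed or absolute difference is not stated here, so the signed version is an assumption:

```python
import torch

def heuristic_match(h1, h2):
    """Concatenation, element-wise product, and element-wise difference
    of two sentence vectors, to be fed to a downstream classifier."""
    return torch.cat([h1, h2, h1 * h2, h1 - h2], dim=-1)
```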
Spherical Latent Spaces for Stable Variational Autoencoders
A hallmark of variational autoencoders (VAEs) for text processing is their
combination of powerful encoder-decoder models, such as LSTMs, with simple
latent distributions, typically multivariate Gaussians. These models pose a
difficult optimization problem: there is an especially bad local optimum where
the variational posterior always equals the prior and the model does not use
the latent variable at all, a kind of "collapse" which is encouraged by the KL
divergence term of the objective. In this work, we experiment with another
choice of latent distribution, namely the von Mises-Fisher (vMF) distribution,
which places mass on the surface of the unit hypersphere. With this choice of
prior and posterior, the KL divergence term now only depends on the variance of
the vMF distribution, giving us the ability to treat it as a fixed
hyperparameter. We show that doing so not only averts the KL collapse, but
consistently gives better likelihoods than Gaussians across a range of modeling
conditions, including recurrent language modeling and bag-of-words document
modeling. An analysis of the properties of our vMF representations shows that
they learn richer and more nuanced structures in their latent representations
than their Gaussian counterparts.
Comment: To appear in EMNLP 2018; 11 pages; Code release: https://github.com/jiacheng-xu/vmf_vae_nl
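The fixed-KL property can be made explicit. With the standard vMF normalization (I_v denotes the modified Bessel function of the first kind), the KL divergence from a vMF posterior to the uniform prior on the hypersphere depends on the concentration kappa and the dimension m, but not on the mean direction mu, so fixing kappa fixes the KL term:

```latex
% vMF density on S^{m-1}: f(x;\mu,\kappa) = C_m(\kappa)\,e^{\kappa \mu^\top x},
% with C_m(\kappa) = \kappa^{m/2-1} / \big((2\pi)^{m/2} I_{m/2-1}(\kappa)\big).
\mathrm{KL}\big(\mathrm{vMF}(\mu,\kappa)\,\big\|\,\mathcal{U}(S^{m-1})\big)
  = \kappa\,\frac{I_{m/2}(\kappa)}{I_{m/2-1}(\kappa)}
  + \log C_m(\kappa)
  + \log\frac{2\pi^{m/2}}{\Gamma(m/2)}
```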
Modelling Interaction of Sentence Pair with coupled-LSTMs
Recently, there has been rising interest in modelling the interactions of two sentences with deep neural networks. However, most existing methods encode the two sequences with separate encoders, in which a sentence is encoded with little or no information from the other sentence. In this paper, we propose a deep architecture to model the strong interaction of a sentence pair with two coupled LSTMs. Specifically, we introduce two ways of coupling the LSTMs to model their interdependencies, capturing the local contextualized interactions of the two sentences. We then aggregate these interactions and use dynamic pooling to select the most informative features. Experiments on two
very large datasets demonstrate the efficacy of our proposed architecture and
its superiority to state-of-the-art methods.
Comment: Submitted to IJCAI 2016
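One simple coupling consistent with this description, though not necessarily either of the paper's two variants: at each time step, each LSTM reads its own token together with the other sentence's previous hidden state.

```python
import torch
import torch.nn as nn

class CoupledLSTMs(nn.Module):
    """Cross-conditioned LSTM cells (an illustrative coupling); assumes
    the two input sequences have equal length."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.cell_a = nn.LSTMCell(dim + hidden, hidden)
        self.cell_b = nn.LSTMCell(dim + hidden, hidden)
        self.hidden = hidden

    def forward(self, xa, xb):          # both (batch, time, dim)
        B, T, _ = xa.shape
        ha = torch.zeros(B, self.hidden, device=xa.device)
        ca, hb, cb = ha.clone(), ha.clone(), ha.clone()
        outs = []
        for t in range(T):
            ha_new, ca = self.cell_a(torch.cat([xa[:, t], hb], -1), (ha, ca))
            hb, cb = self.cell_b(torch.cat([xb[:, t], ha], -1), (hb, cb))
            ha = ha_new
            outs.append(torch.cat([ha, hb], -1))
        return torch.stack(outs, 1)      # pool over time downstream
```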
Facial Emotion Detection Using Convolutional Neural Networks and Representational Autoencoder Units
Emotion is subjective, and leveraging the knowledge and science behind labeled data to extract the components that constitute emotion has been a challenging problem in the industry for many years. With the evolution of deep learning in computer vision, emotion recognition has become a widely tackled research problem. In this work, we propose two independent methods for this task. The first method uses autoencoders to construct a unique representation of each emotion, while the second is an 8-layer convolutional neural network (CNN). These methods were trained on the posed-emotion dataset (JAFFE), and to test their robustness, both models were also tested on 100 random images from the Labeled Faces in the Wild (LFW) dataset, which consists of images that are candid rather than posed. The results show that with more fine-tuning and depth, our CNN model can outperform state-of-the-art methods for emotion recognition. We also propose some exciting ideas for expanding the concept of representational autoencoders to improve their performance.
Comment: 6 pages, 8 figures, and 3 tables
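The abstract does not spell out how the autoencoder representations are turned into predictions; one common realization, sketched here purely as an assumption, trains one small autoencoder per emotion and labels a face by minimum reconstruction error.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_autoencoder(in_dim=48 * 48, code=64):
    # One small autoencoder per emotion class (sizes are illustrative)
    return nn.Sequential(
        nn.Linear(in_dim, code), nn.ReLU(),
        nn.Linear(code, in_dim), nn.Sigmoid())

def classify(x, autoencoders):
    """x: a single flattened face, shape (1, in_dim). Returns the index
    of the emotion whose autoencoder reconstructs x best."""
    errors = [F.mse_loss(ae(x), x) for ae in autoencoders]
    return int(torch.stack(errors).argmin())
```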