Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the
structure of the output space, particularly when the space of output labels is
large and the data is sparse. State-of-the-art neural language models
indirectly capture the output space structure in their classifier weights since
they lack parameter sharing across output labels. Learning shared output label
mappings helps, but existing methods have limited expressivity and are prone to
overfitting. In this paper, we investigate the usefulness of more powerful
shared mappings for output labels, and propose a deep residual output mapping
with dropout between layers to better capture the structure of the output space
and avoid overfitting. Evaluations on three language generation tasks show that
our output label mapping can match or improve upon state-of-the-art recurrent and
self-attention architectures, and suggest that the classifier does not
necessarily need to be high-rank to better model natural language if it is
better at capturing the structure of the output space.
Comment: To appear in ICML 2019
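As an illustration of the idea above, here is a minimal PyTorch sketch of a shared deep residual output mapping with dropout between layers, applied to the label embeddings before computing logits. All sizes, names, and the exact block layout are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ResidualOutputMapping(nn.Module):
    """Shared deep residual mapping over output label embeddings
    (a sketch of the general technique, not the paper's exact model)."""
    def __init__(self, dim, num_layers=2, dropout=0.5):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(dropout))
            for _ in range(num_layers)
        )

    def forward(self, label_emb):            # (vocab, dim)
        h = label_emb
        for block in self.blocks:
            h = h + block(h)                 # residual connection, dropout inside the block
        return h

vocab, dim = 10000, 256
label_emb = torch.randn(vocab, dim)          # shared label embeddings
hidden = torch.randn(32, dim)                # a batch of decoder states
mapped = ResidualOutputMapping(dim)(label_emb)
logits = hidden @ mapped.t()                 # (32, vocab): classifier tied to label structure
```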
Long-Term Memory Networks for Question Answering
Question answering is an important and difficult task in the natural language
processing domain, because many basic natural language processing tasks can be
cast into a question answering task. Several deep neural network architectures
have been developed recently, which employ memory and inference components to
memorize and reason over text information, and generate answers to questions.
However, a major drawback of many such models is that they are capable of only
generating single-word answers. In addition, they require a large amount of
training data to generate accurate answers. In this paper, we introduce the
Long-Term Memory Network (LTMN), which incorporates both an external memory
module and a Long Short-Term Memory (LSTM) module to comprehend the input data
and generate multi-word answers. The LTMN model can be trained end-to-end using
back-propagation and requires minimal supervision. We test our model on two
synthetic data sets (based on Facebook's bAbI data set) and the real-world
Stanford question answering data set, and show that it can achieve
state-of-the-art performance.
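The following toy PyTorch sketch shows the general shape of the design: attend over sentence memories, then decode a multi-word answer with an LSTM. The bag-of-words memory encoding, dimensions, and wiring are assumptions for illustration, not the authors' exact LTMN:

```python
import torch
import torch.nn as nn

class MemoryLSTMSketch(nn.Module):
    """Attend over sentence memories, then decode a multi-word answer
    with an LSTM (illustrative sketch only)."""
    def __init__(self, vocab, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, memories, question, answer_in):
        m = self.embed(memories).mean(dim=2)                 # (B, n_sent, dim) bag-of-words memories
        q = self.embed(question).mean(dim=1)                 # (B, dim) question encoding
        attn = torch.softmax((m @ q.unsqueeze(2)).squeeze(2), dim=1)
        context = (attn.unsqueeze(2) * m).sum(dim=1)         # memory read
        h0 = (context + q).unsqueeze(0)                      # seed the decoder
        dec, _ = self.decoder(self.embed(answer_in), (h0, torch.zeros_like(h0)))
        return self.out(dec)                                 # per-step vocab logits

model = MemoryLSTMSketch(vocab=100, dim=32)
logits = model(torch.randint(0, 100, (2, 5, 7)),             # 2 stories, 5 sentences, 7 tokens
               torch.randint(0, 100, (2, 7)),                # questions
               torch.randint(0, 100, (2, 4)))                # teacher-forced answer prefix
print(logits.shape)                                          # torch.Size([2, 4, 100])
```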
Learning to Compose Task-Specific Tree Structures
For years, recursive neural networks (RvNNs) have been shown to be suitable
for representing text as fixed-length vectors and have achieved good performance
on several natural language processing tasks. However, the main drawback of
RvNNs is that they require structured input, which makes data preparation and
model implementation hard. In this paper, we propose Gumbel Tree-LSTM, a novel
tree-structured long short-term memory architecture that learns how to compose
task-specific tree structures only from plain text data efficiently. Our model
uses the Straight-Through Gumbel-Softmax estimator to decide the parent node among
candidates dynamically and to calculate gradients of the discrete decision. We
evaluate the proposed model on natural language inference and sentiment
analysis, and show that our model outperforms or is at least comparable to
previous models. We also find that our model converges significantly faster
than other models.
Comment: AAAI 2018
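PyTorch ships a Straight-Through Gumbel-Softmax estimator, so the core parent-selection trick can be sketched in a few lines; the scoring tensor below stands in for the model's learned composition query and is purely illustrative:

```python
import torch
import torch.nn.functional as F

# Hard one-hot choice in the forward pass, soft gradients in the backward pass.
scores = torch.randn(1, 4, requires_grad=True)          # scores for 4 candidate parents
choice = F.gumbel_softmax(scores, tau=1.0, hard=True)   # straight-through estimator
candidates = torch.randn(4, 8)                          # candidate parent representations
parent = choice @ candidates                            # discrete selection, still differentiable
parent.sum().backward()
print(scores.grad)                                      # gradients reach the discrete decision
```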
FRAGE: Frequency-Agnostic Word Representation
Continuous word representation (aka word embedding) is a basic building block
in many neural network-based models used in natural language processing tasks.
Although it is widely accepted that words with similar semantics should be
close to each other in the embedding space, we find that word embeddings
learned in several tasks are biased towards word frequency: the embeddings of
high-frequency and low-frequency words lie in different subregions of the
embedding space, and the embedding of a rare word and a popular word can be far
from each other even if they are semantically similar. This makes learned word
embeddings ineffective, especially for rare words, and consequently limits the
performance of these neural network models. In this paper, we develop a neat,
simple yet effective way to learn \emph{FRequency-AGnostic word Embedding}
(FRAGE) using adversarial training. We conduct comprehensive studies on ten
datasets across four natural language processing tasks, including word
similarity, language modeling, machine translation and text classification.
Results show that with FRAGE, we achieve higher performance than the baselines
in all tasks.
Comment: To appear in NIPS 2018
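A minimal sketch of the adversarial setup described above: a discriminator learns to tell high-frequency from low-frequency embeddings, while the embeddings receive a reversed signal so the two groups become indistinguishable. The frequency cutoff, sizes, and single combined backward pass are simplifying assumptions:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 64)                     # word embeddings being trained
disc = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

freq_ids = torch.randint(0, 200, (32,))          # "popular" words (assumed id cutoff)
rare_ids = torch.randint(200, 1000, (32,))       # "rare" words

# Discriminator step: classify the frequency group from the embeddings.
d_loss = bce(disc(emb(freq_ids).detach()), torch.ones(32, 1)) + \
         bce(disc(emb(rare_ids).detach()), torch.zeros(32, 1))

# Embedding step: make rare-word embeddings look frequent to the discriminator.
g_loss = bce(disc(emb(rare_ids)), torch.ones(32, 1))
(d_loss + g_loss).backward()   # in practice: alternate optimizers, add the task loss
```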
Effective LSTMs for Target-Dependent Sentiment Classification
Target-dependent sentiment classification remains a challenge: modeling the
semantic relatedness of a target with its context words in a sentence.
Different context words have different influences on determining the sentiment
polarity of a sentence towards the target. Therefore, it is desirable to
integrate the connections between the target word and its context words when building a
learning system. In this paper, we develop two target dependent long short-term
memory (LSTM) models, where target information is automatically taken into
account. We evaluate our methods on a benchmark dataset from Twitter. Empirical
results show that modeling sentence representation with standard LSTM does not
perform well. Incorporating target information into LSTM can significantly
boost the classification accuracy. The target-dependent LSTM models achieve
state-of-the-art performance without using a syntactic parser or external
sentiment lexicons.
Comment: 7 pages, 3 figures; published in COLING 2016
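The target-dependent idea can be sketched as two LSTMs meeting at the target: one reads the left context up to and including the target, the other reads the reversed right context down to it, and their final states are concatenated for classification. Dimensions and the use of pre-embedded inputs are assumptions:

```python
import torch
import torch.nn as nn

dim, classes = 50, 3
left_lstm = nn.LSTM(dim, dim, batch_first=True)      # runs left-to-right toward the target
right_lstm = nn.LSTM(dim, dim, batch_first=True)     # runs right-to-left toward the target
clf = nn.Linear(2 * dim, classes)

left_ctx = torch.randn(8, 6, dim)                    # left context + target (embedded)
right_ctx = torch.randn(8, 5, dim).flip(dims=[1])    # right context, reversed
_, (h_l, _) = left_lstm(left_ctx)
_, (h_r, _) = right_lstm(right_ctx)
logits = clf(torch.cat([h_l[-1], h_r[-1]], dim=1))   # (8, classes) sentiment scores
```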
Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition
Named Entity Recognition (NER) is one of the most common tasks in natural
language processing. The purpose of NER is to find and classify tokens in text
documents into predefined categories called tags, such as person names,
quantity expressions, percentage expressions, names of locations,
organizations, as well as expressions of time, currency, and others. Although
a number of approaches have been proposed for this task for the Russian
language, there is still substantial room for better solutions. In this work,
we studied several deep neural network models, starting from a vanilla
Bi-directional Long Short-Term Memory (Bi-LSTM) network, then supplementing it
with Conditional Random Fields (CRF) as well as highway networks, and finally adding
external word embeddings. All models were evaluated across three datasets:
Gareev's dataset, Person-1000, and FactRuEval-2016. We found that extending the
Bi-LSTM model with CRF significantly increased the quality of predictions.
Encoding input tokens with external word embeddings reduced training time and
allowed us to achieve state of the art for the Russian NER task.
Comment: Artificial Intelligence and Natural Language Conference (AINL 2017)
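What the CRF layer adds at inference time is a structured decode over the Bi-LSTM's per-token scores. Here is a minimal Viterbi decoder over emission scores plus a learned transition matrix; shapes and the random scores are illustrative:

```python
import torch

def viterbi(emissions, transitions):
    """emissions: (seq_len, n_tags); transitions[i, j]: score of tag i -> tag j."""
    score = emissions[0]                    # best score ending in each tag so far
    backpointers = []
    for em in emissions[1:]:
        total = score.unsqueeze(1) + transitions + em.unsqueeze(0)
        score, idx = total.max(dim=0)       # best previous tag for each current tag
        backpointers.append(idx)
    best = [score.argmax().item()]
    for idx in reversed(backpointers):
        best.append(idx[best[-1]].item())
    return best[::-1]

n_tags = 5
emissions = torch.randn(7, n_tags)          # e.g. Bi-LSTM outputs for 7 tokens
transitions = torch.randn(n_tags, n_tags)   # learned tag-transition scores
print(viterbi(emissions, transitions))      # highest-scoring tag sequence
```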
An Attention-Gated Convolutional Neural Network for Sentence Classification
The classification of sentences is very challenging, since sentences contain
limited contextual information. In this paper, we propose an
Attention-Gated Convolutional Neural Network (AGCNN) for sentence
classification, which generates attention weights from feature context
windows of different sizes by using specialized convolution encoders. It makes
full use of limited contextual information to extract and enhance the influence
of important features in predicting the sentence's category. Experimental
results demonstrated that our model can achieve up to 3.1% higher accuracy than
standard CNN models, and gain competitive results over the baselines on four
out of the six tasks. In addition, we designed an activation function, namely,
Natural Logarithm rescaled Rectified Linear Unit (NLReLU). Experiments showed
that NLReLU can outperform ReLU and is comparable to other well-known
activation functions on AGCNN.
Comment: Accepted for publication in the Intelligent Data Analysis journal; 19
pages, 4 figures, and 5 tables
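Going by its name and description, NLReLU log-compresses the positive part of its input; a tiny PyTorch rendition under that assumption:

```python
import torch

def nlrelu(x, beta=1.0):
    """Natural Logarithm rescaled ReLU (assumed form: log(beta * relu(x) + 1))."""
    return torch.log(beta * torch.relu(x) + 1.0)

x = torch.linspace(-2, 4, steps=7)
print(nlrelu(x))   # zeros for x <= 0, log-scaled values for x > 0
```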
Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings
There are two main approaches to automatically extracting affective
orientation: lexicon-based and corpus-based. In this work, we argue that these
two methods are compatible and show that combining them can improve the
accuracy of emotion classifiers. In particular, we introduce a novel variant of
the Label Propagation algorithm that is tailored to distributed word
representations, we apply batch gradient descent to accelerate the optimization
of label propagation and to make the optimization feasible for large graphs,
and we propose a reproducible method for emotion lexicon expansion. We conclude
that label propagation can expand an emotion lexicon in a meaningful way and
that the expanded emotion lexicon can be leveraged to improve the accuracy of
an emotion classifier.
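A standard label propagation iteration on a word-similarity graph gives the flavor of the expansion step (this is the generic formulation, not the authors' tailored, gradient-based variant; the kNN graph and seed labels are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))                  # word embeddings
sim = emb @ emb.T
np.fill_diagonal(sim, -np.inf)
W = sim >= np.sort(sim, axis=1)[:, -5:-4]         # 5-nearest-neighbor graph
S = (W | W.T).astype(float)
S /= S.sum(axis=1, keepdims=True)                 # row-normalized adjacency

Y = np.zeros((100, 2))
Y[:5, 0] = 1; Y[5:10, 1] = 1                      # 10 seed lexicon words, 2 emotions
F, alpha = Y.copy(), 0.9
for _ in range(50):
    F = alpha * (S @ F) + (1 - alpha) * Y         # propagate, keep seeds anchored
print(F.argmax(axis=1)[:12])                      # expanded emotion labels
```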
An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing
This paper presents an empirical study of two widely-used sequence prediction
models, Conditional Random Fields (CRFs) and Long Short-Term Memory Networks
(LSTMs), on two fundamental tasks for Vietnamese text processing:
part-of-speech tagging and named entity recognition. We show that a strong
lower bound for labeling accuracy can be obtained by relying only on simple
word-based features with minimal hand-crafted feature engineering, reaching
performance scores of 90.65\% and 86.03\% on the standard test sets for the two
tasks, respectively. In particular, we demonstrate empirically the surprising
efficiency of word embeddings in both tasks, with both models. We point out
that the state-of-the-art LSTM model does not always significantly outperform
the traditional CRF model, especially on
moderate-sized data sets. Finally, we give some suggestions and discussions for
efficient use of sequence labeling models in practical applications.
Comment: To appear in the Proceedings of the 9th International Conference on
Knowledge and Systems Engineering (KSE) 2017
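The "simple word-based features" in question are typically of the kind below; the exact feature set here is an illustrative assumption, written in the dict-per-token style used by common CRF toolkits such as sklearn-crfsuite:

```python
def word_features(sent, i):
    """Simple word-based features for token i (illustrative feature set)."""
    w = sent[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

sent = "Hà Nội là thủ đô của Việt Nam".split()
print(word_features(sent, 1))
```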
Machine Translation Evaluation with Neural Networks
We present a framework for machine translation evaluation using neural
networks in a pairwise setting, where the goal is to select the better
translation from a pair of hypotheses, given the reference translation. In this
framework, lexical, syntactic and semantic information from the reference and
the two hypotheses is embedded into compact distributed vector representations,
and fed into a multi-layer neural network that models nonlinear interactions
between each of the hypotheses and the reference, as well as between the two
hypotheses. We experiment with the benchmark datasets from the WMT Metrics
shared task, on which we obtain the best results published so far, with the
basic network configuration. We also perform a series of experiments to analyze
and understand the contribution of the different components of the network. We
evaluate variants and extensions, including fine-tuning of the semantic
embeddings, and sentence-based representations modeled with convolutional and
recurrent neural networks. In summary, the proposed framework is flexible and
generalizable, allows for efficient learning and scoring, and provides an MT
evaluation metric that correlates with human judgments, and is on par with the
state of the art.
Comment: Machine Translation, Reference-based MT Evaluation, Deep Neural
Networks, Distributed Representation of Texts, Textual Similarity
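The pairwise setup reduces to a small network scoring which of two hypotheses better matches the reference; in this sketch the sentence vectors and MLP sizes are placeholders for the paper's richer lexical, syntactic, and semantic representations:

```python
import torch
import torch.nn as nn

dim = 64
mlp = nn.Sequential(nn.Linear(3 * dim, 64), nn.Tanh(), nn.Linear(64, 1))

# Stand-ins for embedded reference and hypothesis representations.
ref, hyp1, hyp2 = (torch.randn(16, dim) for _ in range(3))
p_h1_better = torch.sigmoid(mlp(torch.cat([ref, hyp1, hyp2], dim=1)))
labels = torch.ones(16, 1)                       # 1 if hyp1 was ranked better by humans
loss = nn.functional.binary_cross_entropy(p_h1_better, labels)
loss.backward()
```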