3 research outputs found
Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation
Inferring the probability distribution of sentences or word sequences is a
key process in natural language processing. While word-level language models
(LMs) have been widely adopted for computing the joint probabilities of word
sequences, they have difficulty in capturing a context long enough for sentence
probability estimation (SPE). To overcome this, recent studies introduced
training methods using sentence-level noise-contrastive estimation (NCE) with
recurrent neural networks (RNNs). In this work, we extend this approach to
contextual SPE, which aims to estimate the conditional probability of a
sentence given the preceding text. The proposed NCE samples negative sentences
independently of the preceding text, so that the trained model assigns higher
probabilities to sentences that are more consistent with the context. We apply
our method to a simple word-level RNN LM to focus on the effect of the
sentence-level NCE training rather than on the network architecture. The
quality of estimation was evaluated against multiple-choice cloze-style
questions, including both human-written and automatically generated questions.
The experimental results show that the proposed method improved SPE quality
for the word-level RNN LM.
Comment: 8 pages, 1 figure
Text Classification with Lexicon from PreAttention Mechanism
A comprehensive, high-quality lexicon plays a crucial role in traditional
text classification approaches and improves the utilization of linguistic
knowledge. Although lexicons are helpful for the task, they have received
little attention in recent neural network models. First, obtaining a
high-quality lexicon is not easy: effective automated lexicon extraction
methods are lacking, and most lexicons are hand-crafted, which does not scale
to big data. Moreover, there is no effective way to use a lexicon in a neural
network. To address these limitations, we propose a
Pre-Attention mechanism for text classification in this paper, which learns
the attention of different words according to their effect on the
classification task. The words with high attention values form a domain
lexicon. Experiments on three benchmark text classification tasks show that
our models achieve competitive results compared with state-of-the-art
methods: 90.5% accuracy on the Stanford Large Movie Review dataset, 82.3% on
the Subjectivity dataset, and 93.7% on Movie Reviews. Compared with the same
text classification models without the Pre-Attention mechanism, those with it
improve accuracy by 0.9%-2.4%, which demonstrates the validity of the
Pre-Attention mechanism. In addition, the Pre-Attention mechanism performs
well when followed by different types of neural networks (e.g., convolutional
neural networks and Long Short-Term Memory networks). For the same dataset,
when we use the Pre-Attention mechanism to obtain attention values with
different downstream networks, the words with high attention values largely
coincide, which demonstrates the versatility and portability of the
Pre-Attention mechanism. We can also obtain stable lexicons from the
attention values, which is a promising method of information extraction.
Comment: 11 pages
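The lexicon-from-attention idea in this abstract can be illustrated with a short sketch. This is not the authors' Pre-Attention mechanism, only a minimal analogue: per-word scores (here hypothetical, in the paper learned by the network) are normalized with a softmax, and words whose attention weight clears a threshold are kept as the lexicon:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def extract_lexicon(words, scores, threshold):
    """Keep the words whose attention weight exceeds `threshold`."""
    weights = softmax(scores)
    return sorted({w for w, a in zip(words, weights) if a > threshold})
```

Thresholding the normalized weights, rather than the raw scores, is what makes the extracted lexicon comparable across sentences of different lengths.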
Depression Detection with Multi-Modalities Using a Hybrid Deep Learning Model on Social Media
Social networks enable people to interact with one another by sharing
information, sending messages, making friends, and having discussions, which
generates massive amounts of data every day, popularly called user-generated
content. This data comes in various forms, such as images, text, videos, and
links, and reflects user behaviours, including their
mental states. It is challenging yet promising to automatically detect mental
health problems from such data which is short, sparse and sometimes poorly
phrased. However, there are efforts to automatically learn patterns using
computational models on such user-generated content. While many previous
works have studied the problem at a small scale, assuming uni-modal data,
which may not give faithful results, we propose a novel scalable hybrid model
that combines Bidirectional Gated Recurrent Units (BiGRUs) and Convolutional
Neural Networks to detect depressed users on social media such as Twitter,
based on multi-modal features. Specifically, we encode words in user
posts using pre-trained word embeddings and BiGRUs to capture latent
behavioural patterns, long-term dependencies, and correlation across the
modalities, including semantic sequence features from the user timelines
(posts). The CNN model then helps learn useful features. Our experiments show
that our model outperforms several popular and strong baseline methods,
demonstrating the effectiveness of combining deep learning with multi-modal
features. We also show that our model helps improve predictive performance when
detecting depression in users who are posting messages publicly on social
media.
Comment: 23 pages
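The convolutional feature-learning step mentioned in this abstract can be sketched at a very small scale. This is not the authors' hybrid architecture, only a minimal illustration of how a 1-D convolution with max-pooling turns a sequence of (here hypothetical) embedding vectors into a single feature value:

```python
def conv1d_maxpool(seq, kernel):
    """Slide a 1-D kernel over a sequence of embedding vectors and
    max-pool the resulting feature map into one value.

    seq: list of embedding vectors (lists of floats).
    kernel: list of weight vectors; its length is the window width.
    """
    width = len(kernel)
    feats = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Dot product of the flattened window with the flattened kernel.
        feats.append(sum(x * w
                         for vec, kvec in zip(window, kernel)
                         for x, w in zip(vec, kvec)))
    return max(feats)
```

In the paper's pipeline the inputs at this stage would be BiGRU-encoded representations of user posts rather than raw embeddings, and many kernels would run in parallel to produce a feature vector.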