Relaxed Softmax for learning from Positive and Unlabeled data
In recent years, the softmax model and its fast approximations have become
the de-facto loss functions for deep neural networks when dealing with
multi-class prediction. This loss has been extended to language modeling and
recommendation, two fields that fall into the framework of learning from
Positive and Unlabeled data. In this paper, we stress the different drawbacks
of the current family of softmax losses and sampling schemes when applied in a
Positive and Unlabeled learning setup. We propose both a Relaxed Softmax loss
(RS) and a new negative sampling scheme based on Boltzmann formulation. We show
that the new training objective is better suited for the tasks of density
estimation, item similarity and next-event prediction by driving uplifts in
performance on textual and recommendation datasets against classical softmax.
Comment: 9 pages, 5 figures, 2 tables, published at RecSys 201
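As an illustration of the negative-sampling idea mentioned above (not the authors' exact formulation, which the abstract does not spell out), the sketch below draws negatives with probability proportional to exp(score / temperature), i.e. a Boltzmann distribution over the item vocabulary, and computes a sampled softmax-style loss over the positive and the drawn negatives. All function names and the temperature value are hypothetical.

```python
# Hedged sketch: Boltzmann-style negative sampling plus a sampled softmax loss.
# The exact Relaxed Softmax (RS) objective is not specified in the abstract.
import torch
import torch.nn.functional as F

def boltzmann_negatives(scores, positive, k, temperature=1.0):
    """Sample k negative item indices with probability ~ exp(score / temperature),
    never returning the observed positive item."""
    probs = F.softmax(scores / temperature, dim=-1)
    probs[positive] = 0.0                      # exclude the positive from sampling
    probs = probs / probs.sum()
    return torch.multinomial(probs, k, replacement=False)

def sampled_softmax_loss(scores, positive, negatives):
    """Cross-entropy restricted to the positive plus the sampled negatives."""
    idx = torch.cat([positive.view(1), negatives])
    logits = scores[idx].unsqueeze(0)          # the positive sits at position 0
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits, target)

# Usage: scores are model outputs over the full item vocabulary for one context.
scores = torch.randn(10_000)
pos = torch.tensor(42)
negs = boltzmann_negatives(scores, pos, k=50, temperature=0.5)
loss = sampled_softmax_loss(scores, pos, negs)
```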
Attention-Based LSTM for Psychological Stress Detection from Spoken Language Using Distant Supervision
We propose a Long Short-Term Memory (LSTM) with attention mechanism to
classify psychological stress from self-conducted interview transcriptions. We
apply distant supervision by automatically labeling tweets based on their
hashtag content, which complements and expands the size of our corpus. This
additional data is used to initialize the model parameters, which are then
fine-tuned using the interview data. This improves the model's robustness,
especially by expanding the vocabulary size. The bidirectional LSTM model with
attention is found to be the best model in terms of accuracy (74.1%) and
f-score (74.3%). Furthermore, we show that distant supervision fine-tuning
enhances the model's performance by 1.6% accuracy and 2.1% f-score. The
attention mechanism helps the model select informative words.
Comment: Accepted at ICASSP 201
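As a rough sketch of the model described above, the following bidirectional LSTM with a simple attention layer pools the hidden states into a weighted summary before classification. The layer sizes and the exact attention form are assumptions, since the abstract does not specify them.

```python
# Hedged sketch: bidirectional LSTM with attention pooling for stress classification.
# Layer sizes and the attention form are assumptions; only the overall design
# follows the abstract.
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)        # one score per time step
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, tokens):                          # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embedding(tokens))        # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)    # attention over time steps
        context = (weights * h).sum(dim=1)              # weighted summary vector
        return self.out(context)

# Distant supervision as described in the abstract: train first on hashtag-labeled
# tweets, then fine-tune the same parameters on the interview transcriptions.
model = BiLSTMAttentionClassifier(vocab_size=30_000)
```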
Relaxed Attention for Transformer Models
The powerful modeling capabilities of all-attention-based transformer
architectures often cause overfitting and, for natural language processing
tasks, lead to an implicitly learned internal language model in the
autoregressive transformer decoder, complicating the integration of external
language models. In this paper, we explore relaxed attention, a simple and
easy-to-implement smoothing of the attention weights, yielding a two-fold
improvement to the general transformer architecture: First, relaxed attention
provides regularization when applied to the self-attention layers in the
encoder. Second, we show that it naturally supports the integration of an
external language model as it suppresses the implicitly learned internal
language model by relaxing the cross attention in the decoder. We demonstrate
the benefit of relaxed attention across several tasks with clear improvement in
combination with recent benchmark approaches. Specifically, we exceed the
former state-of-the-art performance of 26.90% word error rate on the largest
public lip-reading LRS3 benchmark with a word error rate of 26.31%, and we
achieve a top-performing BLEU score of 37.67 on the IWSLT14 (DE→EN) machine
translation task without external language models and with virtually no
additional model parameters. Code and models will be made publicly available.
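The abstract describes relaxed attention as a simple smoothing of the attention weights; one common way to realize such smoothing, assumed here purely for illustration, is to mix the softmax-normalized weights with a uniform distribution over the key positions.

```python
# Hedged sketch: smoothing attention weights toward a uniform distribution.
# The mixing coefficient gamma and this exact form are assumptions based on the
# abstract's description of relaxed attention as a smoothing of the weights.
import torch

def relax_attention(weights, gamma=0.1):
    """Mix attention probabilities with a uniform distribution over the keys.

    weights: (..., num_keys) tensor whose last dimension sums to 1.
    gamma:   relaxation strength; gamma = 0 recovers the original weights.
    """
    uniform = torch.full_like(weights, 1.0 / weights.size(-1))
    return (1.0 - gamma) * weights + gamma * uniform

# Usage inside an attention layer, right after the softmax over key positions:
scores = torch.randn(2, 4, 10)                  # (batch, queries, keys)
attn = torch.softmax(scores, dim=-1)
attn = relax_attention(attn, gamma=0.1)         # rows still sum to 1
```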
Transfer Learning for Neural Semantic Parsing
The goal of semantic parsing is to map natural language to a machine
interpretable meaning representation language (MRL). One of the constraints
that limits full exploration of deep learning technologies for semantic parsing
is the lack of sufficient annotation training data. In this paper, we propose
using sequence-to-sequence models in a multi-task setup for semantic parsing with a
focus on transfer learning. We explore three multi-task architectures for
sequence-to-sequence modeling and compare their performance with an
independently trained model. Our experiments show that the multi-task setup
aids transfer learning from an auxiliary task with large labeled data to a
target task with smaller labeled data. We see absolute accuracy gains ranging
from 1.0% to 4.4% on our in-house data set, and we also see good gains ranging
from 2.5% to 7.0% on the ATIS semantic parsing tasks with syntactic and
semantic auxiliary tasks.
Comment: Accepted for ACL Repl4NLP 201
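The abstract compares three multi-task sequence-to-sequence architectures without detailing them; one common layout, assumed here only for illustration, is a shared encoder with per-task decoders, so that training on the large auxiliary task also shapes the encoder used by the smaller target task.

```python
# Hedged sketch: a shared-encoder multi-task seq2seq model with per-task decoders.
# The paper's three architectures are not detailed in the abstract, so this layout
# and all sizes/names here are assumptions for illustration only.
import torch
import torch.nn as nn

class SharedEncoderSeq2Seq(nn.Module):
    def __init__(self, vocab_size, tasks, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)   # shared
        self.decoders = nn.ModuleDict(
            {t: nn.GRU(embed_dim, hidden_dim, batch_first=True) for t in tasks})
        self.outputs = nn.ModuleDict(
            {t: nn.Linear(hidden_dim, vocab_size) for t in tasks})

    def forward(self, src_tokens, tgt_tokens, task):
        _, state = self.encoder(self.embedding(src_tokens))     # shared encoding
        dec_out, _ = self.decoders[task](self.embedding(tgt_tokens), state)
        return self.outputs[task](dec_out)                      # logits over MRL tokens

# Train on the large auxiliary task (or jointly), then on the smaller target task,
# so the shared encoder transfers what it has learned.
model = SharedEncoderSeq2Seq(vocab_size=20_000, tasks=["auxiliary", "target"])
```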