CEDR: Contextualized Embeddings for Document Ranking
Although considerable attention has been given to neural ranking
architectures recently, far less attention has been paid to the term
representations that are used as input to these models. In this work, we
investigate how two pretrained contextualized language models (ELMo and BERT)
can be utilized for ad-hoc document ranking. Through experiments on TREC
benchmarks, we find that several existing neural ranking architectures can
benefit from the additional context provided by contextualized language models.
Furthermore, we propose a joint approach that incorporates BERT's
classification vector into existing neural models and show that it outperforms
state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR
(Contextualized Embeddings for Document Ranking). We also address practical
challenges in using these models for ranking, including the maximum input
length imposed by BERT and runtime performance impacts of contextualized
language models.
Comment: Appeared in SIGIR 2019, 4 pages.
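To make the joint approach concrete, here is a minimal PyTorch sketch of the
idea as described in the abstract: per-token contextualized embeddings feed an
existing neural ranker, and BERT's [CLS] classification vector is concatenated
into the final scoring layer. The SimpleRanker class, its dimensions, and the
mean pooling are illustrative assumptions, not the authors' CEDR code.

    import torch
    import torch.nn as nn

    class SimpleRanker(nn.Module):
        # Toy ranker that joins BERT's [CLS] vector at the final layer;
        # not the CEDR architecture, just the combination pattern.
        def __init__(self, emb_dim=768, cls_dim=768, hidden=128):
            super().__init__()
            self.match = nn.Linear(emb_dim * 2, hidden)   # stand-in query-document matching layer
            self.score = nn.Linear(hidden + cls_dim, 1)   # [CLS] vector enters the final scoring layer

        def forward(self, query_emb, doc_emb, cls_vec):
            # query_emb: (batch, q_len, dim), doc_emb: (batch, d_len, dim), cls_vec: (batch, dim)
            q = query_emb.mean(dim=1)                     # crude pooling, for illustration only
            d = doc_emb.mean(dim=1)
            h = torch.relu(self.match(torch.cat([q, d], dim=-1)))
            return self.score(torch.cat([h, cls_vec], dim=-1)).squeeze(-1)

    # random tensors stand in for BERT token embeddings and the [CLS] vector
    ranker = SimpleRanker()
    scores = ranker(torch.randn(2, 8, 768), torch.randn(2, 100, 768), torch.randn(2, 768))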
Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition
We propose a novel approach to semi-supervised automatic speech recognition
(ASR). We first exploit a large amount of unlabeled audio data via
representation learning, where we reconstruct a temporal slice of filterbank
features from past and future context frames. The resulting deep contextualized
acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end
ASR system using a smaller amount of labeled audio data. In our experiments, we
show that systems trained on DeCoAR consistently outperform ones trained on
conventional filterbank features, giving 42% and 19% relative improvement over
the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our
approach can drastically reduce the amount of labeled data required:
unsupervised pre-training on LibriSpeech followed by supervised training with
100 hours of labeled data achieves performance on par with training on all 960
hours directly.
Pre-trained models and code will be released online.
Comment: Accepted to ICASSP 2020 (oral).
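The representation-learning objective sketched below mirrors the description
above: hide a temporal slice of filterbank features and reconstruct it from
the surrounding past and future context frames. The bidirectional LSTM, the L1
loss, and all dimensions are assumptions for illustration, not the DeCoAR
implementation.

    import torch
    import torch.nn as nn

    class SliceReconstructor(nn.Module):
        def __init__(self, n_mel=40, hidden=256, slice_len=4):
            super().__init__()
            self.slice_len = slice_len
            self.encoder = nn.LSTM(n_mel, hidden, batch_first=True, bidirectional=True)
            # predict the whole slice from the boundary context representations
            self.decoder = nn.Linear(hidden * 4, n_mel * slice_len)

        def forward(self, feats, start):
            # feats: (batch, time, n_mel); reconstruct feats[:, start:start+slice_len]
            masked = feats.clone()
            masked[:, start:start + self.slice_len] = 0.0      # hide the target slice
            ctx, _ = self.encoder(masked)
            left, right = ctx[:, start - 1], ctx[:, start + self.slice_len]
            pred = self.decoder(torch.cat([left, right], dim=-1))
            target = feats[:, start:start + self.slice_len].reshape(feats.size(0), -1)
            return nn.functional.l1_loss(pred, target)

    # reconstruction loss on a random batch of filterbank features
    loss = SliceReconstructor()(torch.randn(8, 100, 40), start=50)
    loss.backward()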
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality.
Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research.
Stochastic Answer Networks for Machine Reading Comprehension
We propose a simple yet robust stochastic answer network (SAN) that simulates
multi-step reasoning in machine reading comprehension. Compared to previous
work such as ReasoNet which used reinforcement learning to determine the number
of steps, the unique feature is the use of a kind of stochastic prediction
dropout on the answer module (final layer) of the neural network during the
training. We show that this simple trick improves robustness and achieves
results competitive to the state-of-the-art on the Stanford Question Answering
Dataset (SQuAD), the Adversarial SQuAD, and the Microsoft MAchine Reading
COmprehension Dataset (MS MARCO).
Comment: 11 pages, 5 figures, Accepted to ACL 2018.
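A rough sketch of the stochastic prediction dropout trick as described: the
answer module predicts at every reasoning step, whole steps are randomly
dropped during training, and the surviving step predictions are averaged. The
function name, drop rate, and tensor shapes are illustrative assumptions, not
the authors' SAN code.

    import torch

    def stochastic_answer_average(step_logits, drop_prob=0.4, training=True):
        # step_logits: (steps, batch, n_classes), one prediction per reasoning step
        steps = step_logits.size(0)
        probs = torch.softmax(step_logits, dim=-1)
        if training:
            keep = torch.rand(steps) > drop_prob       # randomly drop whole steps
            if not keep.any():                         # always keep at least one step
                keep[torch.randint(steps, (1,))] = True
            probs = probs[keep]
        return probs.mean(dim=0)                       # averaged answer distribution

    # five reasoning steps, batch of two questions, ten candidate answers
    answer = stochastic_answer_average(torch.randn(5, 2, 10))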
Listening between the Lines: Learning Personal Attributes from Conversations
Open-domain dialogue agents must be able to converse about many topics while
incorporating knowledge about the user into the conversation. In this work we
address the acquisition of such knowledge, for personalization in downstream
Web applications, by extracting personal attributes from conversations. This
problem is more challenging than the established task of information extraction
from scientific publications or Wikipedia articles, because dialogues often
give merely implicit cues about the speaker. We propose methods for inferring
personal attributes, such as profession, age or family status, from
conversations using deep learning. Specifically, we propose several Hidden
Attribute Models, which are neural networks leveraging attention mechanisms and
embeddings. Our methods are trained on a per-predicate basis to output rankings
of object values for a given subject-predicate combination (e.g., ranking the
doctor and nurse professions high when speakers talk about patients, emergency
rooms, etc.). Experiments with various conversational texts including Reddit
discussions, movie scripts and a collection of crowdsourced personal dialogues
demonstrate the viability of our methods and their superior performance
compared to state-of-the-art baselines.
Comment: Published in WWW'19.
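The sketch below illustrates, under simplifying assumptions, a per-predicate
ranker in the spirit of the Hidden Attribute Models described above:
word-level attention pools the speaker's utterance embeddings into a single
representation, which is then scored against every candidate object value. The
AttributeRanker class and its dimensions are hypothetical, not the authors'
architecture.

    import torch
    import torch.nn as nn

    class AttributeRanker(nn.Module):
        def __init__(self, emb_dim=100, n_values=50):
            super().__init__()
            self.attn = nn.Linear(emb_dim, 1)          # word-level attention scores
            self.out = nn.Linear(emb_dim, n_values)    # one score per candidate object value

        def forward(self, word_embs):
            # word_embs: (batch, n_words, emb_dim) for one subject's utterances
            weights = torch.softmax(self.attn(word_embs), dim=1)
            speaker_rep = (weights * word_embs).sum(dim=1)
            return self.out(speaker_rep)               # rank object values by these scores

    # one model is trained per predicate (profession, age, family status, ...)
    profession_ranker = AttributeRanker()
    scores = profession_ranker(torch.randn(4, 200, 100))
    ranking = scores.argsort(dim=-1, descending=True)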
Alternative Weighting Schemes for ELMo Embeddings
ELMo embeddings (Peters et al., 2018) had a huge impact on the NLP community,
and many recent publications use these embeddings to boost the performance of
downstream NLP tasks. However, integrating ELMo embeddings into existing NLP
architectures is not straightforward. In contrast to traditional word
embeddings, like GloVe or word2vec embeddings, the bi-directional language
model of ELMo produces three 1024-dimensional vectors per token in a sentence.
Peters et al. proposed to learn a task-specific weighting of these three
vectors for downstream tasks. However, this proposed weighting scheme is not
feasible for certain tasks, and, as we will show, it does not necessarily yield
optimal performance. We evaluate different methods that combine the three
vectors from the language model in order to achieve the best possible
performance in downstream NLP tasks. We notice that the third layer of the
published language model often decreases the performance. By learning a
weighted average of only the first two layers, we are able to improve the
performance for many datasets. Due to the reduced complexity of the language
model, we achieve a training speed-up of 19-44% for the downstream task
- …
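For reference, a learned weighting of ELMo layers can be sketched as a scalar
mix; restricting it to the first two layers, as suggested above, simply means
passing two layer tensors instead of three. The ScalarMix module below follows
the common formulation (softmax-normalized layer weights times a global scale)
and uses random tensors in place of real ELMo output, so it is an assumption
about the setup rather than the paper's code.

    import torch
    import torch.nn as nn

    class ScalarMix(nn.Module):
        def __init__(self, n_layers):
            super().__init__()
            self.weights = nn.Parameter(torch.zeros(n_layers))  # one learnable weight per layer
            self.gamma = nn.Parameter(torch.ones(1))            # global scaling factor

        def forward(self, layers):
            # layers: (n_layers, batch, seq_len, dim)
            w = torch.softmax(self.weights, dim=0).view(-1, 1, 1, 1)
            return self.gamma * (w * layers).sum(dim=0)

    elmo_layers = torch.randn(3, 2, 16, 1024)      # stand-in for the three ELMo layers
    mix_first_two = ScalarMix(n_layers=2)
    token_reprs = mix_first_two(elmo_layers[:2])   # weighted average of the first two layers only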
