A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
We present a novel response generation system that can be trained end to end
on large quantities of unstructured Twitter conversations. A neural network
architecture is used to address sparsity issues that arise when integrating
contextual information into classic statistical models, allowing the system to
take into account previous dialog utterances. Our dynamic-context generative
models show consistent gains over both context-sensitive and
non-context-sensitive Machine Translation and Information Retrieval baselines.
Comment: A. Sordoni, M. Galley, M. Auli, C. Brockett, Y. Ji, M. Mitchell,
J.-Y. Nie, J. Gao, B. Dolan. 2015. A Neural Network Approach to
Context-Sensitive Generation of Conversational Responses. In Proc. of
NAACL-HLT. Pages 196-20
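As an illustrative sketch of the dynamic-context idea above (a toy numpy model with hypothetical sizes, not the authors' architecture): the previous dialog turn and the current message are encoded separately, and the next-word distribution is conditioned on both rather than on the message alone.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 50, 16  # hypothetical toy vocabulary and hidden sizes

def bow(token_ids):
    """Bag-of-words vector for one utterance."""
    v = np.zeros(V)
    for t in token_ids:
        v[t] += 1.0
    return v

# Separate encoders for the context (previous turn) and the current
# message; conditioning generation on both is the core of the
# dynamic-context setup.
W_ctx = rng.normal(0.0, 0.1, (H, V))
W_msg = rng.normal(0.0, 0.1, (H, V))
W_out = rng.normal(0.0, 0.1, (V, 2 * H))

def next_word_probs(context_ids, message_ids):
    h = np.concatenate([W_ctx @ bow(context_ids), W_msg @ bow(message_ids)])
    logits = W_out @ h
    p = np.exp(logits - logits.max())  # softmax over the vocabulary
    return p / p.sum()

probs = next_word_probs([3, 7], [7, 12, 4])
```

In a real system the bag-of-words encoders would be recurrent networks and the output would be produced token by token, but the conditioning structure is the same.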
Audio-Linguistic Embeddings for Spoken Sentences
We propose spoken sentence embeddings which capture both acoustic and
linguistic content. While existing works operate at the character, phoneme, or
word level, our method learns long-term dependencies by modeling speech at the
sentence level. Formulated as an audio-linguistic multitask learning problem,
our encoder-decoder model simultaneously reconstructs acoustic and natural
language features from audio. Our results show that spoken sentence embeddings
outperform phoneme and word-level baselines on speech recognition and emotion
recognition tasks. Ablation studies show that our embeddings can better model
high-level acoustic concepts while retaining linguistic content. Overall, our
work illustrates the viability of generic, multi-modal sentence embeddings for
spoken language understanding.
Comment: International Conference on Acoustics, Speech, and Signal Processing
(ICASSP) 201
Deep Learning for Sentiment Analysis: A Survey
Deep learning has emerged as a powerful machine learning technique that
learns multiple layers of representations or features of the data and produces
state-of-the-art prediction results. Along with its success in many other
application domains, deep learning has also been widely applied to sentiment
analysis in recent years. This paper first gives an overview of deep
learning and then provides a comprehensive survey of its current applications
in sentiment analysis.
Comment: 34 pages, 9 figures, 2 table
Investigating Linguistic Pattern Ordering in Hierarchical Natural Language Generation
Natural language generation (NLG) is a critical component in spoken dialogue
systems, which can be divided into two phases: (1) sentence planning: deciding
the overall sentence structure, (2) surface realization: determining specific
word forms and flattening the sentence structure into a string. With the rise
of deep learning, most modern NLG models are based on a sequence-to-sequence
(seq2seq) model, which basically contains an encoder-decoder structure; these
NLG models generate sentences from scratch by jointly optimizing sentence
planning and surface realization. However, such a simple encoder-decoder
architecture usually fails to generate complex and long sentences, because the
decoder has difficulty learning all grammar and diction knowledge well. This
paper introduces an NLG model with a hierarchical attentional decoder, where
the hierarchy focuses on leveraging linguistic knowledge in a specific order.
The experiments show that the proposed method significantly outperforms the
traditional seq2seq model with a smaller model size, and the design of the
hierarchical attentional decoder can be applied to various NLG systems.
Furthermore, different generation strategies based on linguistic patterns are
investigated and analyzed in order to guide future NLG research work.
Comment: accepted by the 7th IEEE Workshop on Spoken Language Technology (SLT
2018). arXiv admin note: text overlap with arXiv:1808.0274
Similarity Analysis of Contextual Word Representation Models
This paper investigates contextual word representation models from the lens
of similarity analysis. Given a collection of trained models, we measure the
similarity of their internal representations and attention. Critically, these
models come from vastly different architectures. We use existing and novel
similarity measures that aim to gauge the level of localization of information
in the deep models, and facilitate the investigation of which design factors
affect model similarity, without requiring any external linguistic annotation.
The analysis reveals that models within the same family are more similar to one
another, as may be expected. Surprisingly, different architectures have rather
similar representations, but different individual neurons. We also observed
differences in information localization in lower and higher layers and found
that higher layers are more affected by fine-tuning on downstream tasks.
Comment: Accepted to ACL 202
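One standard way to compare internal representations across architectures without any linguistic annotation is linear CKA; the sketch below is a generic illustration of such a similarity measure, not necessarily the paper's exact choice.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity between two representation matrices,
    each shaped (n_examples, n_features); 1.0 means identical geometry
    up to rotation and isotropic scaling."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = (np.linalg.norm(X.T @ X, "fro") *
           np.linalg.norm(Y.T @ Y, "fro"))
    return num / den

rng = np.random.default_rng(0)
layer_a = rng.normal(size=(20, 8))            # reps of 20 inputs
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
layer_b = layer_a @ q                         # rotated copy of layer_a
same = linear_cka(layer_a, layer_a)
rotated = linear_cka(layer_a, layer_b)        # invariant to rotation
```

Because CKA is invariant to orthogonal transforms, two layers can score as highly similar even when their individual neurons differ, which matches the paper's observation about similar representations but different neurons.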
Combating Fake News: A Survey on Identification and Mitigation Techniques
The proliferation of fake news on social media has opened up new directions
of research for timely identification and containment of fake news, and
mitigation of its widespread impact on public opinion. While much of the
earlier research was focused on identification of fake news based on its
contents or by exploiting users' engagements with the news on social media,
there has been a rising interest in proactive intervention strategies to
counter the spread of misinformation and its impact on society. In this survey,
we describe the modern-day problem of fake news and, in particular, highlight
the technical challenges associated with it. We discuss existing methods and
techniques applicable to both identification and mitigation, with a focus on
the significant advances in each method and their advantages and limitations.
In addition, research has often been limited by the quality of existing
datasets and their specific application contexts. To alleviate this problem, we
comprehensively compile and summarize characteristic features of available
datasets. Furthermore, we outline new directions of research to facilitate
future development of effective and interdisciplinary solutions.
The Role of Conversation Context for Sarcasm Detection in Online Interactions
Computational models for sarcasm detection have often relied on the content
of utterances in isolation. However, a speaker's sarcastic intent is not always
obvious without additional context. Focusing on social media discussions, we
investigate two issues: (1) does modeling of conversation context help in
sarcasm detection and (2) can we understand what part of conversation context
triggered the sarcastic reply. To address the first issue, we investigate
several types of Long Short-Term Memory (LSTM) networks that can model both the
conversation context and the sarcastic response. We show that the conditional
LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence level
attention on context and response outperform the LSTM model that reads only the
response. To address the second issue, we present a qualitative analysis of
attention weights produced by the LSTM models with attention and discuss the
results compared with human performance on the task.
Comment: SIGDial 201
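A minimal sketch of the sentence-level attention idea, with toy random vectors standing in for LSTM sentence states (the sizes and scoring function are assumptions, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
H = 8  # toy hidden size

context_sents = rng.normal(size=(3, H))  # three prior turns
response = rng.normal(size=(H,))         # the (possibly sarcastic) reply

# Score each context sentence against the response, softmax-normalize,
# and build a weighted context summary; the weights themselves indicate
# which turn the model thinks triggered the reply.
scores = context_sents @ response
weights = np.exp(scores - scores.max())
weights /= weights.sum()
summary = weights @ context_sents

# A downstream classifier would consume [summary; response].
features = np.concatenate([summary, response])
```

Inspecting `weights` is exactly the kind of qualitative analysis the abstract describes: the largest weight points at the context sentence most responsible for the sarcastic response.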
Acoustic-to-Word Models with Conversational Context Information
Conversational context information, higher-level knowledge that spans across
sentences, can help to recognize long conversations. However, existing speech
recognition models are typically built at the sentence level and thus may not
capture important conversational context information. The recent progress in
end-to-end speech recognition enables integrating context with other available
information (e.g., acoustic, linguistic resources) and directly recognizing
words from speech. In this work, we present a direct acoustic-to-word,
end-to-end speech recognition model capable of utilizing the conversational
context to better process long conversations. We evaluate our proposed approach
on the Switchboard conversational speech corpus and show that our system
outperforms a standard end-to-end speech recognition system.
Comment: NAACL 2019. arXiv admin note: text overlap with arXiv:1808.0217
How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?
Long short-term memory (LSTM) networks and their variants are capable of
encapsulating long-range dependencies, which is evident from their performance
on a variety of linguistic tasks. On the other hand, simple recurrent networks
(SRNs), which appear more biologically grounded in terms of synaptic
connections, have generally been less successful at capturing long-range
dependencies as well as the loci of grammatical errors in an unsupervised
setting. In this paper, we seek to develop models that bridge the gap between
biological plausibility and linguistic competence. We propose a new
architecture, the Decay RNN, which incorporates the decaying nature of neuronal
activations and models the excitatory and inhibitory connections in a
population of neurons. Besides its biological inspiration, our model also shows
competitive performance relative to LSTMs on subject-verb agreement, sentence
grammaticality, and language modeling tasks. These results provide some
pointers towards probing the nature of the inductive biases required for RNN
architectures to model linguistic phenomena successfully.
Comment: 11 pages, 5 figures (including appendix); to appear at ACL SRW 202
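A hedged sketch of what a decaying-activation update might look like, as a toy numpy step with assumed sizes and an assumed mixing rate `ALPHA` (the paper's exact formulation of excitatory and inhibitory connections may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_H = 4, 6   # toy input and hidden sizes
ALPHA = 0.8        # mixing rate: how quickly the old activation decays

W = rng.normal(0.0, 0.3, (D_H, D_IN))
U = rng.normal(0.0, 0.3, (D_H, D_H))
b = np.zeros(D_H)

def decay_rnn_step(h_prev, x):
    """One update: the previous state decays toward a fresh tanh
    activation instead of being fully overwritten, unlike a plain SRN."""
    return (1.0 - ALPHA) * h_prev + ALPHA * np.tanh(W @ x + U @ h_prev + b)

h = np.zeros(D_H)
for x in rng.normal(size=(5, D_IN)):  # run over a toy input sequence
    h = decay_rnn_step(h, x)
```

The convex mixture keeps the state bounded and lets old information fade gradually, which is the intuition behind modeling decaying neuronal activations.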
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such
as a text analysis frontend, an acoustic model and an audio synthesis module.
Building these components often requires extensive domain expertise and may
contain brittle design choices. In this paper, we present Tacotron, an
end-to-end generative text-to-speech model that synthesizes speech directly
from characters. Given <text, audio> pairs, the model can be trained completely
from scratch with random initialization. We present several key techniques to
make the sequence-to-sequence framework perform well for this challenging task.
Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English,
outperforming a production parametric system in terms of naturalness. In
addition, since Tacotron generates speech at the frame level, it's
substantially faster than sample-level autoregressive methods.
Comment: Submitted to Interspeech 2017. v2 changed paper title to be
consistent with our conference submission (no content change other than typo
fixes).
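Tacotron's speed comes from generating at the frame level, predicting several spectrogram frames per decoder step; a toy sketch of that idea, with hypothetical sizes and a random linear predictor in place of the real attention decoder:

```python
import numpy as np

rng = np.random.default_rng(0)
N_MELS, R = 80, 3  # mel channels; frames emitted per decoder step

W = rng.normal(0.0, 0.1, (N_MELS * R, N_MELS))
b = rng.normal(0.0, 0.1, N_MELS * R)

def decoder_step(prev_frame):
    """Predict R spectrogram frames from the last frame in one shot,
    cutting the number of decoder steps by a factor of R."""
    return np.tanh(W @ prev_frame + b).reshape(R, N_MELS)

frames = [np.zeros(N_MELS)]      # all-zero GO frame
for _ in range(4):               # 4 decoder steps -> 12 frames
    frames.extend(decoder_step(frames[-1]))
spectrogram = np.stack(frames[1:])
```

Emitting `R` frames per step means far fewer sequential decoder iterations than a sample-level autoregressive vocoder, which predicts one audio sample at a time.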