719 research outputs found
Interpretation of Semantic Tweet Representations
Research in analysis of microblogging platforms is experiencing a renewed
surge with a large number of works applying representation learning models for
applications like sentiment analysis, semantic textual similarity computation,
hashtag prediction, etc. Although the performance of the representation
learning models has been better than the traditional baselines for such tasks,
little is known about the elementary properties of a tweet encoded within these
representations, or why particular representations work better for certain
tasks. Our work presented here constitutes the first step in opening the
black-box of vector embeddings for tweets. Traditional feature engineering
methods for high-level applications have exploited various elementary
properties of tweets. We believe that a tweet representation is effective for
an application because it meticulously encodes the application-specific
elementary properties of tweets. To understand the elementary properties
encoded in a tweet representation, we evaluate how accurately the
representations can model each of those properties, such as tweet length,
presence of particular words, hashtags, mentions, capitalization, etc. Our
systematic, extensive study of nine supervised and four unsupervised tweet
representations against the eight most popular textual and five social
elementary properties reveals that Bi-directional LSTMs (BLSTMs) and
Skip-Thought Vectors (STV) best encode the textual and social properties of
tweets, respectively.
FastText is the best model for low-resource settings, showing very little
degradation as the embedding size is reduced. Finally, we draw interesting
insights by correlating the model performance on the elementary property
prediction tasks with the high-level downstream applications.
Comment: Accepted at ASONAM 2017; the initial version, presented at the NIPS
2016 workshop, can be found at arXiv:1611.0488
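The probing setup described above can be sketched as follows: from each raw tweet we derive elementary-property labels, and a classifier is then trained to predict each label from the tweet's fixed-size embedding alone. The specific property set and helper below are illustrative, not the paper's exact task definitions.

```python
def elementary_properties(tweet: str) -> dict:
    """Derive elementary-property labels for a tweet; a probing
    classifier would be trained to predict each label from the
    tweet's embedding alone."""
    tokens = tweet.split()
    words = [t.lower().strip("#@.,!?") for t in tokens]
    return {
        "length": len(tokens),                                  # tweet length (in tokens)
        "has_hashtag": any(t.startswith("#") for t in tokens),  # hashtag presence
        "has_mention": any(t.startswith("@") for t in tokens),  # mention presence
        "has_caps": any(t.isupper() and len(t) > 1 for t in tokens),  # capitalization
        "has_word_goal": "goal" in words,                       # presence of a particular word
    }

labels = elementary_properties("GOAL!! What a finish #soccer @player")
```

A probing dataset pairs these labels with the embeddings produced by each of the thirteen representation models, so the probe's accuracy measures how much of each property the embedding retains.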
A Deep Generative Framework for Paraphrase Generation
Paraphrase generation is an important problem in NLP, with applications in
question answering, information retrieval, information extraction, and
conversation systems, to name a few. In this paper, we address the problem of
generating paraphrases
automatically. Our proposed method is based on a combination of deep generative
models (VAE) with sequence-to-sequence models (LSTM) to generate paraphrases,
given an input sentence. Traditional VAEs, when combined with recurrent neural
networks, can generate free text, but they are not suitable for paraphrase
generation for a given sentence. We address this problem by conditioning both
the encoder and decoder of the VAE on the original sentence, so that the model
can generate the given sentence's paraphrases. Unlike most existing models, our
model is simple and modular, and can generate multiple paraphrases for a given
sentence. Quantitative evaluation of the proposed method on a benchmark
paraphrase dataset demonstrates its efficacy and a significant performance
improvement over state-of-the-art methods, while qualitative human evaluation
indicates that the generated paraphrases are well-formed, grammatically
correct, and relevant to the input sentence. Furthermore, we evaluate our
method on a newly released question paraphrase dataset and establish a new
baseline for future research.
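The conditioning scheme described above can be sketched roughly as follows. The toy character-statistics encoder and the dimensions are assumptions for illustration; the paper uses LSTM encoders and decoders. The key structure is that the VAE's recognition side sees the original sentence together with the paraphrase, and the decoder receives the latent code concatenated with the original-sentence encoding.

```python
import math
import random

def encode(sentence: str, dim: int = 4) -> list:
    """Toy stand-in for an LSTM sentence encoder: a fixed-size
    vector of character statistics."""
    vec = [0.0] * dim
    for i, ch in enumerate(sentence):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def conditioned_vae_step(original: str, paraphrase: str, dim: int = 4) -> list:
    cond = encode(original, dim)             # conditioning on the original sentence
    enc_in = cond + encode(paraphrase, dim)  # encoder sees (original, paraphrase)
    mu, log_var = enc_in[:dim], enc_in[dim:]
    z = [m + math.exp(lv / 2) * random.gauss(0, 1)  # reparameterization trick
         for m, lv in zip(mu, log_var)]
    return z + cond  # decoder input: latent code + original-sentence encoding
```

Because the decoder is also conditioned on the original sentence, sampling different latent codes z yields multiple distinct paraphrases of the same input.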
Conversational End-to-End TTS for Voice Agent
End-to-end neural TTS has achieved superior performance on reading style
speech synthesis. However, it is still a challenge to build a high-quality
conversational TTS system, due to the limitations of the corpus and modeling
capability. This study aims at building a conversational TTS system for a voice
agent under a sequence-to-sequence modeling framework. We first construct a
spontaneous conversational speech corpus well designed for the voice agent with
a new recording scheme ensuring both recording quality and conversational
speaking style. Secondly, we propose a conversation context-aware end-to-end
TTS approach, which uses an auxiliary encoder and a conversational context
encoder to reinforce the information about both the current utterance and its
context in the conversation. Experimental results show that the proposed
methods produce more natural prosody in accordance with the conversational
context, with significant preference gains at both utterance-level and
conversation-level. Moreover, we find that the model has the ability to express
some spontaneous behaviors, like fillers and repeated words, which makes the
conversational speaking style more realistic.
Comment: Accepted by SLT 2021; 7 pages
Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability
Generative encoder-decoder models offer great promise in developing
domain-general dialog systems. However, they have mainly been applied to
open-domain conversations. This paper presents a practical and novel framework
for building task-oriented dialog systems based on encoder-decoder models. This
framework enables encoder-decoder models to accomplish slot-value independent
decision-making and interact with external databases. Moreover, this paper
shows the flexibility of the proposed method by interleaving chatting
capability with a slot-filling system for better out-of-domain recovery. The
models were trained on both real-user data from a bus information system and
human-human chat data. Results show that the proposed framework achieves good
performance in both offline evaluation metrics and in task success rate with
human users.
Comment: Accepted as a long paper at SIGDIAL 201
Style Transfer in Text: Exploration and Evaluation
Style transfer is an important problem in natural language processing (NLP).
However, progress in language style transfer has lagged behind other domains,
such as computer vision, mainly because of the lack of parallel data and
principled evaluation metrics. In this paper, we propose to learn style
transfer with non-parallel data. We explore two models to achieve this goal,
and the key idea behind the proposed models is to learn separate content
representations and style representations using adversarial networks. We also
propose novel evaluation metrics which measure two aspects of style transfer:
transfer strength and content preservation. We assess our models and the
evaluation metrics on two tasks: paper-news title transfer and
positive-negative review transfer. Results show that the proposed content
preservation metric correlates highly with human judgments, and the proposed
models are able to generate sentences with higher style transfer strength and
similar content preservation scores compared to an auto-encoder.
Comment: To appear in AAAI-1
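The two evaluation metrics above can be sketched roughly as follows; the style classifier and the sentence embeddings are stubbed out as plain inputs, and the function names are illustrative. Transfer strength counts how often a pre-trained style classifier's label flips after transfer, and content preservation is a cosine similarity between source and output sentence embeddings.

```python
def transfer_strength(labels_before, labels_after):
    """Fraction of sentences whose style label (e.g. from a
    pre-trained sentiment classifier) changed after transfer."""
    flips = sum(b != a for b, a in zip(labels_before, labels_after))
    return flips / len(labels_before)

def content_preservation(src_emb, out_emb):
    """Cosine similarity between source and output sentence embeddings."""
    dot = sum(x * y for x, y in zip(src_emb, out_emb))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(src_emb) * norm(out_emb))
```

The two metrics pull in opposite directions, which is why both are reported: a model can trivially maximize transfer strength by ignoring the input, at the cost of content preservation.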
Very Deep Self-Attention Networks for End-to-End Speech Recognition
Recently, end-to-end sequence-to-sequence models for speech recognition have
gained significant interest in the research community. While previous
architecture choices revolve around time-delay neural networks (TDNN) and long
short-term memory (LSTM) recurrent neural networks, we propose to use
self-attention via the Transformer architecture as an alternative. Our analysis
shows that deep Transformer networks with high learning capacity are able to
exceed performance from previous end-to-end approaches and even match the
conventional hybrid systems. Moreover, we trained very deep models, with up to
48 Transformer layers for the encoder and decoder combined, together with
stochastic residual connections, which greatly improve generalizability and
training efficiency. The resulting models outperform all previous end-to-end
ASR approaches on the Switchboard benchmark. An ensemble of these models
achieves 9.9% and 17.7% WER on the Switchboard and CallHome test sets,
respectively. This
finding brings our end-to-end models to competitive levels with previous hybrid
systems. Further, with model ensembling the Transformers can outperform certain
hybrid systems, which are more complicated in terms of both structure and
training procedure.
Comment: Submitted to INTERSPEECH 201
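A stochastic residual connection can be sketched as follows; this is a simplified, hypothetical plain-Python version, whereas the paper applies the idea per Transformer sub-layer on tensors. During training each sub-layer is randomly skipped, leaving only the identity path, and at inference its output is scaled by the survival probability.

```python
import random

def stochastic_residual(x, sublayer, survival_prob=0.8, training=True):
    """Residual connection y = x + f(x) where the sub-layer f is
    randomly dropped during training and rescaled at inference."""
    if training and random.random() >= survival_prob:
        return list(x)  # sub-layer dropped: identity path only
    scale = 1.0 if training else survival_prob
    return [xi + scale * fi for xi, fi in zip(x, sublayer(x))]
```

Randomly dropping sub-layers acts as a depth regularizer and shortens the effective gradient path, which is what makes 48-layer stacks trainable.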
Game-Based Video-Context Dialogue
Current dialogue systems focus primarily on textual and speech context
knowledge and are usually based on two speakers. Some recent work has
investigated static
image-based dialogue. However, several real-world human interactions also
involve dynamic visual context (similar to videos) as well as dialogue
exchanges among multiple speakers. To move closer towards such multimodal
conversational skills and visually-situated applications, we introduce a new
video-context, many-speaker dialogue dataset based on live-broadcast soccer
game videos and chats from Twitch.tv. This challenging testbed allows us to
develop visually-grounded dialogue models that should generate relevant
temporal and spatial event language from the live video, while also being
relevant to the chat history. For strong baselines, we also present several
discriminative and generative models, e.g., based on tridirectional attention
flow (TriDAF). We evaluate these models via retrieval ranking-recall, automatic
phrase-matching metrics, as well as human evaluation studies. We also present
dataset analyses, model ablations, and visualizations to understand the
contribution of different modalities and model components.
Comment: EMNLP 2018 (14 pages; fixed Table 5 typo in v2)
A Hybrid Retrieval-Generation Neural Conversation Model
Intelligent personal assistant systems that are able to have multi-turn
conversations with human users are becoming increasingly popular. Most previous
research has been focused on using either retrieval-based or generation-based
methods to develop such systems. Retrieval-based methods have the advantage of
returning fluent and informative responses with great diversity. However, the
performance of these methods is limited by the size of the response
repository. On the other hand, generation-based methods can produce highly
coherent responses on any topic, but the generated responses are often generic and not
informative due to the lack of grounding knowledge. In this paper, we propose a
hybrid neural conversation model that combines the merits of both response
retrieval and generation methods. Experimental results on Twitter and
Foursquare data show that the proposed model outperforms both retrieval-based
methods and generation-based methods (including a recently proposed
knowledge-grounded neural conversation model) under both automatic evaluation
metrics and human evaluation. We hope that the findings in this study provide
new insights on how to integrate text retrieval and text generation models for
building conversation systems.
Comment: Accepted as a Full Paper in CIKM 2019. 10 pages
Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders
While recent neural encoder-decoder models have shown great promise in
modeling open-domain conversations, they often generate dull and generic
responses. Unlike past work, which has focused on diversifying the output of
the decoder at the word level to alleviate this problem, we present a novel
framework
based on conditional variational autoencoders that captures the discourse-level
diversity in the encoder. Our model uses latent variables to learn a
distribution over potential conversational intents and generates diverse
responses using only greedy decoders. We have further developed a novel variant
that is integrated with linguistic prior knowledge for better performance.
Finally, the training procedure is improved by introducing a bag-of-word loss.
Our proposed models have been validated to generate significantly more diverse
responses than baseline approaches and exhibit competence in discourse-level
decision-making.
Comment: Appeared in the ACL 2017 proceedings as a long paper. Corrects a
calculation mistake in Table 1 (E-bow & A-bow), which results in higher scores.
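The bag-of-word loss mentioned above can be sketched like this; in the paper the word log-probabilities come from a projection of the latent variable, which is stubbed out here as a plain dictionary. The latent code must predict, order-independently, every distinct word of the target response.

```python
import math

def bow_loss(word_logprobs: dict, response_tokens: list) -> float:
    """Negative log-likelihood, under the latent code's word
    distribution, of each distinct word in the response."""
    return -sum(word_logprobs[w] for w in set(response_tokens))
```

Because the loss bypasses the autoregressive decoder entirely, it forces the latent variable itself to carry the response content, counteracting the tendency of the decoder to ignore z.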
Sentiment Transfer using Seq2Seq Adversarial Autoencoders
Expression in language is subjective. Everyone has a different style of
reading and writing; apparently, it all boils down to the way their mind
understands things (in a specific format). Language style transfer is a way to
preserve the meaning of a text while changing the way it is expressed. Progress
in language style transfer has lagged behind other domains, such as computer
vision, mainly because of the lack of parallel data, use cases, and reliable
evaluation metrics. In response to the challenge of lacking parallel data, we
explore learning style transfer from non-parallel data. We propose a model
combining seq2seq, autoencoders, and adversarial loss to achieve this goal. The
key idea behind the proposed models is to learn separate content
representations and style representations using adversarial networks.
Considering the difficulty of evaluating style transfer tasks, we frame the
problem as sentiment transfer and evaluate it using a sentiment classifier to
measure how many sentiments the model was able to transfer. We report our
results on several kinds of models.
Comment: Report built as part of a project for CSYE7245 at Northeastern
University under Prof. Nik Brown. arXiv admin note: text overlap with
arXiv:1711.06861, arXiv:1409.3215, arXiv:1705.07663 by other authors