271 research outputs found
Boosting Naturalness of Language in Task-oriented Dialogues via Adversarial Training
The natural language generation (NLG) module in a task-oriented dialogue
system produces user-facing utterances conveying required information. Thus, it
is critical for the generated response to be natural and fluent. We propose to
integrate adversarial training to produce more human-like responses. The model
uses a Straight-Through Gumbel-Softmax estimator for gradient computation. We
also propose a two-stage training scheme to boost performance. Empirical
results show that the adversarial training can effectively improve the quality
of language generation in both automatic and human evaluations. For example, on
the RNN-LG Restaurant dataset, our model AdvNLG outperforms the previous
state-of-the-art result by 3.6% in BLEU. Comment: SIGDial 202
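The Straight-Through Gumbel-Softmax estimator named above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: NumPy has no autograd, so the straight-through identity (forward pass uses the hard one-hot vector, backward pass uses the soft sample) is noted in a comment.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Draw a Gumbel-Softmax sample: a soft, differentiable relaxation
    of sampling a one-hot category from `logits`."""
    rng = rng or np.random.default_rng(0)
    # Gumbel(0, 1) noise via the inverse-CDF trick
    gumbel = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    z = (logits + gumbel) / tau
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def straight_through(y_soft):
    """Straight-Through step: emit a hard one-hot vector in the forward
    pass. In an autograd framework one would return
    y_hard - stop_grad(y_soft) + y_soft so gradients flow through y_soft."""
    y_hard = np.zeros_like(y_soft)
    y_hard[np.arange(y_soft.shape[0]), y_soft.argmax(axis=-1)] = 1.0
    return y_hard

logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 3.0]])
soft = gumbel_softmax_sample(logits)
hard = straight_through(soft)
```

This lets a generator emit discrete tokens for the adversarial discriminator while still receiving useful gradients, which is why the estimator is used here.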
Information Diffusion and External Influence in Networks
Social networks play a fundamental role in the diffusion of information.
However, information reaches a person in a network in two different ways:
through connections in our social networks, and through the influence of
external, out-of-network sources such as the mainstream media. While most
existing models of information adoption assume that information passes only
from node to node via the edges of the underlying network, the recent
availability of massive online social media data
allows us to study this process in more detail. We present a model in which
information can reach a node via the links of the social network or through the
influence of external sources. We then develop an efficient model parameter
fitting technique and apply the model to the emergence of URL mentions in the
Twitter network. Using a complete one-month trace of Twitter, we study how
information reaches the nodes of the network. We quantify the external
influences over time and describe how these influences affect the information
adoption. We discover that the information tends to "jump" across the network,
which can only be explained as an effect of an unobservable external influence
on the network. We find that only about 71% of the information volume in
Twitter can be attributed to network diffusion, while the remaining 29% is due
to external events and factors outside the network.
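The two-channel adoption process described above can be illustrated with a toy simulation. This is not the paper's fitted model: the probabilities, the synchronous update rule, and the ring network are all illustrative assumptions. Each susceptible node adopts either through an already-adopting neighbor (network diffusion) or spontaneously (external influence), and the attribution is tallied per channel.

```python
import random

def simulate_diffusion(adj, p_internal=0.3, p_external=0.01, steps=20, seed=0):
    """Toy two-channel adoption model. `adj` maps node -> neighbor list.
    Returns how many adoptions came via the network vs. external sources."""
    rng = random.Random(seed)
    adopted = {}  # node -> "network" | "external"
    for _ in range(steps):
        newly = {}
        for node, nbrs in adj.items():
            if node in adopted:
                continue
            # Network channel: an adopting neighbor exposes this node
            if any(n in adopted for n in nbrs) and rng.random() < p_internal:
                newly[node] = "network"
            # External channel: adoption with no in-network exposure
            elif rng.random() < p_external:
                newly[node] = "external"
        adopted.update(newly)  # synchronous update after each step
    counts = {"network": 0, "external": 0}
    for src in adopted.values():
        counts[src] += 1
    return counts

# A small ring network of 10 nodes
adj = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
counts = simulate_diffusion(adj)
```

Note that in this toy model the very first adoption must come from the external channel, which mirrors the paper's observation that information "jumps" into the network from outside.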
Optomechanical Transductions in Single and Coupled Wheel Resonators
In this report, the optomechanical transductions in both single and two
side-coupled wheel resonators are investigated. In the single resonator, the
optomechanical transduction sensitivity is determined by the optical and
mechanical quality factors of the resonator. In the coupled resonators, the
optomechanical transduction is related to the energy distribution in the two
resonators, which is strongly dependent on the input detuning. Compared to a
single resonator, the coupled resonators can still provide very sensitive
optomechanical transduction even if the optical and mechanical quality factors
of one resonator are degraded.
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
Open-domain question answering remains a challenging task as it requires
models that are capable of understanding questions and answers, collecting
useful information, and reasoning over evidence. Previous work typically
formulates this task as a reading comprehension or entailment problem given
evidence retrieved from search engines. However, existing techniques struggle
to retrieve indirectly related evidence when no directly related evidence is
provided, especially for complex questions where it is hard to parse precisely
what the question asks. In this paper, we propose a retriever-reader model that
learns to attend on essential terms during the question answering process. We
build (1) an essential term selector which first identifies the most important
words in a question, then reformulates the query and searches for related
evidence; and (2) an enhanced reader that distinguishes between essential terms
and distracting words to predict the answer. We evaluate our model on multiple
open-domain multiple-choice QA datasets, notably performing at the level of the
state-of-the-art on the AI2 Reasoning Challenge (ARC) dataset.
Data Augmentation for Spoken Language Understanding via Pretrained Models
The training of spoken language understanding (SLU) models often faces the
problem of data scarcity. In this paper, we put forward a data augmentation
method with pretrained language models to boost the variability and accuracy of
generated utterances. Furthermore, we investigate and propose solutions to two
previously overlooked scenarios of data scarcity in SLU: i) Rich-in-Ontology:
ontology information with numerous valid dialogue acts is given; ii)
Rich-in-Utterance: a large number of unlabelled utterances are available.
Empirical results show that our method can produce synthetic training data that
boosts the performance of language understanding models in various scenarios. Comment: 6 pages, 1 figure
FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension
This paper introduces a new neural structure called FusionNet, which extends
existing attention approaches from three perspectives. First, it puts forward a
novel concept of "history of word" to characterize attention information from
the lowest word-level embedding up to the highest semantic-level
representation. Second, it introduces an improved attention scoring function
that better utilizes the "history of word" concept. Third, it proposes a
fully-aware multi-level attention mechanism to capture the complete information
in one text (such as a question) and exploit it in its counterpart (such as
context or passage) layer by layer. We apply FusionNet to the Stanford Question
Answering Dataset (SQuAD) and it achieves the first position for both single
and ensemble model on the official SQuAD leaderboard at the time of writing
(Oct. 4th, 2017). Meanwhile, we verify the generalization of FusionNet with two
adversarial SQuAD datasets, where it sets a new state of the art on both: on
AddSent, FusionNet increases the best F1 score from 46.6% to 51.4%; on
AddOneSent, it boosts the best F1 score from 56.0% to 60.7%. Comment: Published in the Sixth International Conference on Learning
Representations (ICLR), 201
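The "fully-aware" attention over "history of word" vectors can be sketched as follows. Here each token is represented by the concatenation of all its layers (from word embedding up to the semantic level), and scores use a low-rank ReLU form, relu(Ux)^T diag(d) relu(Uy); this is a hedged illustration of the kind of improved scoring function the abstract alludes to, and every dimension and name below is an illustrative assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fully_aware_attention(how_q, how_c, U, d):
    """Attention over history-of-word (HoW) vectors.
    how_q: (Lq, dim) question HoW vectors; how_c: (Lc, dim) context HoW.
    Score S_ij = relu(U q_i)^T diag(d) relu(U c_j), then row-softmax."""
    pq = np.maximum(U @ how_q.T, 0.0)      # (k, Lq) projected question
    pc = np.maximum(U @ how_c.T, 0.0)      # (k, Lc) projected context
    scores = pq.T @ (d[:, None] * pc)      # (Lq, Lc) low-rank bilinear scores
    return softmax(scores, axis=-1)        # each question token attends to context

rng = np.random.default_rng(0)
Lq, Lc, dim, k = 3, 5, 8, 4
how_q, how_c = rng.normal(size=(Lq, dim)), rng.normal(size=(Lc, dim))
U, d = rng.normal(size=(k, dim)), rng.uniform(size=k)
attn = fully_aware_attention(how_q, how_c, U, d)
```

The low-rank projection keeps the parameter count modest even though the concatenated HoW vectors are wide, which is the practical point of this scoring form.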
Impossible Triangle: What's Next for Pre-trained Language Models?
Recent developments in large-scale pre-trained language models (PLMs) have
significantly improved the capability of models in various NLP tasks, in terms
of performance after task-specific fine-tuning and zero-shot / few-shot
learning. However, many of such models come with a dauntingly huge size that
few institutions can afford to pre-train, fine-tune or even deploy, while
moderate-sized models usually lack strong generalized few-shot learning
capabilities. In this paper, we first elaborate on the current obstacles to
using PLMs in terms of the Impossible Triangle: 1) moderate model size, 2)
state-of-the-art few-shot learning capability, and 3) state-of-the-art
fine-tuning capability. We argue that all existing PLMs lack one or more of
these properties. To remedy the missing properties, various techniques have
been proposed, such as knowledge distillation, data augmentation, and prompt
learning, which inevitably bring additional work to the application of PLMs in
real scenarios. We then offer insights into
future research directions of PLMs to achieve the Impossible Triangle, and
break down the task into several key phases.
SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
Conversational question answering (CQA) is a novel QA task that requires
understanding of dialogue context. Different from traditional single-turn
machine reading comprehension (MRC) tasks, CQA includes passage comprehension,
coreference resolution, and contextual understanding. In this paper, we propose
an innovative contextualized attention-based deep neural network, SDNet, to
fuse context into traditional MRC models. Our model leverages both
inter-attention and self-attention to comprehend the conversation context and
extract relevant information from the passage. Furthermore, we demonstrate a
novel method to integrate the latest BERT contextual model. Empirical results
show the effectiveness of our model, which sets a new state-of-the-art result
on the CoQA leaderboard, outperforming the previous best model by 1.6% F1. Our
ensemble model further improves the result by 2.7% F1. Comment: 8 pages, 2 figures
SIM: A Slot-Independent Neural Model for Dialogue State Tracking
Dialogue state tracking is an important component in task-oriented dialogue
systems to identify users' goals and requests as a dialogue proceeds. However,
as most previous models are dependent on dialogue slots, the model complexity
soars when the number of slots increases. In this paper, we put forward a
slot-independent neural model (SIM) to track dialogue states while keeping the
model complexity invariant to the number of dialogue slots. The model utilizes
attention mechanisms between user utterance and system actions. SIM achieves
state-of-the-art results on WoZ and DSTC2 tasks, with only 20% of the model
size of previous models. Comment: 6 pages, 1 figure
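The slot-independence idea (one shared scorer reused for every slot-value pair, so the parameter count does not grow with the number of slots) can be sketched as follows. The bilinear match, the dot-product attention, and all names here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention: summarize `keys` (L, d) w.r.t. `query` (d,)."""
    w = softmax(keys @ query)
    return w @ keys

def score_slot_value(utterance, slot_emb, value_emb, W):
    """Slot-independent scoring sketch: the SAME parameters W score every
    (slot, value) pair against the utterance, so model size is invariant
    to the number of slots. `utterance` is an (L, d) array of token vectors."""
    summary = attend(slot_emb, utterance)    # slot-conditioned utterance summary
    return float(value_emb @ (W @ summary))  # bilinear match against the value

rng = np.random.default_rng(1)
L, d = 6, 8
utterance = rng.normal(size=(L, d))
W = rng.normal(size=(d, d))         # shared across all slots
slot, val = rng.normal(size=d), rng.normal(size=d)
s = score_slot_value(utterance, slot, val, W)
```

Adding a new slot only adds a new slot embedding to score with; the scorer `W` itself is untouched, which is how such models keep complexity invariant to the slot count.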
SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding
Spoken language understanding (SLU) requires a model to analyze input
acoustic signal to understand its linguistic content and make predictions. To
boost the models' performance, various pre-training methods have been proposed
to learn rich representations from large-scale unannotated speech and text.
However, the inherent disparities between the two modalities necessitate a
mutual analysis. In this paper, we propose a novel semi-supervised learning
framework, SPLAT, to jointly pre-train the speech and language modules. Besides
conducting a self-supervised masked language modeling task on the two
individual modules using unpaired speech and text, SPLAT aligns representations
from the two modules in a shared latent space using a small amount of paired
speech and text. Thus, during fine-tuning, the speech module alone can produce
representations carrying both acoustic information and contextual semantic
knowledge of an input acoustic signal. Experimental results verify the
effectiveness of our approach on various SLU tasks. For example, SPLAT improves
the previous state-of-the-art performance on the Spoken SQuAD dataset by more
than 10%.
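The alignment of the speech and language modules on paired data can be sketched as a simple pooled-representation loss. This is one plausible toy realization of pulling the two modalities together in a shared latent space, not the paper's actual objective; the pooling choice and dimensions are illustrative.

```python
import numpy as np

def align_loss(speech_repr, text_repr):
    """Sequence-level alignment sketch: mean-pool each module's outputs for
    a paired (speech, text) example and penalize their squared distance in
    the shared latent space."""
    s = speech_repr.mean(axis=0)   # pooled speech-module representation
    t = text_repr.mean(axis=0)     # pooled language-module representation
    return float(((s - t) ** 2).sum())

rng = np.random.default_rng(0)
speech = rng.normal(size=(50, 16))  # frame-level speech-module outputs
text = rng.normal(size=(12, 16))    # token-level language-module outputs
loss = align_loss(speech, text)
```

Minimizing such a loss on a small paired set is what lets the speech module alone carry contextual semantic knowledge at fine-tuning time, as the abstract describes.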