Learning to Translate in Real-time with Neural Machine Translation
Translating in real-time, a.k.a. simultaneous translation, outputs
translation words before the input sentence ends, which is a challenging
problem for conventional machine translation methods. We propose a neural
machine translation (NMT) framework for simultaneous translation in which an
agent learns to make decisions on when to translate from the interaction with a
pre-trained NMT environment. To trade off quality and delay, we extensively
explore various targets for delay and design a method for beam-search
applicable in the simultaneous MT setting. Experiments against state-of-the-art
baselines on two language pairs demonstrate the efficacy of the proposed
framework both quantitatively and qualitatively. Comment: 10 pages, camera read
Backchannels: Quantity, Type and Timing Matters
In a perception experiment, we systematically varied the quantity, type and timing of backchannels. Participants viewed stimuli of a real speaker side-by-side with an animated listener and rated how human-like they perceived the latter's backchannel behavior. In addition, we obtained measures of appropriateness and optionality for each backchannel from keystrokes. This approach allowed us to analyze the influence of each of the factors on entire fragments and on individual backchannels. The originally performed type and timing of a backchannel appeared to be more human-like than a switched type or random timing. In addition, we found that nods are more often appropriate than vocalizations. For quantity, too few or too many backchannels per minute appeared to reduce the quality of the behavior. These findings are important for the design of algorithms for the automatic generation of backchannel behavior for artificial listeners.
Predicting continuous conflict perception with Bayesian Gaussian processes
Conflict is one of the most important phenomena of social life, but it is still largely neglected by the computing community. This work proposes an approach
that detects common conversational social signals (loudness, overlapping speech,
etc.) and predicts the conflict level perceived by human observers in continuous,
non-categorical terms. The proposed regression approach is fully Bayesian and it
adopts Automatic Relevance Determination to identify the social signals that most influence the outcome of the prediction. The experiments are performed over the SSPNet Conflict Corpus, a publicly available collection of 1430 clips extracted from televised political debates (roughly 12 hours of material for 138 subjects in total). The results show that it is possible to achieve a correlation close to 0.8 between actual and predicted conflict perception.
A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding
Zero-shot dialogue understanding aims to enable dialogue systems to track the user's
needs without any training data, which has gained increasing attention. In this
work, we investigate the understanding ability of ChatGPT for zero-shot
dialogue understanding tasks including spoken language understanding (SLU) and
dialogue state tracking (DST). Experimental results on four popular benchmarks
reveal the great potential of ChatGPT for zero-shot dialogue understanding. In
addition, extensive analysis shows that ChatGPT benefits from the multi-turn
interactive prompt in the DST task but struggles to perform slot filling for
SLU. Finally, we summarize several unexpected behaviors of ChatGPT in dialogue
understanding tasks, hoping to provide some insights for future research on
building zero-shot dialogue understanding systems with Large Language Models
(LLMs). Comment: Technical Repor