Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems
This paper presents the Frames dataset (Frames is available at
http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues
with an average of 15 turns per dialogue. We developed this dataset to study
the role of memory in goal-oriented dialogue systems. Based on Frames, we
introduce a task called frame tracking, which extends state tracking to a
setting where several states are tracked simultaneously. We propose a baseline
model for this task. We show that Frames can also be used to study memory in
dialogue management and information presentation through natural language
generation.
Numerically erasure-robust frames
Given a channel with additive noise and adversarial erasures, the task is to
design a frame that allows for stable signal reconstruction from transmitted
frame coefficients. To meet these specifications, we introduce numerically
erasure-robust frames. We first consider a variety of constructions, including
random frames, equiangular tight frames and group frames. Later, we show that
arbitrarily large erasure rates necessarily induce numerical instability in
signal reconstruction. We conclude with a few observations, including some
implications for maximal equiangular tight frames and sparse frames.
Comment: 15 pages
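
The reconstruction setting described in this abstract can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a small three-vector equiangular tight frame in R^2 (often called the Mercedes-Benz frame), erases one coefficient, and reconstructs the signal by least squares from the rest. All function names are hypothetical, and the condition number of the surviving sub-frame is used here only as one common proxy for numerical stability.

```python
import math

def mercedes_benz_frame():
    """Three unit vectors in R^2 spaced 120 degrees apart (illustrative choice)."""
    return [
        (math.cos(math.pi / 2 + 2 * math.pi * k / 3),
         math.sin(math.pi / 2 + 2 * math.pi * k / 3))
        for k in range(3)
    ]

def analyze(frame, x):
    """Frame coefficients <x, f_k> for every frame vector f_k."""
    return [fx * x[0] + fy * x[1] for fx, fy in frame]

def frame_operator(frame, kept):
    """Entries (g00, g01, g11) of the 2x2 matrix sum_k f_k f_k^T over kept indices."""
    g00 = sum(frame[k][0] ** 2 for k in kept)
    g01 = sum(frame[k][0] * frame[k][1] for k in kept)
    g11 = sum(frame[k][1] ** 2 for k in kept)
    return g00, g01, g11

def reconstruct(frame, coeffs, kept):
    """Least-squares estimate of x from the coefficients that survived erasure."""
    g00, g01, g11 = frame_operator(frame, kept)
    b0 = sum(coeffs[k] * frame[k][0] for k in kept)
    b1 = sum(coeffs[k] * frame[k][1] for k in kept)
    det = g00 * g11 - g01 * g01
    if abs(det) < 1e-12:
        raise ValueError("surviving vectors do not span R^2")
    # Apply the inverse of the 2x2 frame operator to (b0, b1).
    return ((g11 * b0 - g01 * b1) / det, (g00 * b1 - g01 * b0) / det)

def condition_number(frame, kept):
    """Largest over smallest singular value of the surviving analysis operator."""
    g00, g01, g11 = frame_operator(frame, kept)
    mean = (g00 + g11) / 2
    spread = math.sqrt(((g00 - g11) / 2) ** 2 + g01 ** 2)
    smallest = mean - spread
    if smallest <= 0:
        return math.inf
    return math.sqrt((mean + spread) / smallest)

frame = mercedes_benz_frame()
x = (1.0, -2.0)
coeffs = analyze(frame, x)
kept = [0, 2]  # coefficient 1 is treated as erased
x_hat = reconstruct(frame, coeffs, kept)
```

In this toy case the signal is recovered exactly because no noise is added, while the condition number rises from 1 for the full frame to sqrt(3) after the erasure, which indicates how much additive noise could be amplified.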
Stochastic Dynamics for Video Infilling
In this paper, we introduce a stochastic dynamics video infilling (SDVI)
framework to generate frames between long intervals in a video. Our task
differs from video interpolation, which aims to produce transitional frames for
a short interval between every two frames and to increase the temporal resolution.
Video infilling, by contrast, aims to fill long intervals with
plausible frame sequences. Our framework models the infilling as a constrained
stochastic generation process and sequentially samples dynamics from the
inferred distribution. SDVI consists of two parts: (1) a bi-directional
constraint propagation module to guarantee the spatial-temporal coherence among
frames, and (2) a stochastic sampling process to generate dynamics from the
inferred distributions. Experimental results show that SDVI can generate clear
frame sequences with varying contents. Moreover, motions in the generated
sequence are realistic and transition smoothly from the given start frame
to the terminal frame. Our project site is
https://xharlie.github.io/projects/project_sites/SDVI/video_results.html
Comment: Winter Conference on Applications of Computer Vision (WACV 2020)
A Frame Tracking Model for Memory-Enhanced Dialogue Systems
Recently, resources and tasks were proposed to go beyond state tracking in
dialogue systems. An example is the frame tracking task, which requires
recording multiple frames, one for each user goal set during the dialogue. This
allows a user, for instance, to compare items corresponding to different goals.
This paper proposes a model which takes as input the list of frames created so
far during the dialogue, the current user utterance as well as the dialogue
acts, slot types, and slot values associated with this utterance. The model
then outputs the frame being referenced by each triple of dialogue act, slot
type, and slot value. We show that on the recently published Frames dataset,
this model significantly outperforms a previously proposed rule-based baseline.
In addition, we present an extensive analysis of the frame tracking task by
dividing it into sub-tasks and assessing their difficulty with respect to our
model.
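
The input and output of the frame tracking task described in this abstract can be sketched as follows. This is neither the model proposed in the paper nor the rule-based baseline it is compared against. It is a hypothetical heuristic, with invented names and slot labels, that only shows the shape of the problem: a list of frames, a list of (dialogue act, slot type, slot value) triples, and one referenced frame index per triple.

```python
from typing import Dict, List, Tuple

Frame = Dict[str, str]          # slot type -> slot value
Triple = Tuple[str, str, str]   # (dialogue act, slot type, slot value)

def reference_frames(frames: List[Frame], triples: List[Triple], active: int) -> List[int]:
    """Return, for each triple, the index of the frame it is assumed to reference.

    Hypothetical rule: pick the most recently created frame that already holds
    the same slot value; otherwise fall back to the currently active frame.
    """
    refs = []
    for _act, slot, value in triples:
        match = active
        for idx in range(len(frames) - 1, -1, -1):
            if frames[idx].get(slot) == value:
                match = idx
                break
        refs.append(match)
    return refs

# Two frames created so far, one per user goal (illustrative values).
frames = [
    {"dst_city": "Paris", "budget": "2000"},
    {"dst_city": "Rome", "budget": "2000"},
]
triples = [
    ("inform", "dst_city", "Paris"),   # matches frame 0 only
    ("inform", "budget", "2000"),      # matches both; most recent wins
    ("inform", "dst_city", "Tokyo"),   # matches none; falls back to active
]
refs = reference_frames(frames, triples, active=1)
```

A learned model would replace the matching rule with a scoring function over frames, but the interface (frames and triples in, one frame index per triple out) would stay the same under these assumptions.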
Predicting Deeper into the Future of Semantic Segmentation
The ability to predict and therefore to anticipate the future is an important
attribute of intelligence. It is also of utmost importance in real-time
systems, e.g. in robotics or autonomous driving, which depend on visual scene
understanding for decision making. While prediction of the raw RGB pixel values
in future video frames has been studied in previous work, here we introduce the
novel task of predicting semantic segmentations of future frames. Given a
sequence of video frames, our goal is to predict segmentation maps of not yet
observed video frames that lie up to a second or further in the future. We
develop an autoregressive convolutional neural network that learns to
iteratively generate multiple frames. Our results on the Cityscapes dataset
show that directly predicting future segmentations is substantially better than
predicting and then segmenting future RGB frames. Prediction results up to half
a second in the future are visually convincing and are much more accurate than
those of a baseline based on warping semantic segmentations using optical flow.
Comment: Accepted to ICCV 2017. Supplementary material available on the
authors' webpage.
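
The idea of iteratively generating multiple future frames can be sketched as an autoregressive rollout. The sketch below is an assumption-laden toy, not the paper's network: the one-step predictor is a hypothetical stand-in that shifts a one-dimensional label map by one cell, where the paper uses a convolutional neural network on segmentation maps. Only the feedback loop, in which each prediction is appended to the input window for the next step, is the point.

```python
from typing import Callable, List

LabelMap = List[int]  # toy 1-D stand-in for a segmentation map

def rollout(predict_next: Callable[[List[LabelMap]], LabelMap],
            context: List[LabelMap],
            steps: int) -> List[LabelMap]:
    """Predict `steps` future maps, feeding each prediction back as input."""
    window = list(context)
    outputs = []
    for _ in range(steps):
        nxt = predict_next(window)
        outputs.append(nxt)
        # Slide the window: drop the oldest map, append the newest prediction.
        window = window[1:] + [nxt]
    return outputs

def shift_right(window: List[LabelMap]) -> LabelMap:
    """Hypothetical predictor: circularly shift the latest map by one cell."""
    last = window[-1]
    return [last[-1]] + last[:-1]

context = [[1, 0, 0, 0], [0, 1, 0, 0]]
future = rollout(shift_right, context, steps=2)
```

Because later predictions depend on earlier ones, errors can accumulate over the rollout, which is one reason longer horizons are harder than short ones.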