Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods, which have tackled this
problem in a deterministic or non-parametric way, we propose a novel approach
that models future frames in a probabilistic manner. Our probabilistic model
makes it possible for us to sample and synthesize many possible future frames
from a single input image. Future frame synthesis is challenging, as it
involves low- and high-level image and motion understanding. We propose a novel
network structure, namely a Cross Convolutional Network to aid in synthesizing
future frames; this network structure encodes image and motion information as
feature maps and convolutional kernels, respectively. In experiments, our model
performs well on synthetic data, such as 2D shapes and animated game sprites,
as well as on real-world videos. We also show that our model can be applied to
tasks such as visual analogy-making, and present an analysis of the learned
network representations.
Comment: The first two authors contributed equally to this work.
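The core mechanism the abstract describes, encoding image content as feature maps and motion as per-channel convolutional kernels, can be illustrated with a minimal sketch. This is not the authors' implementation; `cross_convolve` is a hypothetical helper showing only the cross-convolution step, with plain NumPy loops in place of a learned network.

```python
import numpy as np

def cross_convolve(feature_map, kernels):
    """Apply per-channel motion kernels to image feature maps.

    feature_map: (C, H, W) array encoding image content.
    kernels:     (C, kH, kW) array encoding motion, one kernel per channel.
    Returns a (C, H, W) array: each channel convolved with its own kernel,
    with zero padding so the spatial size is preserved.
    """
    C, H, W = feature_map.shape
    kC, kH, kW = kernels.shape
    assert C == kC, "one kernel per feature-map channel"
    pad = kH // 2
    padded = np.pad(feature_map, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(feature_map)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                # cross-correlation of channel c with its motion kernel
                out[c, i, j] = np.sum(padded[c, i:i + kH, j:j + kW] * kernels[c])
    return out
```

A kernel that is all zeros except for a one at its centre leaves the feature map unchanged; an off-centre one shifts the content, which is the sense in which the kernels carry motion.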
GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks
One of the biggest issues facing the use of machine learning in medical
imaging is the lack of availability of large, labelled datasets. The annotation
of medical images is not only expensive and time consuming but also highly
dependent on the availability of expert observers. The limited amount of
training data can inhibit the performance of supervised machine learning
algorithms which often need very large quantities of data on which to train to
avoid overfitting. So far, much effort has been directed at extracting as much
information as possible from what data is available. Generative Adversarial
Networks (GANs) offer a novel way to unlock additional information from a
dataset by generating synthetic samples with the appearance of real images.
This paper demonstrates the feasibility of introducing GAN derived synthetic
data to the training datasets in two brain segmentation tasks, leading to
improvements in Dice Similarity Coefficient (DSC) of between 1 and 5 percentage
points under different conditions, with the strongest effects seen when fewer
than ten training image stacks are available.
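The augmentation strategy the abstract describes, adding GAN-derived synthetic samples to a real training set, amounts to a simple mixing step. The sketch below is hypothetical and not from the paper; `augment_with_synthetic` and its `ratio` parameter are illustrative names, and the synthetic samples are assumed to have been generated by a separately trained GAN.

```python
import random

def augment_with_synthetic(real_data, synthetic_data, ratio=0.5):
    """Mix GAN-generated samples into a training set.

    real_data:      list of real training samples.
    synthetic_data: list of GAN-generated samples.
    ratio:          number of synthetic samples to add, as a fraction
                    of the real set size.
    Returns a new list: all real samples plus a random subset of
    synthetic ones.
    """
    n_extra = int(len(real_data) * ratio)
    extra = random.sample(synthetic_data, min(n_extra, len(synthetic_data)))
    return list(real_data) + extra
```

In the segmentation setting the abstract reports on, each sample would be an image/label pair, so the GAN must generate both together for the mixed set to remain usable for supervised training.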
Object Referring in Visual Scene with Spoken Language
Object referring has important applications, especially for human-machine
interaction. While having received great attention, the task is mainly attacked
with written language (text) as input rather than spoken language (speech),
which is more natural. This paper investigates Object Referring with Spoken
Language (ORSpoken) by presenting two datasets and one novel approach. Objects
are annotated with their locations in images, text descriptions and speech
descriptions. This makes the datasets ideal for multi-modality learning. The
approach is developed by carefully breaking down the ORSpoken problem into three
sub-problems and introducing task-specific vision-language interactions at the
corresponding levels. Experiments show that our method outperforms competing
methods consistently and significantly. The approach is also evaluated in the
presence of audio noise, showing the efficacy of the proposed vision-language
interaction methods in counteracting background noise.
Comment: 10 pages, submitted to WACV 201