Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing that the network implicitly learns a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation.
Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.ed
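As a rough illustration of the idea of treating image information as feature maps and motion information as convolutional kernels, the sketch below applies one kernel per feature map. It is a minimal toy under stated assumptions, not the authors' implementation: the function name `cross_convolve`, the zero padding, the correlation convention, and the hand-written kernel are all hypothetical, and in the paper both the feature maps and the kernels would come from learned encoders.

```python
def cross_convolve(feature_maps, kernels):
    """Apply kernels[c] to feature_maps[c] (one kernel per channel).

    feature_maps: list of 2D lists (H x W), one per channel.
    kernels: list of 2D lists (odd kH x odd kW), one per channel.
    Uses zero padding and the correlation convention (no kernel flip).
    """
    outputs = []
    for fmap, kern in zip(feature_maps, kernels):
        h, w = len(fmap), len(fmap[0])
        kh, kw = len(kern), len(kern[0])
        ph, pw = kh // 2, kw // 2
        result = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                acc = 0.0
                for dy in range(kh):
                    for dx in range(kw):
                        sy, sx = y + dy - ph, x + dx - pw
                        # Positions outside the map count as zero.
                        if 0 <= sy < h and 0 <= sx < w:
                            acc += kern[dy][dx] * fmap[sy][sx]
                result[y][x] = acc
        outputs.append(result)
    return outputs


# Toy "appearance" map: a single bright pixel at row 2, column 2.
fmap = [[0.0] * 5 for _ in range(5)]
fmap[2][2] = 1.0

# Toy "motion" kernel: one-hot to the left of center, which moves
# the content one pixel to the right under this convention.
shift_right = [[0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]

moved = cross_convolve([fmap], [shift_right])
```

In this toy, swapping in a different kernel moves the same appearance content differently, which loosely mirrors how different sampled motion codes would yield different future frames from one input image.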
Twofold Video Hashing with Automatic Synchronization
Video hashing finds a wide array of applications in content authentication,
robust retrieval and anti-piracy search. While much of the existing research
has focused on extracting robust and secure content descriptors, a significant
open challenge remains: most existing video hashing methods are vulnerable
to temporal desynchronization. That is, when the query video results from
deleting or inserting some frames in the reference video, most existing
methods assume the positions of the deleted (or inserted) frames are either
perfectly known or reliably estimated. This assumption may be acceptable under
typical transcoding and frame-rate changes but is highly inappropriate in
adversarial scenarios such as anti-piracy video search. For example, an illegal
uploader will try to bypass the 'piracy check' mechanism of YouTube, Dailymotion,
etc. by performing a cleverly designed non-uniform resampling of the video. We
present a new solution based on dynamic time warping (DTW), which can implement
automatic synchronization and can be used together with existing video hashing
methods. The second contribution of this paper is to propose a new robust
feature extraction method called flow hashing (FH), based on frame averaging
and optical flow descriptors. Finally, a fusion mechanism called distance
boosting is proposed to combine the information extracted by DTW and FH.
Experiments on real video collections show that this hash extraction and
comparison scheme enables unprecedented robustness under both spatial and temporal
attacks.
Comment: submitted to Image Processing (ICIP), 2014 21st IEEE International
Conference o
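To make the synchronization idea in the abstract above concrete, here is a minimal sketch of textbook dynamic time warping applied to two sequences of per-frame descriptors. It is an illustration under stated assumptions, not the paper's method: the function name `dtw_distance`, the L1 frame distance, and the one-dimensional toy descriptors are hypothetical, and the flow hashing and distance boosting steps are not modeled.

```python
def dtw_distance(query, reference):
    """Textbook DTW cost between two sequences of frame descriptors.

    Each sequence is a list of equal-length numeric tuples (one per frame).
    Frames are compared with an L1 distance (an assumption for this sketch).
    """
    n, m = len(query), len(reference)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of query[:i] with reference[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum(abs(a - b) for a, b in zip(query[i - 1], reference[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j],      # query frame repeats a match
                                 cost[i][j - 1],      # reference frame repeats a match
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


# Toy one-dimensional descriptors, one tuple per frame.
reference = [(0,), (1,), (2,), (3,), (4,)]
with_inserted = [(0,), (0,), (1,), (2,), (2,), (3,), (4,)]  # duplicated frames
with_deleted = [(0,), (1,), (3,), (4,)]                     # one frame dropped

d_inserted = dtw_distance(with_inserted, reference)
d_deleted = dtw_distance(with_deleted, reference)
```

The alignment is found without being told which frames were inserted or deleted, which is the property the abstract relies on. The quadratic table is acceptable for short clips; longer sequences would typically need a band constraint or a similar restriction.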