Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks
Taking a photo outside, can we predict the immediate future, e.g., how would
the cloud move in the sky? We address this problem by presenting a generative
adversarial network (GAN) based two-stage approach to generating realistic
time-lapse videos of high resolution. Given the first frame, our model learns
to generate long-term future frames. The first stage generates videos of
realistic contents for each frame. The second stage refines the generated video
from the first stage by enforcing it to be closer to real videos with regard to
motion dynamics. To further encourage vivid motion in the final generated
video, a Gram matrix is employed to model the motion more precisely. We build a
large-scale time-lapse dataset, and test our approach on this new dataset.
Using our model, we are able to generate realistic videos of up to resolution for 32 frames. Quantitative and qualitative experimental results
demonstrate the superiority of our model over state-of-the-art
models.
Comment: To appear in Proceedings of CVPR 201
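The abstract above mentions a Gram matrix for modelling motion but does not say which features it is computed on. As a rough illustration only, the sketch below computes a generic Gram matrix (channel-by-channel inner products) over a small feature map. The function name `gram_matrix`, the flat list-of-channels layout, and the normalisation by the number of positions are all assumptions made for this example, not details taken from the paper.

```python
def gram_matrix(features):
    """Return the Gram matrix of a feature map.

    `features` is a list of C channels, each a flat list of N values
    (hypothetical layout chosen for this sketch). Entry [i][j] is the
    inner product of channel i and channel j, divided by N
    (the normalisation is an assumption).
    """
    n = len(features[0])
    if any(len(channel) != n for channel in features):
        raise ValueError("all channels must have the same length")
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


# Two channels with two positions each.
example = gram_matrix([[1, 2], [3, 4]])
```

Because the matrix depends only on correlations between channels, two feature maps with similar Gram matrices share similar second-order statistics even if their values differ position by position.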
Clothing Co-Parsing by Joint Image Segmentation and Labeling
This paper aims at developing an integrated system of clothing co-parsing, in
order to jointly parse a set of clothing images (unsegmented but annotated with
tags) into semantic configurations. We propose a data-driven framework
consisting of two phases of inference. The first phase, referred to as "image
co-segmentation", iterates to extract consistent regions on images and jointly
refines the regions over all images by employing the exemplar-SVM (E-SVM)
technique [23]. In the second phase (i.e. "region co-labeling"), we construct a
multi-image graphical model by taking the segmented regions as vertices, and
incorporate several contexts of clothing configuration (e.g., item location and
mutual interactions). The joint label assignment can be solved using the
efficient Graph Cuts algorithm. In addition to evaluating our framework on the
Fashionista dataset [30], we construct a dataset called CCP consisting of 2098
high-resolution street fashion photos to demonstrate the performance of our
system. We achieve 90.29% / 88.23% segmentation accuracy and 65.52% / 63.89%
recognition rate on the Fashionista and the CCP datasets, respectively, which
are superior to those of state-of-the-art methods.
Comment: 8 pages, 5 figures, CVPR 201
Spatio-temporal Video Re-localization by Warp LSTM
The need to efficiently find the video content a user wants is increasing
because of the explosion of user-generated videos on the Web. Existing
keyword-based or content-based video retrieval methods usually determine what
occurs in a video but not when and where. In this paper, we answer
the question of when and where by formulating a new task, namely
spatio-temporal video re-localization. Specifically, given a query video and a
reference video, spatio-temporal video re-localization aims to localize
tubelets in the reference video such that the tubelets semantically correspond
to the query. To accurately localize the desired tubelets in the reference
video, we propose a novel warp LSTM network, which propagates the
spatio-temporal information for a long period and thereby captures the
corresponding long-term dependencies. Another issue for spatio-temporal video
re-localization is the lack of properly labeled video datasets. Therefore, we
reorganize the videos in the AVA dataset to form a new dataset for
spatio-temporal video re-localization research. Extensive experimental results
show that the proposed model achieves superior performance over the designed
baselines on the spatio-temporal video re-localization task.
Hete-CF : Social-Based Collaborative Filtering Recommendation using Heterogeneous Relations
The work described here was funded by the National Natural Science Foundation of China (NSFC) under Grant No. 61373051; the National Science and Technology Pillar Program (Grant No. 2013BAH07F05); the Key Laboratory for Symbolic Computation and Knowledge Engineering, Ministry of Education, China; and the UK Economic & Social Research Council (ESRC), award reference ES/M001628/1. Preprin
A Novel Self-Intersection Penalty Term for Statistical Body Shape Models and Its Applications in 3D Pose Estimation
Statistical body shape models are widely used in 3D pose estimation due to
their low-dimensional parameter representation. However, it is difficult to
avoid self-intersection between body parts accurately. Motivated by this fact,
we proposed a novel self-intersection penalty term for statistical body shape
models applied in 3D pose estimation. To avoid the trouble of computing
self-intersection for complex surfaces like the body meshes, the gradient of
our proposed self-intersection penalty term is manually derived from the
perspective of geometry. First, the self-intersection penalty term is defined
as the volume of the self-intersection region. To calculate the partial
derivatives with respect to the coordinates of the vertices, we employed
detection rays to divide vertices of statistical body shape models into
different groups depending on whether the vertex is in the region of
self-intersection. Second, the partial derivatives could be easily derived by
the normal vectors of neighboring triangles of the vertices. Finally, this
penalty term could be applied in gradient-based optimization algorithms to
remove the self-intersection of triangular meshes without using any
approximation. Qualitative and quantitative evaluations were conducted to
demonstrate the effectiveness and generality of our proposed method compared
with previous approaches. The experimental results show that our proposed
penalty term can avoid self-intersection to exclude unreasonable predictions
and indirectly improve the accuracy of 3D pose estimation. Furthermore, the
proposed method could be employed universally in triangular-mesh-based 3D
reconstruction.
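The abstract above defines the penalty as a volume and derives its gradient from the normals of the triangles around each vertex. The sketch below is not the authors' method: it skips the detection-ray grouping and does not isolate a self-intersection region. It only illustrates the underlying geometric fact, assuming a closed mesh with consistently outward-oriented faces: the enclosed volume is a sum of signed tetrahedra, and its derivative with respect to a vertex is built from cross products of the neighbouring triangles' edges. All names here (`signed_volume`, `volume_gradient`, `TET_VERTICES`, `TET_FACES`) are hypothetical.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    """Dot product of two 3-vectors."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def signed_volume(vertices, faces):
    """Volume enclosed by a closed, outward-oriented triangle mesh."""
    total = 0.0
    for i, j, k in faces:
        # Signed volume of the tetrahedron (origin, a, b, c), times 6.
        total += dot(vertices[i], cross(vertices[j], vertices[k]))
    return total / 6.0


def volume_gradient(vertices, faces):
    """Partial derivatives of signed_volume w.r.t. every vertex coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in vertices]
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        # d/da of a.(b x c) is b x c; the other two follow by cyclic symmetry.
        for idx, g in ((i, cross(b, c)), (j, cross(c, a)), (k, cross(a, b))):
            for d in range(3):
                grad[idx][d] += g[d] / 6.0
    return grad


# Unit right tetrahedron with outward-facing triangles.
TET_VERTICES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
TET_FACES = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
```

For this tetrahedron the enclosed volume is 1/6, and lifting the apex along z changes the volume at a rate of 1/6, while sliding it sideways leaves the volume unchanged. A gradient-based optimizer could use such per-vertex derivatives directly, which is the role the abstract assigns to its penalty term.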
