Skeleton-aided Articulated Motion Generation
This work makes the first attempt to generate an articulated human motion
sequence from a single image. On the one hand, we utilize paired inputs
including human skeleton information as motion embedding and a single human
image as appearance reference, to generate novel motion frames, based on the
conditional GAN framework. On the other hand, a triplet loss is employed
to encourage appearance smoothness between consecutive frames. As the proposed
framework is capable of jointly exploiting the image appearance space and
articulated/kinematic motion space, it generates realistic articulated motion
sequences, in contrast to most previous video generation methods, which yield
blurred motion effects. We test our model on two human action datasets,
KTH and Human3.6M, and the proposed framework generates very promising
results on both.
Comment: ACM MM 201
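The triplet loss mentioned above, used to keep consecutive generated frames similar in appearance while staying distinct from distant frames, could in one minimal form look like the sketch below. This is an illustrative assumption, not the paper's exact formulation: the function name, the squared Euclidean distance, and the margin value are all hypothetical choices.

```python
import numpy as np

def appearance_triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss over frame feature vectors.

    Pulls the anchor frame's features toward an adjacent (positive) frame
    and pushes them away from a temporally distant (negative) frame,
    encouraging appearance smoothness between consecutive frames.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # distance to adjacent frame
    d_neg = np.sum((anchor - negative) ** 2)  # distance to distant frame
    return max(d_pos - d_neg + margin, 0.0)   # zero once the margin is met
```

When the adjacent frame already matches the anchor and the distant frame is far away, the hinge saturates at zero and contributes no gradient; otherwise the loss penalizes consecutive frames that drift apart in appearance.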
Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework has received growing attention through its
integration with Convolutional Neural Networks (CNNs). Existing
tracking-by-detection based methods, however, fail to track objects with severe
appearance variations. This is because the traditional convolutional operation
is performed on fixed grids, and thus may not be able to find the correct
response while the object is changing pose or under varying environmental
conditions. In this paper, we propose a deformable convolution layer to enrich
the target appearance representations in the tracking-by-detection framework.
We aim to capture the target appearance variations via deformable convolution,
which adaptively enhances its original features. In addition, we also propose a
gated fusion scheme to control how the variations captured by the deformable
convolution affect the original appearance. The feature representation enriched
through deformable convolution helps the CNN classifier discriminate the
target object from the background. Extensive experiments on the
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods.
Unsupervised Learning of Depth and Ego-Motion from Video
We present an unsupervised learning framework for the task of monocular depth
and camera motion estimation from unstructured video sequences. We achieve this
by simultaneously training depth and camera pose estimation networks using the
task of view synthesis as the supervisory signal. The networks are thus coupled
via the view synthesis objective during training, but can be applied
independently at test time. Empirical evaluation on the KITTI dataset
demonstrates the effectiveness of our approach: 1) monocular depth performing
comparably with supervised methods that use either ground-truth pose or depth
for training, and 2) pose estimation performing favorably with established SLAM
systems under comparable input settings.
Comment: Accepted to CVPR 2017. Project webpage:
https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner
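The view-synthesis supervisory signal described above ultimately reduces to a photometric reconstruction error: a source frame is warped into the target view using the predicted depth and camera pose, and the warped image is compared against the target frame. The sketch below shows only that final comparison step (the warping itself is assumed done upstream); the function name, the L1 error choice, and the optional validity mask are illustrative assumptions.

```python
import numpy as np

def photometric_loss(target, warped_source, mask=None):
    """Mean absolute photometric error between a target frame and a source
    frame warped into the target view via predicted depth and pose.

    `mask` optionally zeroes out pixels that project outside the source
    image, so out-of-view regions do not penalize the networks.
    """
    err = np.abs(target - warped_source)  # per-pixel L1 error
    if mask is not None:
        err = err * mask                  # ignore invalid projections
    return err.mean()
```

Because this loss needs no ground-truth depth or pose, minimizing it trains both networks jointly, and each can then be run independently at test time.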
Unobtrusive and pervasive video-based eye-gaze tracking
Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying the different research avenues being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.
Modeling Brain Circuitry over a Wide Range of Scales
If we are ever to unravel the mysteries of brain function at its most
fundamental level, we will need a precise understanding of how its component
neurons connect to each other. Electron Microscopes (EM) can now provide the
nanometer resolution that is needed to image synapses, and therefore
connections, while Light Microscopes (LM) image at the micrometer resolution
required to model the 3D structure of the dendritic network. Since both the
topology and the connection strength are integral parts of the brain's wiring
diagram, being able to combine these two modalities is critically important.
In fact, these microscopes now routinely produce high-resolution imagery in
such large quantities that the bottleneck becomes automated processing and
interpretation, which is needed for such data to be exploited to its full
potential. In this paper, we briefly review the Computer Vision techniques we
have developed at EPFL to address this need. They include delineating dendritic
arbors from LM imagery, segmenting organelles from EM, and combining the two
into a consistent representation.