Learning to Transform Time Series with a Few Examples
We describe a semi-supervised regression algorithm that learns to transform one time series into another given examples of the transformation. We apply this algorithm to tracking, where a time series of sensor observations is transformed into a time series describing the pose of a target. Instead of defining and implementing such transformations for each tracking task separately, our algorithm learns a memoryless transformation of time series from a few example input-output mappings. It searches for a smooth function that fits the training examples and, when applied to the input time series, produces an output that evolves according to assumed dynamics. The learning procedure is fast and admits a closed-form solution, and it is closely related to nonlinear system identification and manifold learning techniques. We demonstrate the algorithm on tracking RFID tags from signal-strength measurements and on recovering the pose of rigid, deformable, and articulated bodies from video sequences. On these tasks, it requires significantly fewer examples than fully supervised regression algorithms or semi-supervised learning algorithms that ignore the dynamics of the output time series.
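The closed-form flavor of this idea can be illustrated with a linear sketch: fit the labeled input-output pairs by least squares while penalizing frame-to-frame changes in the predicted outputs on the unlabeled sequence. This is only a simplified stand-in for the paper's method (the function class, dynamics model, and regularizer here are assumptions, and `fit_transform` is a hypothetical name), but it shows how the smoothness prior enters the normal equations.

```python
import numpy as np

def fit_transform(X_lab, Y_lab, X_unlab, lam=1.0):
    """Linear sketch of smooth semi-supervised regression.

    Minimizes ||X_lab W - Y_lab||^2 + lam * ||D X_unlab W||^2,
    where D is the first-difference operator over the unlabeled
    time series, so predictions on it are encouraged to vary
    slowly from frame to frame (a crude dynamics prior).
    """
    T = X_unlab.shape[0]
    # First-difference operator: row t maps a sequence to s[t+1] - s[t]
    D = np.eye(T - 1, T, k=1) - np.eye(T - 1, T)
    A = X_lab.T @ X_lab + lam * X_unlab.T @ D.T @ D @ X_unlab
    b = X_lab.T @ Y_lab
    return np.linalg.solve(A, b)  # weights W, so f(x) = x @ W
```

With `lam = 0` this reduces to ordinary least squares on the labeled pairs alone; increasing `lam` trades label fit for temporal smoothness on the unlabeled series.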
Learning 3D Human Pose from Structure and Motion
3D human pose estimation from a single image is a challenging problem,
especially for in-the-wild settings due to the lack of 3D annotated data. We
propose two anatomically inspired loss functions and use them with a
weakly-supervised learning framework to jointly learn from large-scale
in-the-wild 2D and indoor/synthetic 3D data. We also present a simple temporal
network that exploits temporal and structural cues present in predicted pose
sequences to temporally harmonize the pose estimations. We carefully analyze
the proposed contributions through loss surface visualizations and sensitivity
analysis to facilitate deeper understanding of their working mechanism. Our
complete pipeline improves the state-of-the-art by 11.8% and 12% on Human3.6M
and MPI-INF-3DHP, respectively, and runs at 30 FPS on a commodity graphics
card.
Comment: ECCV 2018. Project page: https://www.cse.iitb.ac.in/~rdabral/3DPose
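One of the anatomically inspired constraints described above can be sketched as a left/right symmetry penalty on bone lengths. The joint indices and bone pairs below are illustrative placeholders, not the skeleton layout the paper actually uses, and this is a minimal numpy stand-in rather than the paper's loss implementation.

```python
import numpy as np

# Hypothetical (left bone, right bone) pairs, each bone given as
# (parent joint, child joint). Indices are illustrative only.
SYMMETRIC_BONES = [((0, 1), (0, 4)),   # e.g. hip -> left knee vs hip -> right knee
                   ((1, 2), (4, 5))]   # e.g. left knee -> ankle vs right knee -> ankle

def symmetry_loss(pose3d):
    """Mean squared left/right bone-length mismatch for a (J, 3) 3D pose."""
    total = 0.0
    for (la, lb), (ra, rb) in SYMMETRIC_BONES:
        left = np.linalg.norm(pose3d[la] - pose3d[lb])
        right = np.linalg.norm(pose3d[ra] - pose3d[rb])
        total += (left - right) ** 2
    return total / len(SYMMETRIC_BONES)
```

A loss like this needs no 3D ground truth for the image at hand, which is what makes it usable in a weakly supervised setting on in-the-wild 2D data.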
VIBE: Video Inference for Human Body Pose and Shape Estimation
Human motion is fundamental to understanding behavior. Despite progress on
single-image 3D pose and shape estimation, existing video-based
state-of-the-art methods fail to produce accurate and natural motion sequences
due to a lack of ground-truth 3D motion data for training. To address this
problem, we propose Video Inference for Body Pose and Shape Estimation (VIBE),
which makes use of an existing large-scale motion capture dataset (AMASS)
together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty
is an adversarial learning framework that leverages AMASS to discriminate
between real human motions and those produced by our temporal pose and shape
regression networks. We define a temporal network architecture and show that
adversarial training, at the sequence level, produces kinematically plausible
motion sequences without in-the-wild ground-truth 3D labels. We perform
extensive experimentation to analyze the importance of motion and demonstrate
the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving
state-of-the-art performance. Code and pretrained models are available at
https://github.com/mkocabas/VIBE.
Comment: CVPR 2020 camera ready. Code is available at https://github.com/mkocabas/VIBE
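The sequence-level adversarial idea can be sketched as a discriminator that scores motion, not individual frames. VIBE's actual discriminator is a learned recurrent network with attention; the version below is a deliberately tiny linear stand-in (hand-written motion features, logistic scoring, hypothetical function names) that only illustrates the objective: real mocap sequences should score high, regressor outputs low.

```python
import numpy as np

def motion_features(seq):
    """Crude stand-in for a learned motion encoder: the per-sequence
    mean of frame-to-frame pose differences (seq has shape (T, D))."""
    return np.diff(seq, axis=0).mean(axis=0)

def discriminator_loss(w, real_seqs, fake_seqs):
    """Binary cross-entropy for a linear motion discriminator with
    weights w: real sequences toward 1, generated sequences toward 0."""
    def score(s):
        return 1.0 / (1.0 + np.exp(-motion_features(s) @ w))
    eps = 1e-9
    loss_real = -np.mean([np.log(score(s) + eps) for s in real_seqs])
    loss_fake = -np.mean([np.log(1.0 - score(s) + eps) for s in fake_seqs])
    return loss_real + loss_fake
```

In the adversarial setup, the pose regressor is then trained to make its output sequences score high under this discriminator, pushing generated motion toward the statistics of real mocap data.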
Geometry-Aware Learning of Maps for Camera Localization
Maps are a key component in image-based camera localization and visual SLAM
systems: they are used to establish geometric constraints between images,
correct drift in relative pose estimation, and relocalize cameras after lost
tracking. The exact definitions of maps, however, are often
application-specific and hand-crafted for different scenarios (e.g. 3D
landmarks, lines, planes, bags of visual words). We propose to represent maps
as a deep neural net called MapNet, which enables learning a data-driven map
representation. Unlike prior work on learning maps, MapNet exploits cheap and
ubiquitous sensory inputs like visual odometry and GPS in addition to images
and fuses them together for camera localization. Geometric constraints
expressed by these inputs, which have traditionally been used in bundle
adjustment or pose-graph optimization, are formulated as loss terms in MapNet
training and also used during inference. In addition to directly improving
localization accuracy, this allows us to update the MapNet (i.e., maps) in a
self-supervised manner using additional unlabeled video sequences from the
scene. We also propose a novel parameterization for camera rotation which is
better suited for deep-learning based camera pose regression. Experimental
results on both the indoor 7-Scenes dataset and the outdoor Oxford RobotCar
dataset show significant performance improvement over prior work. The MapNet
project webpage is https://goo.gl/mRB3Au.
Comment: CVPR 2018 camera ready paper + supplementary material
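The rotation parameterization mentioned above is, to the best of our reading, the logarithm of a unit quaternion: for q = (u, v) with scalar part u, the network regresses w = (v / ||v||) * arccos(u), a 3-vector free of the unit-norm constraint. A minimal sketch of the map and its inverse (function names are ours):

```python
import numpy as np

def quat_log(q):
    """Log map of a unit quaternion q = (u, v) with scalar part u = q[0]:
    returns (v / ||v||) * arccos(u), an unconstrained 3-vector."""
    u, v = q[0], q[1:]
    n = np.linalg.norm(v)
    if n < 1e-12:                 # identity rotation -> zero vector
        return np.zeros(3)
    return (v / n) * np.arccos(np.clip(u, -1.0, 1.0))

def quat_exp(w):
    """Inverse map: recover the unit quaternion from its log."""
    t = np.linalg.norm(w)
    if t < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate([[np.cos(t)], np.sin(t) * w / t])
```

Regressing in this log space avoids both the quaternion's norm constraint and its sign ambiguity near the chosen hemisphere, which is why it suits unconstrained deep-network outputs better than raw quaternions.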