Flowing ConvNets for Human Pose Estimation in Videos
The objective of this work is human pose estimation in videos, where multiple
frames are available. We investigate a ConvNet architecture that is able to
benefit from temporal context by combining information across the multiple
frames using optical flow.
To this end we propose a network architecture with the following novelties:
(i) a deeper network than previously investigated for regressing heatmaps; (ii)
spatial fusion layers that learn an implicit spatial model; (iii) optical flow
is used to align heatmap predictions from neighbouring frames; and (iv) a final
parametric pooling layer which learns to combine the aligned heatmaps into a
pooled confidence map.
We show that this architecture outperforms a number of others, including one
that uses optical flow solely at the input layers, one that regresses joint
coordinates directly, and one that predicts heatmaps without spatial fusion.
The new architecture outperforms the state of the art by a large margin on
three video pose estimation datasets, including the very challenging Poses in
the Wild dataset, and outperforms other deep methods that don't use a graphical
model on the single-image FLIC benchmark (and also Chen & Yuille and Tompson et
al. in the high precision region).
Comment: ICCV'1
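As a rough illustration of items (iii) and (iv) above, the following minimal sketch warps per-frame heatmaps with a dense flow field and then combines them with one weight per frame. It is not the authors' implementation: the function names `warp_heatmap` and `pool_heatmaps`, the nearest-neighbour warping, and the fixed weights are all assumptions made for illustration (in the paper the pooling weights are learned).

```python
import numpy as np


def warp_heatmap(heatmap, flow):
    """Backward-warp a (H, W) heatmap with a (H, W, 2) flow field.

    Assumption: flow[y, x] = (dx, dy) points from a pixel in the target
    frame to the matching pixel in the neighbouring frame. Nearest-neighbour
    sampling is used for brevity; bilinear sampling would be smoother.
    """
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return heatmap[src_y, src_x]


def pool_heatmaps(aligned, weights):
    """Combine aligned heatmaps with one scalar weight per frame.

    Assumption: the weights are given here; a learned layer would
    estimate them during training.
    """
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, np.stack(aligned), axes=1)


# Usage: a peak at row 2, column 3 in the neighbouring frame, and a flow
# field stating that every target pixel corresponds to the pixel one
# column to its right.
neighbour = np.zeros((5, 5))
neighbour[2, 3] = 1.0
flow = np.zeros((5, 5, 2))
flow[..., 0] = 1.0
aligned = warp_heatmap(neighbour, flow)
pooled = pool_heatmaps([aligned, aligned], [0.25, 0.75])
```

After warping, the peak sits at row 2, column 2 of the target frame, so heatmaps from different frames can be summed pixel by pixel.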
Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis
Factor analysis aims to determine latent factors, or traits, which summarize
a given data set. Inter-battery factor analysis extends this notion to multiple
views of the data. In this paper we show how a nonlinear, nonparametric version
of these models can be recovered through the Gaussian process latent variable
model. This gives us a flexible formalism for multi-view learning where the
latent variables can be used both for exploratory purposes and for learning
representations that enable efficient inference for ambiguous estimation tasks.
Learning is performed in a Bayesian manner through the formulation of a
variational compression scheme which gives a rigorous lower bound on the log
likelihood. Our Bayesian framework provides strong regularization during
training, allowing the structure of the latent space to be determined
efficiently and automatically. We demonstrate this by producing the first (to
our knowledge) published results of learning from dozens of views, even when
data is scarce. We further show experimental results on several different types
of multi-view data sets and for different kinds of tasks, including exploratory
data analysis, generation, ambiguity modelling through latent priors and
classification.
Comment: 49 pages including appendi
Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on n-Spheres
Many computer vision challenges require continuous outputs, but tend to be
solved by discrete classification. The reason is classification's natural
containment within a probability n-simplex, as defined by the popular softmax
activation function. Regular regression lacks such a closed geometry, leading
to unstable training and convergence to suboptimal local minima. Starting from
this insight we revisit regression in convolutional neural networks. We observe
many continuous output problems in computer vision are naturally contained in
closed geometrical manifolds, like the Euler angles in viewpoint estimation or
the normals in surface normal estimation. A natural framework for posing such
continuous output problems is provided by n-spheres, which are naturally closed
geometric manifolds defined in the R^(n+1) space. By introducing a
spherical exponential mapping on n-spheres at the regression output, we
obtain well-behaved gradients, leading to stable training. We show how our
spherical regression can be utilized for several computer vision challenges,
specifically viewpoint estimation, surface normal estimation and 3D rotation
estimation. For all these problems our experiments demonstrate the benefit of
spherical regression. All paper resources are available at
https://github.com/leoshine/Spherical_Regression.
Comment: CVPR 2019 camera read
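To make the idea of an exponential mapping onto a sphere concrete, here is a minimal sketch of one such mapping: exponentiate the raw outputs and divide by their L2 norm, so the result always has unit length. The function name `spherical_exp` is hypothetical, and the exact form used in the paper may differ. Note that this mapping only produces non-negative coordinates, so the signs would have to be handled separately, which the sketch does not attempt.

```python
import numpy as np


def spherical_exp(raw):
    """Map raw outputs to a non-negative vector with unit L2 norm.

    Assumed form: p_i = exp(o_i) / sqrt(sum_j exp(2 * o_j)).
    Subtracting the maximum first leaves the result unchanged (the
    common factor cancels) and avoids overflow in exp.
    """
    raw = np.asarray(raw, dtype=float)
    e = np.exp(raw - raw.max())
    return e / np.sqrt(np.sum(e ** 2))


# Usage: any real-valued input lands on the unit sphere.
p_equal = spherical_exp([0.0, 0.0])
p_large = spherical_exp([1000.0, 999.0, -3.0])
```

The design mirrors softmax, which normalizes exponentials by their sum to land on the simplex; here the normalizer is the L2 norm, so the output lies on the sphere instead.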