Growing Regression Forests by Classification: Applications to Object Pose Estimation
In this work, we propose a novel node splitting method for regression trees
and incorporate it into the regression forest framework. Unlike traditional
binary splitting, where the splitting rule is selected from a predefined set of
binary splitting rules via trial-and-error, the proposed node splitting method
first finds clusters of the training data which at least locally minimize the
empirical loss without considering the input space. Then splitting rules which
preserve the found clusters as much as possible are determined by casting the
problem into a classification problem. Consequently, our new node splitting
method enjoys more freedom in choosing the splitting rules, resulting in more
efficient tree structures. In addition to the Euclidean target space, we
present a variant which can naturally deal with a circular target space by the
proper use of circular statistics. We apply the regression forest employing our
node splitting to head pose estimation (Euclidean target space) and car
direction estimation (circular target space) and demonstrate that the proposed
method significantly outperforms state-of-the-art methods (38.5% and 22.5%
error reduction respectively).
Comment: Paper accepted by ECCV 201
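The circular target space mentioned above can be illustrated with a minimal sketch of two basic circular statistics. It is not the paper's splitting method; the function names `circular_mean` and `circular_distance` are hypothetical, and angles are assumed to be given in degrees.

```python
import math

def circular_mean(angles_deg):
    """Mean direction of a list of angles in degrees, wrapped by modulo 360.

    Each angle is mapped to a unit vector; the mean direction is the
    angle of the summed vector. This avoids the wrap-around problem of
    the arithmetic mean (e.g. averaging 350 and 10 to 180).
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def circular_distance(a_deg, b_deg):
    """Smallest absolute difference between two directions, in [0, 180]."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

if __name__ == "__main__":
    directions = [350.0, 10.0]
    mean_dir = circular_mean(directions)
    # The circular mean lies next to 0 degrees, unlike the arithmetic mean (180).
    print(circular_distance(mean_dir, 0.0))
    print(circular_distance(350.0, 10.0))
```

An error measure such as `circular_distance` is the natural counterpart of the Euclidean distance when the target is a direction, for example a car's orientation.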
Flowing ConvNets for Human Pose Estimation in Videos
The objective of this work is human pose estimation in videos, where multiple
frames are available. We investigate a ConvNet architecture that is able to
benefit from temporal context by combining information across the multiple
frames using optical flow.
To this end we propose a network architecture with the following novelties:
(i) a deeper network than previously investigated for regressing heatmaps; (ii)
spatial fusion layers that learn an implicit spatial model; (iii) optical flow
is used to align heatmap predictions from neighbouring frames; and (iv) a final
parametric pooling layer which learns to combine the aligned heatmaps into a
pooled confidence map.
We show that this architecture outperforms a number of others, including one
that uses optical flow solely at the input layers, one that regresses joint
coordinates directly, and one that predicts heatmaps without spatial fusion.
The new architecture outperforms the state of the art by a large margin on
three video pose estimation datasets, including the very challenging Poses in
the Wild dataset, and outperforms other deep methods that don't use a graphical
model on the single-image FLIC benchmark (and also Chen & Yuille and Tompson et
al. in the high precision region).
Comment: ICCV'1
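The idea of combining aligned heatmaps into a pooled confidence map can be sketched as a weighted sum over frames. This is only one simple reading of the abstract: the per-frame weights below are fixed, hypothetical values rather than learned parameters, the heatmaps are assumed to be already aligned, and the helper names `pool_heatmaps` and `argmax_2d` are illustrative.

```python
def pool_heatmaps(heatmaps, weights):
    """Weighted sum of equally sized 2D heatmaps (lists of lists).

    heatmaps: one heatmap per frame, assumed already aligned to the
              target frame (the alignment step itself is not shown).
    weights:  one scalar per frame; hypothetical stand-ins for
              parameters that a network would learn.
    """
    if len(heatmaps) != len(weights):
        raise ValueError("need exactly one weight per heatmap")
    rows, cols = len(heatmaps[0]), len(heatmaps[0][0])
    pooled = [[0.0] * cols for _ in range(rows)]
    for heatmap, w in zip(heatmaps, weights):
        for i in range(rows):
            for j in range(cols):
                pooled[i][j] += w * heatmap[i][j]
    return pooled

def argmax_2d(heatmap):
    """Return (row, col) of the largest value in a 2D heatmap."""
    best = max((v, i, j) for i, row in enumerate(heatmap) for j, v in enumerate(row))
    return best[1], best[2]

if __name__ == "__main__":
    previous = [[0.1, 0.9], [0.0, 0.0]]
    current = [[0.2, 0.7], [0.1, 0.0]]
    following = [[0.8, 0.1], [0.0, 0.1]]
    pooled = pool_heatmaps([previous, current, following], [0.25, 0.5, 0.25])
    print(argmax_2d(pooled))  # joint location estimate from the pooled map
```

In this toy example one neighbouring frame disagrees with the other two, and the pooled map still peaks at the location supported by the majority of the weight.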
Deep Convolutional Neural Networks for Estimating Lens Distortion Parameters
In this paper we present a convolutional neural network (CNN) that predicts multiple lens distortion parameters from a single input image. Unlike other methods, our network is suited to producing high-resolution output: it directly estimates the parameters from the image, which can then be used to rectify even very high-resolution input images. As our method is fully automatic, it is suitable for both casual creatives and professional artists. Our results show that our network accurately predicts the lens distortion parameters of high-resolution images and corrects the distortions satisfactorily.
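To show how estimated parameters could be used for rectification at any resolution, here is a minimal sketch under a strong assumption: the abstract does not name its distortion model, so a two-coefficient polynomial radial model with hypothetical parameters `k1` and `k2` is used here, acting on normalized image coordinates.

```python
def distort(x, y, k1, k2):
    """Apply an assumed polynomial radial distortion to a normalized point."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the
    distortion factor evaluated at the current estimate. This converges
    for mild distortion; strong distortion may need a different solver.
    """
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y

if __name__ == "__main__":
    k1, k2 = -0.1, 0.01  # illustrative values, not taken from the paper
    xd, yd = distort(0.3, -0.2, k1, k2)
    print(undistort(xd, yd, k1, k2))  # approximately recovers (0.3, -0.2)
```

Because the parameters act on normalized coordinates, the same pair of values can be applied to an image of any pixel size, which is consistent with the abstract's point about rectifying very high-resolution inputs.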
Hi-Fi: Hierarchical Feature Integration for Skeleton Detection
In natural images, the scales (thickness) of object skeletons may
dramatically vary among objects and object parts, making object skeleton
detection a challenging problem. We present a new convolutional neural network
(CNN) architecture by introducing a novel hierarchical feature integration
mechanism, named Hi-Fi, to address the skeleton detection problem. The proposed
CNN-based approach has a powerful multi-scale feature integration ability that
intrinsically captures high-level semantics from deeper layers as well as
low-level details from shallower layers. By hierarchically integrating
different CNN feature levels with bidirectional guidance, our approach (1)
enables mutual refinement across features of different levels, and (2)
possesses the strong ability to capture both rich object context and
high-resolution details. Experimental results show that our method
significantly outperforms the state-of-the-art methods in terms of effectively
fusing features from very different scales, as evidenced by a considerable
performance improvement on several benchmarks.
Comment: IJCAI201