Joint Maximum Purity Forest with Application to Image Super-Resolution
In this paper, we propose a novel random-forest scheme, namely Joint Maximum
Purity Forest (JMPF), for classification, clustering, and regression tasks. In
the JMPF scheme, the original feature space is transformed into a compactly
pre-clustered feature space, via a trained rotation matrix. The rotation matrix
is obtained through an iterative quantization process, where the input data
belonging to different classes are clustered to the respective vertices of the
new feature space with maximum purity. In the new feature space, orthogonal
hyperplanes, which are employed at the split-nodes of decision trees in random
forests, can tackle the clustering problems effectively. We evaluated our
proposed method on public benchmark datasets for regression and classification
tasks, and experiments showed that JMPF remarkably outperforms other
state-of-the-art random-forest-based approaches. Furthermore, we applied JMPF
to image super-resolution, because the transformed, compact features are more
discriminative for the clustering-regression scheme. Experimental results on
several public benchmark datasets also showed that the JMPF-based image
super-resolution scheme is consistently superior to recent state-of-the-art
image super-resolution algorithms.
Comment: 18 pages, 7 figures
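The rotation-then-forest idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an ITQ-style iterative quantization for the rotation matrix and uses scikit-learn's RandomForestClassifier as a stand-in for the forest.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def learn_rotation(X, n_iters=50, seed=0):
    # Iteratively rotate zero-mean features so that samples cluster near
    # the vertices of the binary hypercube (ITQ-style quantization).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal init
    for _ in range(n_iters):
        B = np.sign(X @ R)                            # nearest hypercube vertex
        B[B == 0] = 1
        U, _, Vt = np.linalg.svd(B.T @ X)             # orthogonal Procrustes update
        R = (U @ Vt).T
    return R

# Toy usage on random data (a stand-in for real descriptors).
X = np.random.randn(500, 8)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Xc = X - X.mean(axis=0)
R = learn_rotation(Xc)
forest = RandomForestClassifier(n_estimators=100).fit(Xc @ R, y)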
Multi-modal Face Pose Estimation with Multi-task Manifold Deep Learning
Human face pose estimation aims to estimate the gazing direction or head
posture from 2D images. It provides important cues for tasks such as
interpreting communicative gestures and detecting saliency, and has therefore
attracted considerable attention recently. However, it is challenging because
of complex backgrounds, varied orientations, and limited visibility of facial
appearance. A descriptive representation of face images, and a mapping from
that representation to poses, are therefore critical. In this paper, we make
use of multi-modal data and propose a novel face pose estimation method that
uses a deep learning framework named Multi-task Manifold Deep Learning. It is
based on feature extraction with improved deep neural networks and on
multi-modal mapping relationships learned with multi-task learning. In the
proposed deep-learning-based framework, Manifold Regularized Convolutional
Layers (MRCL) improve traditional convolutional layers by learning the
relationships among the outputs of neurons. In addition, in the proposed
mapping-relationship learning method, different modalities of face
representation are naturally combined to learn the mapping function from face
images to poses. In this way, the computed mapping model benefits from the
multiple tasks.
Experimental results on three challenging benchmark datasets, DPOSE, HPID and
BKHPD, demonstrate the outstanding performance of the proposed method.
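As a rough illustration of manifold regularization on layer outputs, the sketch below penalizes output variation between samples that are close in the input space, a common graph-Laplacian form; the exact MRCL formulation in the paper may differ, and all names and sizes here are illustrative.

import numpy as np

def laplacian_regularizer(X, F, sigma=1.0):
    # X: (n, d) inputs used to build the affinity graph.
    # F: (n, k) layer outputs to be kept smooth along that graph.
    # Returns tr(F^T L F) = 0.5 * sum_ij W_ij * ||f_i - f_j||^2.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W        # unnormalised graph Laplacian
    return np.trace(F.T @ L @ F)

# This scalar would be added, with a weight, to the multi-task pose loss.
X = np.random.randn(32, 10)               # a batch of inputs
F = np.random.randn(32, 16)                # outputs of one layer
reg = laplacian_regularizer(X, F)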
Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation
Due to large variations in shape, appearance, and viewing conditions, object
recognition is a key precursory challenge in the fields of object manipulation
and robotic/AI visual reasoning in general. Recognizing object categories,
particular instances of objects and viewpoints/poses of objects are three
critical subproblems robots must solve in order to accurately grasp/manipulate
objects and reason about their environments. Multi-view images of the same
object lie on intrinsic low-dimensional manifolds in descriptor spaces (e.g.
visual/depth descriptor spaces). These object manifolds share the same topology
despite being geometrically different. Each object manifold can be represented
as a deformed version of a unified manifold. The object manifolds can thus be
parameterized by their homeomorphic mappings/reconstructions from the unified
manifold. In this work, we develop a novel framework to jointly solve the three
challenging recognition sub-problems, by explicitly modeling the deformations
of object manifolds and factorizing them in a view-invariant space for
recognition. We perform extensive experiments on several challenging datasets
and achieve state-of-the-art results.
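A minimal sketch of the underlying parameterization, assuming a shared unit circle of viewpoints as the unified manifold and an RBF mapping into descriptor space; the learned mapping coefficients then act as a view-invariant object factor. All functions and sizes here are illustrative, not the paper's implementation.

import numpy as np

def rbf_features(angles, centers, sigma=0.5):
    # Embed viewpoint angles on the unit circle and lift with RBF kernels.
    pts = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    ctr = np.stack([np.cos(centers), np.sin(centers)], axis=1)
    d2 = ((pts[:, None, :] - ctr[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_object_mapping(angles, descriptors, centers):
    # Solve for coefficients C with descriptors ~ Phi(angles) @ C;
    # the flattened C is the object's "deformation" factor.
    Phi = rbf_features(angles, centers)
    C, *_ = np.linalg.lstsq(Phi, descriptors, rcond=None)
    return C

# Toy usage: 12 views of one object with 64-D descriptors.
centers = np.linspace(0, 2 * np.pi, 8, endpoint=False)
angles = np.linspace(0, 2 * np.pi, 12, endpoint=False)
desc = np.random.randn(12, 64)                 # stand-in for visual features
C = fit_object_mapping(angles, desc, centers)  # (8, 64) object factor
# Pose of a new descriptor: pick the angle whose reconstruction is closest.
query = desc[3]
grid = np.linspace(0, 2 * np.pi, 360)
recon = rbf_features(grid, centers) @ C
pose = grid[np.argmin(((recon - query) ** 2).sum(-1))]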
Globally Tuned Cascade Pose Regression via Back Propagation with Application in 2D Face Pose Estimation and Heart Segmentation in 3D CT Images
Recently, a successful pose estimation algorithm, called Cascade Pose
Regression (CPR), was proposed in the literature. Trained over pose-indexed
features, CPR is a regressor ensemble similar to boosting. In this paper,
we show how CPR can be represented as a neural network. Specifically, we adopt
a Graph Transformer Network (GTN) representation and accordingly train CPR with
Back Propagation (BP), which permits global tuning. In contrast, the previous
CPR literature used only layer-wise training without any subsequent fine-tuning.
We empirically show that global training with BP outperforms layer-wise
(pre-)training. Our CPR-GTN adopts a Multi-Layer Perceptron as the regressor,
which uses sparse connections to learn local image feature representations.
We tested the proposed CPR-GTN on the 2D face pose estimation problem, as in
the previous CPR literature. We also investigated the possibility of extending
CPR-GTN to 3D pose estimation through experiments on a 3D Computed Tomography
dataset for heart segmentation.
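For readers unfamiliar with CPR, the sketch below shows the cascade forward pass that the paper unrolls into a network: each stage computes pose-indexed features and applies a regressor to update the pose, so viewing every stage as a layer allows end-to-end fine-tuning with back-propagation. The feature sampler and the linear stages here are placeholders, not the paper's architecture.

import numpy as np

def pose_indexed_features(image, pose, offsets):
    # Sample pixel intensities at fixed offsets around the current pose.
    h, w = image.shape
    pts = np.clip(pose + offsets, [0, 0], [h - 1, w - 1]).astype(int)
    return image[pts[:, 0], pts[:, 1]]

def cascade_predict(image, init_pose, stages, offsets):
    pose = init_pose.astype(float)
    for W, b in stages:                   # each stage = one linear regressor
        f = pose_indexed_features(image, pose, offsets)
        pose = pose + W @ f + b           # additive pose update
    return pose

# Toy usage with random stages (in practice the stages are trained, then the
# whole cascade is fine-tuned jointly with back-propagation).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
offsets = rng.integers(-5, 6, size=(16, 2))
stages = [(0.01 * rng.standard_normal((2, 16)), np.zeros(2)) for _ in range(5)]
print(cascade_predict(image, np.array([32, 32]), stages, offsets))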
Learning and Refining of Privileged Information-based RNNs for Action Recognition from Depth Sequences
Existing RNN-based approaches for action recognition from depth sequences
require either skeleton joints or hand-crafted depth features as inputs. An
end-to-end approach that maps raw depth maps to action classes is non-trivial
to design because: 1) a single-channel depth map lacks texture, which weakens
its discriminative power; and 2) the amount of depth training data is
relatively small. To address these challenges, we propose to learn an RNN
driven by privileged information (PI) in three steps: an encoder is pre-trained to
learn a joint embedding of depth appearance and PI (i.e. skeleton joints). The
learned embedding layers are then tuned in the learning step, aiming to
optimize the network by exploiting PI in the form of a multi-task loss. However,
exploiting PI as a secondary task provides little help in improving the
performance of the primary task (i.e. classification), due to the gap between
them. Finally, a bridging matrix is defined to connect two tasks by discovering
latent PI in the refining step. Our PI-based classification loss maintains a
consistency between latent PI and predicted distribution. The latent PI and
network are iteratively estimated and updated in an expectation-maximization
procedure. The proposed learning process provides greater discriminative power
to model subtle depth differences, while helping to avoid overfitting the scarce
training data. Our experiments show significant performance gains over
state-of-the-art methods on three public benchmark datasets and our newly
collected Blanket dataset.
Comment: CVPR 2017
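A rough PyTorch sketch of the kind of multi-task loss described above, with a bridging matrix linking class predictions to latent PI; the encoder, layer sizes, and the exact consistency term are assumptions rather than the authors' design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PIActionNet(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, n_classes=10, pi_dim=75):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)  # depth encoder
        self.cls_head = nn.Linear(hidden, n_classes)             # primary task
        self.pi_head = nn.Linear(hidden, pi_dim)                  # PI (skeleton)
        # bridging matrix: maps class probabilities to latent PI
        self.bridge = nn.Parameter(torch.zeros(n_classes, pi_dim))

    def forward(self, x):
        _, (h, _) = self.rnn(x)
        h = h[-1]
        return self.cls_head(h), self.pi_head(h)

def multitask_loss(model, x, labels, skeletons, lam=0.1, mu=0.1):
    logits, pi_pred = model(x)
    latent_pi = F.softmax(logits, dim=1) @ model.bridge   # discovered latent PI
    return (F.cross_entropy(logits, labels)               # primary task
            + lam * F.mse_loss(pi_pred, skeletons)        # secondary PI task
            + mu * F.mse_loss(pi_pred, latent_pi))        # bridging consistency

# Toy usage: a batch of 4 depth-feature sequences of length 20.
model = PIActionNet()
x = torch.randn(4, 20, 128)
loss = multitask_loss(model, x, torch.randint(0, 10, (4,)), torch.randn(4, 75))
loss.backward()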
ASIST: Automatic Semantically Invariant Scene Transformation
We present ASIST, a technique for transforming point clouds by replacing
objects with their semantically equivalent counterparts. Transformations of
this kind have applications in virtual reality, repair of fused scans, and
robotics. ASIST is based on a unified formulation of semantic labeling and
object replacement; both result from minimizing a single objective. We present
numerical tools for the efficient solution of this optimization problem. The
method is experimentally assessed on new datasets of both synthetic and real
point clouds, and is additionally compared to two recent works on object
replacement on data from the corresponding papers.
Robust Registration and Geometry Estimation from Unstructured Facial Scans
Commercial off-the-shelf (COTS) 3D scanners are capable of generating point
clouds covering visible portions of a face with sub-millimeter accuracy at
close range, but lack the coverage and specialized anatomic registration
provided by more expensive 3D facial scanners. We demonstrate an effective
pipeline for joint alignment of multiple unstructured 3D point clouds and
registration to a parameterized 3D model which represents shape variation of
the human head. Most algorithms separate the problems of pose estimation and
mesh warping; in contrast, we propose a new iterative method in which these
steps are interwoven. The error decreases with each iteration, showing that the proposed approach
is effective in improving geometry and alignment. The approach described is
used to align the NDOff-2007 dataset, which contains 7,358 individual scans at
various poses of 396 subjects. The dataset has a number of full profile scans
which are correctly aligned and contribute directly to the associated mesh
geometry. The dataset in its raw form contains a significant number of
mislabeled scans, which are identified and corrected based on alignment error
using the proposed algorithm. The average point-to-surface distance between the
aligned scans and the produced geometries is one-half millimeter.
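A minimal sketch of interweaving pose estimation and mesh warping, assuming point-to-point correspondences and a linear (PCA-style) head model; the alternation of a Procrustes pose step and a least-squares coefficient fit stands in for the full pipeline.

import numpy as np

def rigid_align(src, dst):
    # Orthogonal Procrustes: rotation R and translation t with dst ~ src @ R + t.
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    R = U @ Vt
    if np.linalg.det(R) < 0:              # avoid reflections
        U[:, -1] *= -1
        R = U @ Vt
    return R, mu_d - mu_s @ R

def fit_model(points, mean_shape, basis):
    # Least-squares shape coefficients: points ~ mean_shape + sum_k alpha_k * basis_k.
    A = basis.reshape(basis.shape[0], -1).T           # (3N, K)
    b = (points - mean_shape).reshape(-1)
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha

def align_and_warp(scan, mean_shape, basis, n_iters=10):
    alpha = np.zeros(basis.shape[0])
    for _ in range(n_iters):
        model = mean_shape + np.tensordot(alpha, basis, axes=1)
        R, t = rigid_align(scan, model)                     # pose step
        alpha = fit_model(scan @ R + t, mean_shape, basis)  # warp step
    return R, t, alpha

# Toy usage: a random 5-mode shape model and a slightly perturbed "scan".
rng = np.random.default_rng(0)
mean_shape = rng.standard_normal((100, 3))
basis = 0.1 * rng.standard_normal((5, 100, 3))
scan = mean_shape + 0.05 * rng.standard_normal((100, 3))
R, t, alpha = align_and_warp(scan, mean_shape, basis)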
Learning Local RGB-to-CAD Correspondences for Object Pose Estimation
We consider the problem of 3D object pose estimation. While much recent work
has focused on the RGB domain, the reliance on accurately annotated images
limits their generalizability and scalability. On the other hand, the easily
available CAD models of objects are rich sources of data, providing a large
number of synthetically rendered images. In this paper, we solve this key
problem of existing methods requiring expensive 3D pose annotations by
proposing a new method that matches RGB images to CAD models for object pose
estimation. Our key innovations compared to existing work include removing the
need for either real-world textures for CAD models or explicit 3D pose
annotations for RGB images. We achieve this through a series of objectives that
learn how to select keypoints and enforce viewpoint and modality invariance
across RGB images and CAD model renderings. We conduct extensive experiments to
demonstrate that the proposed method can reliably estimate object pose in RGB
images, as well as generalize to object instances not seen during training.
Comment: 10 pages, 6 figures, 4 tables, ICCV 2019
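One way to realize a modality-invariance objective of this kind is sketched below: two small encoders embed RGB patches and CAD-rendering patches, and a triplet loss pulls corresponding keypoint descriptors together across modalities. The encoders, patch size, and margin are assumptions, not the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder(out_dim=128):
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, out_dim))

rgb_enc, cad_enc = make_encoder(), make_encoder()
triplet = nn.TripletMarginLoss(margin=0.2)

# Toy batch: anchors are RGB patches, positives are renderings of the same
# keypoint, negatives are renderings of other keypoints.
rgb = torch.randn(16, 3, 32, 32)
cad_pos = torch.randn(16, 3, 32, 32)
cad_neg = torch.randn(16, 3, 32, 32)

a = F.normalize(rgb_enc(rgb), dim=1)
p = F.normalize(cad_enc(cad_pos), dim=1)
n = F.normalize(cad_enc(cad_neg), dim=1)
loss = triplet(a, p, n)
loss.backward()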
Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning
This paper presents KeypointNet, an end-to-end geometric reasoning framework
to learn an optimal set of category-specific 3D keypoints, along with their
detectors. Given a single image, KeypointNet extracts 3D keypoints that are
optimized for a downstream task. We demonstrate this framework on 3D pose
estimation by proposing a differentiable objective that seeks the optimal set
of keypoints for recovering the relative pose between two views of an object.
Our model discovers geometrically and semantically consistent keypoints across
viewing angles and instances of an object category. Importantly, we find that
our end-to-end framework using no ground-truth keypoint annotations outperforms
a fully supervised baseline using the same neural network architecture on the
task of pose estimation. The discovered 3D keypoints on the car, chair, and
plane categories of ShapeNet are visualized at http://keypointnet.github.io/
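The differentiable pose objective can be illustrated with a Kabsch/SVD solve for the relative rotation between two sets of predicted keypoints, whose angular error is back-propagated to the keypoint predictor (omitted here); the toy data and loss form are assumptions.

import math
import torch

def kabsch_rotation(P, Q):
    # Rotation R such that Q is approximately P @ R.T (i.e. q_i ~ R p_i),
    # for two sets of keypoints of shape (N, 3).
    P = P - P.mean(0, keepdim=True)
    Q = Q - Q.mean(0, keepdim=True)
    U, _, Vt = torch.linalg.svd(P.T @ Q)
    d = torch.sign(torch.linalg.det(U @ Vt))        # avoid reflections
    D = torch.diag(torch.stack([torch.ones(()), torch.ones(()), d]))
    return (U @ D @ Vt).T

def pose_loss(kp_view1, kp_view2, R_gt):
    R_est = kabsch_rotation(kp_view1, kp_view2)
    cos = (torch.trace(R_est.T @ R_gt) - 1) / 2
    return torch.acos(cos.clamp(-1 + 1e-6, 1 - 1e-6))  # angular error

# Toy usage: 10 keypoints, a known relative rotation about z; gradients
# flow back to the predicted keypoints through the SVD.
c, s = math.cos(0.3), math.sin(0.3)
R_gt = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
kp1 = torch.randn(10, 3, requires_grad=True)
kp2 = kp1.detach() @ R_gt.T + 0.01 * torch.randn(10, 3)
loss = pose_loss(kp1, kp2, R_gt)
loss.backward()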
2D-3D Pose Consistency-based Conditional Random Fields for 3D Human Pose Estimation
This study considers the 3D human pose estimation problem in a single RGB
image by proposing a conditional random field (CRF) model over 2D poses, in
which the 3D pose is obtained as a byproduct of the inference process. The
unary term of the proposed CRF model is defined based on a powerful heat-map
regression network, which has been proposed for 2D human pose estimation. This
study also presents a regression network for lifting the 2D pose to a 3D pose and
proposes the prior term based on the consistency between the estimated 3D pose
and the 2D pose. To obtain the approximate solution of the proposed CRF model,
the N-best strategy is adopted. The proposed inference algorithm can be viewed
as sequential processes of bottom-up generation of 2D and 3D pose proposals
from the input 2D image based on deep networks and top-down verification of
such proposals by checking their consistencies. To evaluate the proposed
method, we use two large-scale datasets: Human3.6M and HumanEva. Experimental
results show that the proposed method achieves the state-of-the-art 3D human
pose estimation performance.
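A minimal sketch of the N-best inference idea, assuming a stub lifting regressor and a scaled-orthographic camera: each 2D proposal is scored by its heat-map (unary) value plus a 2D/3D consistency prior from reprojecting its lifted 3D pose. Names and sizes are illustrative.

import numpy as np

def project(pose3d, scale=1.0):
    # Scaled-orthographic projection: drop depth, scale x/y.
    return scale * pose3d[:, :2]

def score_proposal(pose2d, heatmaps, lift, lam=1.0):
    # Unary term: heat-map value at each joint location.
    idx = np.clip(pose2d.astype(int), 0, heatmaps.shape[1] - 1)
    unary = sum(heatmaps[j, y, x] for j, (x, y) in enumerate(idx))
    # Prior term: 2D/3D consistency via reprojection of the lifted pose.
    pose3d = lift(pose2d)
    prior = -np.linalg.norm(project(pose3d) - pose2d)
    return unary + lam * prior

def n_best_inference(proposals, heatmaps, lift):
    scores = [score_proposal(p, heatmaps, lift) for p in proposals]
    best = int(np.argmax(scores))
    return proposals[best], lift(proposals[best])   # 3D pose as a byproduct

# Toy usage: 17 joints, 64x64 heat maps, a dummy lifting regressor.
J, S = 17, 64
heatmaps = np.random.rand(J, S, S)
proposals = [np.random.rand(J, 2) * (S - 1) for _ in range(5)]
lift = lambda p2d: np.concatenate([p2d, np.zeros((J, 1))], axis=1)  # stub
pose2d, pose3d = n_best_inference(proposals, heatmaps, lift)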