786 research outputs found
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Most of the existing deep learning-based methods for 3D hand and human pose
estimation from a single depth map are based on a common framework that takes a
2D depth map and directly regresses the 3D coordinates of keypoints, such as
hand or human body joints, via 2D convolutional neural networks (CNNs). The
first weakness of this approach is the presence of perspective distortion in
the 2D depth map. While the depth map is intrinsically 3D data, many previous
methods treat depth maps as 2D images that can distort the shape of the actual
object through projection from 3D to 2D space. This compels the network to
perform perspective distortion-invariant estimation. The second weakness of the
conventional approach is that directly regressing 3D coordinates from a 2D
image is a highly non-linear mapping, which causes difficulty in the learning
procedure. To overcome these weaknesses, we firstly cast the 3D hand and human
pose estimation problem from a single depth map into a voxel-to-voxel
prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood
for each keypoint. We design our model as a 3D CNN that provides accurate
estimates while running in real-time. Our system outperforms previous methods
in almost all publicly available 3D hand and human pose estimation datasets and
placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
The code is available in https://github.com/mks0601/V2V-PoseNet_RELEASE.Comment: HANDS 2017 Challenge Frame-based 3D Hand Pose Estimation Winner (ICCV
2017), Published at CVPR 201
Reflectance Hashing for Material Recognition
We introduce a novel method for using reflectance to identify materials.
Reflectance offers a unique signature of the material but is challenging to
measure and use for recognizing materials due to its high-dimensionality. In
this work, one-shot reflectance is captured using a unique optical camera
measuring {\it reflectance disks} where the pixel coordinates correspond to
surface viewing angles. The reflectance has class-specific stucture and angular
gradients computed in this reflectance space reveal the material class.
These reflectance disks encode discriminative information for efficient and
accurate material recognition. We introduce a framework called reflectance
hashing that models the reflectance disks with dictionary learning and binary
hashing. We demonstrate the effectiveness of reflectance hashing for material
recognition with a number of real-world materials
Towards Realistic Facial Expression Recognition
Automatic facial expression recognition has attracted significant attention over the past decades. Although substantial progress has been achieved for certain scenarios (such as frontal faces in strictly controlled laboratory settings), accurate recognition of facial expression in realistic environments remains unsolved for the most part. The main objective of this thesis is to investigate facial expression recognition in unconstrained environments. As one major problem faced by the literature is the lack of realistic training and testing data, this thesis presents a web search based framework to collect realistic facial expression dataset from the Web. By adopting an active learning based method to remove noisy images from text based image search results, the proposed approach minimizes the human efforts during the dataset construction and maximizes the scalability for future research. Various novel facial expression features are then proposed to address the challenges imposed by the newly collected dataset. Finally, a spectral embedding based feature fusion framework is presented to combine the proposed facial expression features to form a more descriptive representation. This thesis also systematically investigates how the number of frames of a facial expression sequence can affect the performance of facial expression recognition algorithms, since facial expression sequences may be captured under different frame rates in realistic scenarios. A facial expression keyframe selection method is proposed based on keypoint based frame representation. Comprehensive experiments have been performed to demonstrate the effectiveness of the presented methods
- …