SegICP: Integrated Deep Semantic Segmentation and Pose Estimation
Recent robotic manipulation competitions have highlighted that sophisticated
robots still struggle to achieve fast and reliable perception of task-relevant
objects in complex, realistic scenarios. To improve the perception speed and
robustness of these systems, we present SegICP, a novel integrated solution
to object recognition and pose estimation. SegICP couples convolutional
neural networks and multi-hypothesis point cloud registration to achieve both
robust pixel-wise semantic segmentation and accurate, real-time 6-DOF pose
estimation for relevant objects. Our architecture achieves 1 cm position
error and <5° angle error in real time without an initial seed. We evaluate
and benchmark SegICP against an annotated dataset generated by motion
capture.
Comment: IROS camera-ready
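To illustrate how segmentation and multi-hypothesis registration can be coupled, here is a minimal sketch using Open3D. It is not the authors' implementation: the yaw-only hypothesis scheme, the fitness-based scoring, and the parameter values are illustrative assumptions.

```python
# Sketch of multi-hypothesis point cloud registration in the spirit of
# SegICP: crop the scene cloud with a semantic mask, then run ICP from
# several initial orientations and keep the best-scoring alignment.
# The yaw-only hypothesis scheme is an illustrative assumption.
import numpy as np
import open3d as o3d

def best_pose(model_pts, scene_pts, mask, n_hyp=8, max_dist=0.02):
    """model_pts: (N,3) object model; scene_pts: (M,3) scene points;
    mask: (M,) boolean pixel-wise segmentation for the object."""
    model = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(model_pts))
    target = o3d.geometry.PointCloud(
        o3d.utility.Vector3dVector(scene_pts[mask]))

    best = None
    for k in range(n_hyp):                    # one hypothesis per yaw angle
        yaw = 2.0 * np.pi * k / n_hyp
        init = np.eye(4)
        init[:3, :3] = o3d.geometry.get_rotation_matrix_from_xyz(
            (0.0, 0.0, yaw))
        # rotate the model about its centroid, then align centroids
        init[:3, 3] = target.get_center() - init[:3, :3] @ model.get_center()
        result = o3d.pipelines.registration.registration_icp(
            model, target, max_dist, init,
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        if best is None or result.fitness > best.fitness:
            best = result
    return best.transformation                # 4x4 model-to-scene pose
```

Scoring each converged hypothesis by ICP fitness and keeping the best is what removes the need for an external initial seed in this sketch.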
ALCN: Meta-Learning for Contrast Normalization Applied to Robust 3D Pose Estimation
To be robust to illumination changes when detecting objects in images, the
current trend is to train a Deep Network with training images captured under
many different lighting conditions. Unfortunately, creating such a training set
is very cumbersome, or sometimes even impossible, for some applications such as
3D pose estimation of specific objects, which is the application we focus on in
this paper. We therefore propose a novel illumination normalization method that
lets us learn to detect objects and estimate their 3D pose under challenging
illumination conditions from very few training samples. Our key insight is that
normalization parameters should adapt to the input image. In particular, we
realize this via a Convolutional Neural Network trained to predict the
parameters of a generalization of the Difference-of-Gaussians method. We show
that our method significantly outperforms standard normalization methods and
demonstrate it on two challenging 3D detection and pose estimation problems.Comment: BMVC' 1
A Survey of the Trends in Facial and Expression Recognition Databases and Methods
Automated facial identification and facial expression recognition have been
topics of active research over the past few decades. Facial and expression
recognition find applications in human-computer interfaces, subject tracking,
real-time security surveillance systems and social networking. Several holistic
and geometric methods have been developed to identify faces and expressions
using public and local facial image databases. In this work, we present the
evolution of facial image datasets and methodologies for facial
identification and recognition of expressions such as anger, sadness,
happiness, disgust, fear and surprise. We observe that most of the earlier
methods for facial and expression recognition aimed at improving the
recognition rates for facial feature-based methods using static images.
However, the recent methodologies have shifted focus towards robust
implementation of facial/expression recognition from large image databases that
vary with space (gathered from the internet) and time (video recordings). The
evolution trends in databases and methodologies for facial and expression
recognition can be useful for assessing the next-generation topics that may
have applications in security systems or personal identification systems that
involve "Quantitative face" assessments.Comment: 16 pages, 4 figures, 3 tables, International Journal of Computer
Science and Engineering Survey, October, 201
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.
Comment: 8 pages excluding references (CVPR style)
Can Synthetic Faces Undo the Damage of Dataset Bias to Face Recognition and Facial Landmark Detection?
It is well known that deep learning approaches to face recognition and facial
landmark detection suffer from biases in modern training datasets. In this
work, we propose to use synthetic face images to reduce the negative effects of
dataset biases on these tasks. Using a 3D morphable face model, we generate
large amounts of synthetic face images with full control over facial shape and
color, pose, illumination, and background. With a series of experiments, we
extensively test the effects of priming deep nets by pre-training them with
synthetic faces. We observe the following positive effects for face recognition
and facial landmark detection tasks: 1) Priming with synthetic face images
improves the performance consistently across all benchmarks because it reduces
the negative effects of biases in the training data. 2) Traditional approaches
for reducing the damage of dataset bias, such as data augmentation and transfer
learning, are less effective than training with synthetic faces. 3) Using
synthetic data, we can reduce the size of real-world datasets by 75% for face
recognition and by 50% for facial landmark detection while maintaining
performance. This offers a means to focus the data collection process on
smaller amounts of higher-quality data.
Comment: Technical report
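The priming recipe itself is simple to sketch: pre-train on synthetic faces, then fine-tune on real data at a lower learning rate. The helper below is a generic PyTorch training loop; the model, data loaders, epoch counts, and learning rates are illustrative assumptions.

```python
# Sketch of priming: pre-train a network on synthetic faces, then
# fine-tune on (a smaller amount of) real data. Loaders are assumed to
# be standard DataLoaders yielding (image, identity-label) batches.
import torch
import torch.nn as nn

def train(model, loader, epochs, lr, device="cpu"):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images.to(device)), labels.to(device))
            loss.backward()
            opt.step()

# 1) prime on synthetic faces, 2) fine-tune on real faces at a lower lr
# (synthetic_loader / real_loader are hypothetical names)
# train(model, synthetic_loader, epochs=10, lr=1e-3)
# train(model, real_loader, epochs=5, lr=1e-4)
```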
Fine-Grained Head Pose Estimation Without Keypoints
Estimating the head pose of a person is a crucial problem with many
applications, such as aiding gaze estimation, modeling attention, fitting 3D
models to video, and performing face alignment. Traditionally, head
pose is computed by estimating some keypoints from the target face and solving
the 2D to 3D correspondence problem with a mean human head model. We argue that
this is a fragile method because it relies entirely on landmark detection
performance, the extraneous head model and an ad-hoc fitting step. We present
an elegant and robust way to determine pose by training a multi-loss
convolutional neural network on 300W-LP, a large synthetically expanded
dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from
image intensities through joint binned pose classification and regression. We
present empirical tests on common in-the-wild pose benchmark datasets which
show state-of-the-art results. Additionally, we test our method on a dataset
usually used for depth-based pose estimation and start to close the gap with
state-of-the-art depth methods. We open-source our training and testing code
and release our pre-trained models.
Comment: Accepted to the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018
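The joint binned classification and regression objective lends itself to a compact sketch. Below is a minimal PyTorch version for a single Euler angle, assuming the 66-bin, 3-degree layout used by public implementations; the weighting alpha and bin layout are assumptions, not necessarily the paper's exact settings.

```python
# Sketch of a joint binned-classification-plus-regression angle loss:
# cross-entropy over discretized angle bins, plus a regression term on
# the softmax-expected continuous angle.
import torch
import torch.nn.functional as F

def binned_angle_loss(logits, angle_gt, n_bins=66, width=3.0, alpha=0.5):
    """logits: (B, n_bins) for one Euler angle; angle_gt: (B,) degrees."""
    centers = (torch.arange(n_bins, dtype=torch.float32)
               - n_bins / 2 + 0.5) * width         # bin centers in degrees
    centers = centers.to(logits.device)
    # coarse supervision: which bin does the ground-truth angle fall into?
    bin_gt = (angle_gt / width + n_bins / 2).long().clamp(0, n_bins - 1)
    cls_loss = F.cross_entropy(logits, bin_gt)
    # fine supervision: regression on the softmax-expected angle
    expected = (F.softmax(logits, dim=1) * centers).sum(dim=1)
    reg_loss = F.mse_loss(expected, angle_gt)
    return cls_loss + alpha * reg_loss
```

Taking the expectation over the softmaxed bins is what turns the coarse classifier into a continuous angle estimate, so no keypoints or head model are needed at any stage.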
Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics
Training data is the key ingredient for deep learning approaches, but
difficult to obtain for the specialized domains often encountered in robotics.
We describe a synthesis pipeline capable of producing training data for
cluttered scene perception tasks such as semantic segmentation, object
detection, and correspondence or pose estimation. Our approach arranges object
meshes in physically realistic, dense scenes using physics simulation. The
arranged scenes are rendered using high-quality rasterization with randomized
appearance and material parameters. Noise and other transformations introduced
by the camera sensors are simulated. Our pipeline can be run online during
training of a deep neural network, yielding applications in life-long learning
and in iterative render-and-compare approaches. We demonstrate its usability
by learning semantic segmentation on the challenging YCB-Video dataset
without using any of its training frames, where our method achieves
performance comparable to a conventionally trained model. Additionally, we
show successful application in a real-world regrasping system.
Comment: Accepted for ICRA 2020
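As a concrete illustration of the physics-based arrangement step, here is a minimal sketch using pybullet and its bundled example assets as a stand-in (the paper's own pipeline uses its own renderer and object sets): objects are dropped into the scene, the simulation settles them into a physically plausible pile, and their 6-DoF poses are read back for rendering and label generation.

```python
# Sketch of physically realistic scene arrangement: drop object meshes
# into a scene with a physics simulator and read back their settled
# poses. Assets here are pybullet's bundled examples, a stand-in only.
import random
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                              # headless simulation
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")

bodies = []
for _ in range(8):                               # drop objects from above
    pos = [random.uniform(-0.2, 0.2), random.uniform(-0.2, 0.2),
           random.uniform(0.3, 0.8)]
    bodies.append(p.loadURDF("duck_vhacd.urdf", basePosition=pos))

for _ in range(240):                             # let the pile settle (~1 s)
    p.stepSimulation()

# settled 6-DoF poses, ready for rendering with randomized appearance
poses = [p.getBasePositionAndOrientation(b) for b in bodies]
p.disconnect()
```

Because arrangement and rendering are cheap, a loop like this can run online during training, which is what enables the life-long learning and render-and-compare uses the abstract mentions.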
Self-Supervised Learning of Depth and Camera Motion from 360° Videos
As 360° cameras become prevalent in many autonomous systems (e.g.,
self-driving cars and drones), efficient 360° perception becomes increasingly
important. We propose a novel self-supervised learning approach for
predicting omnidirectional depth and camera motion from a 360° video.
In particular, starting from SfMLearner, which is designed for cameras with a
normal field of view, we introduce three key features to process 360° images
efficiently. Firstly, we convert each image from equirectangular
projection to cubic projection in order to avoid image distortion. In each
network layer, we use Cube Padding (CP), which pads intermediate features from
adjacent faces, to avoid image boundaries. Secondly, we propose a novel
"spherical" photometric consistency constraint on the whole viewing sphere. In
this way, no pixel is projected outside the image boundary, which typically
happens in images with a normal field of view. Finally, rather than naively
estimating six independent camera motions (i.e., applying SfMLearner to each
face of the cube), we propose a novel camera pose consistency loss to ensure
that the estimated camera motions reach consensus. To train and evaluate our
approach, we collect a new PanoSUNCG dataset containing a large number of
360° videos with ground-truth depth and camera motion. Our approach achieves
state-of-the-art depth prediction and camera motion estimation on PanoSUNCG
with faster inference than equirectangular processing. On real-world indoor
videos, our approach also achieves qualitatively reasonable depth prediction
using a model pre-trained on PanoSUNCG.
Comment: ACCV 2018 Oral
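To make the pose consistency idea concrete, here is a minimal PyTorch sketch of one possible consistency term. It assumes each face's estimated motion has already been mapped into a shared rig frame using the fixed face-to-rig rotations (that mapping is omitted), and simply penalizes disagreement across the six estimates; the paper's actual loss may be formulated differently.

```python
# Sketch of a camera pose consistency term: once the six per-face motion
# estimates are expressed in one common frame, penalize their spread
# around the consensus (mean) motion.
import torch

def pose_consistency_loss(face_motions):
    """face_motions: (B, 6, 6) per-face motions (3 translation +
    3 rotation parameters), already expressed in a common frame."""
    consensus = face_motions.mean(dim=1, keepdim=True)   # (B, 1, 6)
    return ((face_motions - consensus) ** 2).mean()
```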
Convolutional Point-set Representation: A Convolutional Bridge Between a Densely Annotated Image and 3D Face Alignment
We present a robust method for estimating the facial pose and shape
information from a densely annotated facial image. The method relies on
Convolutional Point-set Representation (CPR), a carefully designed matrix
representation to summarize different layers of information encoded in the set
of detected points in the annotated image. The CPR disentangles the
dependencies of shape and different pose parameters and enables updating
different parameters in a sequential manner via convolutional neural networks
and recurrent layers. When updating the pose parameters, we sample
reprojection errors along a predicted direction and update the parameters
based on the pattern of reprojection errors. This technique boosts the
model's capability of searching for a local minimum under challenging
scenarios. We also demonstrate that annotations from different sources can be
merged under the CPR framework and contribute to outperforming the current
state-of-the-art solutions for 3D face alignment. Experiments indicate that
the proposed CPRFA (CPR-based Face Alignment) significantly improves 3D
alignment accuracy when the densely annotated image contains noise and
missing values, which is common under "in-the-wild" acquisition scenarios.
Comment: Preprint Submitted
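The sampling-based update can be illustrated with a small sketch. The weak-perspective projection, the four-parameter pose, and the step grid below are illustrative assumptions, not the paper's parameterization: reprojection errors are sampled at several steps along a predicted direction and the best step is kept.

```python
# Sketch of a direction-sampled pose update: evaluate the reprojection
# error at several steps along a predicted search direction and move to
# the step with the lowest error.
import numpy as np

def reprojection_error(params, pts3d, pts2d):
    """params = (scale, tx, ty, yaw); weak-perspective, rotation about z
    (an illustrative four-parameter pose)."""
    s, tx, ty, yaw = params
    c, si = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -si, 0.0], [si, c, 0.0], [0.0, 0.0, 1.0]])
    proj = s * (pts3d @ R.T)[:, :2] + np.array([tx, ty])
    return np.linalg.norm(proj - pts2d, axis=1).mean()

def line_search_update(params, direction, pts3d, pts2d,
                       steps=np.linspace(-1.0, 1.0, 11)):
    errors = [reprojection_error(params + t * direction, pts3d, pts2d)
              for t in steps]
    return params + steps[int(np.argmin(errors))] * direction
```

Reading off the whole error pattern along the direction, rather than a single gradient, is what makes this kind of update robust when the error surface is noisy.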
Facial 3D Model Registration Under Occlusions With SensiblePoints-based Reinforced Hypothesis Refinement
Registering a 3D facial model to a 2D image under occlusion is difficult.
First, not all of the detected facial landmarks are accurate under occlusions.
Second, the number of reliable landmarks may not be enough to constrain the
problem. We propose a method to synthesize additional points (SensiblePoints)
to create pose hypotheses. The visual clues extracted from the fiducial points,
non-fiducial points, and facial contour are jointly employed to verify the
hypotheses. We define a reward function to measure whether the projected dense
3D model is well-aligned with the confidence maps generated by two fully
convolutional networks, and use the function to train recurrent policy networks
to move the SensiblePoints. The same reward function is employed in testing to
select the best hypothesis from a candidate pool. Experiments demonstrate
that the proposed approach is promising for solving the facial model
registration problem under occlusion.
Comment: Accepted at the International Joint Conference on Biometrics (IJCB), 2017
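A minimal sketch of such a reward, under the assumption that the dense model has already been projected into the image: score a hypothesis by the mean confidence-map value at the projected point locations, then pick the highest-scoring hypothesis from the pool.

```python
# Sketch of a confidence-map alignment reward and hypothesis selection:
# a pose hypothesis scores high when its projected model points land on
# high values of the network's confidence map.
import numpy as np

def alignment_reward(conf_map, projected_pts):
    """conf_map: (H, W) in [0, 1]; projected_pts: (N, 2) pixel coords."""
    h, w = conf_map.shape
    xs = np.clip(np.round(projected_pts[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(projected_pts[:, 1]).astype(int), 0, h - 1)
    return float(conf_map[ys, xs].mean())

def select_hypothesis(conf_map, hypothesis_projections):
    """Return the index of the best hypothesis in the candidate pool."""
    rewards = [alignment_reward(conf_map, pts)
               for pts in hypothesis_projections]
    return int(np.argmax(rewards))
```

Using one scalar reward both to train the policy networks and to rank hypotheses at test time keeps the training and inference objectives aligned.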