A Novel Method for the Absolute Pose Problem with Pairwise Constraints
Absolute pose estimation is a fundamental problem in computer vision. As a
typical parameter estimation problem, any solver must contend with
outlier-contaminated data. Conventionally, for a fixed dimensionality d and N
measurements, a robust estimation problem cannot be solved faster than O(N^d),
and it is almost impossible to remove d from the exponent of the runtime of a
globally optimal algorithm.
However, absolute pose estimation is a geometric parameter estimation problem,
and thus has special constraints. In this paper, we consider pairwise
constraints and propose a globally optimal algorithm for solving the absolute
pose estimation problem. The proposed algorithm has a linear complexity in the
number of correspondences at a given outlier ratio. Concretely, we first
decouple the rotation and the translation subproblems by utilizing the pairwise
constraints, and then we solve the rotation subproblem using the
branch-and-bound algorithm. Lastly, we estimate the translation based on the
known rotation by using another branch-and-bound algorithm. The advantages of
our method are demonstrated via thorough testing on both synthetic and
real-world data.
Comment: 10 pages, 7 figures
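The decoupled rotation search described above can be illustrated with a toy 1-D branch-and-bound over a planar rotation angle, maximizing the inlier count. The bound uses the fact that rotating a point of norm r by at most delta moves it by at most r*delta. This is a sketch of the general BnB idea only, not the paper's 3-D rotation-space algorithm; all names are hypothetical.

```python
import heapq
import math
import numpy as np

def rotate(theta, pts):
    """Rotate Nx2 points by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return pts @ np.array([[c, -s], [s, c]]).T

def count_inliers(theta, src, dst, eps):
    return int(np.sum(np.linalg.norm(rotate(theta, src) - dst, axis=1) <= eps))

def bnb_rotation(src, dst, eps=0.05, tol=1e-4):
    """Globally optimal 1-D rotation via branch-and-bound on inlier count.

    Bound: rotating a point of norm r by at most delta moves it by at most
    r*delta, so an interval's inlier count is upper-bounded by counting
    residuals at its midpoint against a relaxed threshold eps + r*half_width.
    """
    radii = np.linalg.norm(src, axis=1)
    best_theta, best = 0.0, count_inliers(0.0, src, dst, eps)
    heap = [(-len(src), -math.pi, math.pi)]  # max-heap on the upper bound
    while heap:
        neg_ub, lo, hi = heapq.heappop(heap)
        if -neg_ub <= best:
            break  # no remaining interval can beat the incumbent
        mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
        cand = count_inliers(mid, src, dst, eps)
        if cand > best:
            best, best_theta = cand, mid
        if half > tol:
            for a, b in ((lo, mid), (mid, hi)):
                m, h = 0.5 * (a + b), 0.5 * (b - a)
                ub = int(np.sum(
                    np.linalg.norm(rotate(m, src) - dst, axis=1) <= eps + radii * h))
                if ub > best:
                    heapq.heappush(heap, (-ub, a, b))
    return best_theta, best
```

The early `break` is sound because the heap top always carries the largest remaining upper bound.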
3D Bounding Box Estimation Using Deep Learning and Geometry
We present a method for 3D object detection and pose estimation from a single
image. In contrast to current techniques that only regress the 3D orientation
of an object, our method first regresses relatively stable 3D object properties
using a deep convolutional neural network and then combines these estimates
with geometric constraints provided by a 2D object bounding box to produce a
complete 3D bounding box. The first network output estimates the 3D object
orientation using a novel hybrid discrete-continuous loss, which significantly
outperforms the L2 loss. The second output regresses the 3D object dimensions,
which have relatively little variance compared to alternatives and can often be
predicted for many object types. These estimates, combined with the geometric
constraints on translation imposed by the 2D bounding box, enable us to recover
a stable and accurate 3D object pose. We evaluate our method on the challenging
KITTI object detection benchmark both on the official metric of 3D orientation
estimation and also on the accuracy of the obtained 3D bounding boxes. Although
conceptually simple, our method outperforms more complex and computationally
expensive approaches that leverage semantic segmentation, instance level
segmentation and flat ground priors and sub-category detection. Our
discrete-continuous loss also produces state of the art results for 3D
viewpoint estimation on the Pascal 3D+ dataset.
Comment: To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
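The hybrid discrete-continuous orientation representation described above can be sketched as a bin classification plus a sin/cos residual regression target. The encode/decode pair below is a minimal illustration of that idea, not the paper's network code; names are hypothetical.

```python
import math
import numpy as np

def encode_multibin(theta, n_bins=4):
    """Encode an angle as (bin index, sin/cos of the residual to the bin center)."""
    centers = 2 * math.pi * np.arange(n_bins) / n_bins
    # wrapped angular residual to each bin center
    diff = np.angle(np.exp(1j * (theta - centers)))
    k = int(np.argmin(np.abs(diff)))
    return k, math.sin(diff[k]), math.cos(diff[k])

def decode_multibin(k, s, c, n_bins=4):
    """Recover the angle, wrapped to (-pi, pi]."""
    raw = 2 * math.pi * k / n_bins + math.atan2(s, c)
    return math.atan2(math.sin(raw), math.cos(raw))
```

Regressing the (sin, cos) residual rather than the raw angle avoids the wrap-around discontinuity that makes a plain L2 loss on angles ill-behaved.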
Single camera pose estimation using Bayesian filtering and Kinect motion priors
Traditional approaches to upper body pose estimation using monocular vision
rely on complex body models and a large variety of geometric constraints. We
argue that this is not ideal and somewhat inelegant as it results in large
processing burdens, and instead attempt to incorporate these constraints
through priors obtained directly from training data. A prior distribution
covering the probability of a human pose occurring is used to incorporate
likely human poses. This distribution is obtained offline, by fitting a
Gaussian mixture model to a large dataset of recorded human body poses, tracked
using a Kinect sensor. We combine this prior information with a random walk
transition model to obtain an upper body model, suitable for use within a
recursive Bayesian filtering framework. Our model can be viewed as a mixture of
discrete Ornstein-Uhlenbeck processes, in that states behave as random walks,
but drift towards a set of typically observed poses. This model is combined
with measurements of the human head and hand positions, using recursive
Bayesian estimation to incorporate temporal information. Measurements are
obtained using face detection and a simple skin colour hand detector, trained
using the detected face. The suggested model is designed with analytical
tractability in mind and we show that the pose tracking can be
Rao-Blackwellised using the mixture Kalman filter, allowing for computational
efficiency while still incorporating bio-mechanical properties of the upper
body. In addition, the use of the proposed upper body model allows reliable
three-dimensional pose estimates to be obtained indirectly for a number of
joints that are often difficult to detect using traditional object recognition
strategies. Comparisons with Kinect sensor results and the state of the art in
2D pose estimation highlight the efficacy of the proposed approach.
Comment: 25 pages, technical report related to the Burke and Lasenby AMDO 2014 conference paper. Code sample: https://github.com/mgb45/SignerBodyPose Video: https://www.youtube.com/watch?v=dJMTSo7-uF
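The transition model described above, a random walk that drifts toward typically observed poses, can be sketched as one step of a discrete Ornstein-Uhlenbeck mixture. This is a minimal illustration assuming the GMM component means and weights are already fitted; names are hypothetical.

```python
import numpy as np

def ou_step(x, means, weights, lam=0.2, sigma=0.01, rng=None):
    """One step of a mixture of discrete Ornstein-Uhlenbeck processes:
    a random walk whose drift pulls the state toward a prior pose mean
    sampled according to the mixture weights."""
    if rng is None:
        rng = np.random.default_rng()
    k = rng.choice(len(weights), p=weights)  # pick a mixture component
    return x + lam * (means[k] - x) + sigma * rng.normal(size=x.shape)
```

With lam = 0 this reduces to a plain random walk; the drift term is what encodes the "typically observed poses" prior while keeping the model analytically tractable for Kalman-style filtering.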
Affine Correspondences between Multi-Camera Systems for Relative Pose Estimation
We present a novel method to compute the relative pose of multi-camera
systems using two affine correspondences (ACs). Existing solutions to the
multi-camera relative pose estimation are either restricted to special cases of
motion, have too high computational complexity, or require too many point
correspondences (PCs). Thus, these solvers impede an efficient or accurate
relative pose estimation when applying RANSAC as a robust estimator. This paper
shows that the 6DOF relative pose estimation problem using ACs permits a
feasible minimal solution, when exploiting the geometric constraints between
ACs and multi-camera systems using a special parameterization. We present a
problem formulation based on two ACs that covers the two common types of ACs
across two views, i.e., inter-camera and intra-camera correspondences.
Moreover, the framework
for generating the minimal solvers can be extended to solve various relative
pose estimation problems, e.g., 5DOF relative pose estimation with known
rotation angle prior. Experiments on both virtual and real multi-camera systems
prove that the proposed solvers are more efficient than the state-of-the-art
algorithms, while resulting in a better relative pose accuracy. Source code is
available at https://github.com/jizhaox/relpose-mcs-depth
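The efficiency argument above, that a 2-AC minimal solver makes RANSAC far cheaper than a many-point solver, follows from the standard iteration-count formula N = log(1-p) / log(1-w^s) for inlier ratio w, sample size s, and confidence p. A quick sketch:

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations needed so that at least one all-inlier minimal sample is
    drawn with the requested confidence."""
    fail = 1.0 - inlier_ratio ** sample_size  # P(a sample contains an outlier)
    return math.ceil(math.log(1.0 - confidence) / math.log(fail))
```

At 50% inliers and 99% confidence, a solver sampling 2 correspondences needs 17 iterations, versus 293 for one sampling 6, which is why minimal sample size dominates RANSAC cost.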
Geometry-Aware Learning of Maps for Camera Localization
Maps are a key component in image-based camera localization and visual SLAM
systems: they are used to establish geometric constraints between images,
correct drift in relative pose estimation, and relocalize cameras after lost
tracking. The exact definitions of maps, however, are often
application-specific and hand-crafted for different scenarios (e.g. 3D
landmarks, lines, planes, bags of visual words). We propose to represent maps
as a deep neural net called MapNet, which enables learning a data-driven map
representation. Unlike prior work on learning maps, MapNet exploits cheap and
ubiquitous sensory inputs like visual odometry and GPS in addition to images
and fuses them together for camera localization. Geometric constraints
expressed by these inputs, which have traditionally been used in bundle
adjustment or pose-graph optimization, are formulated as loss terms in MapNet
training and also used during inference. In addition to directly improving
localization accuracy, this allows us to update the MapNet (i.e., maps) in a
self-supervised manner using additional unlabeled video sequences from the
scene. We also propose a novel parameterization for camera rotation which is
better suited for deep-learning based camera pose regression. Experimental
results on both the indoor 7-Scenes dataset and the outdoor Oxford RobotCar
dataset show significant performance improvement over prior work. The MapNet
project webpage is https://goo.gl/mRB3Au.
Comment: CVPR 2018 camera-ready paper + supplementary material
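A rotation parameterization along the lines MapNet advocates for pose regression is the logarithm of a unit quaternion, which is a minimal 3-vector free of the unit-norm constraint. A minimal log/exp pair (a sketch, not the authors' code):

```python
import numpy as np

def qlog(q):
    """Unit quaternion (w, x, y, z) -> 3-vector log map."""
    w, v = q[0], q[1:]
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)  # identity rotation
    return v / n * np.arccos(np.clip(w, -1.0, 1.0))

def qexp(u):
    """Inverse map: 3-vector -> unit quaternion."""
    n = np.linalg.norm(u)
    if n < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(n)], np.sin(n) * u / n))
```

A network can regress the 3-vector directly; `qexp` recovers a valid unit quaternion without any explicit normalization step in the loss.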
Gaze Estimation Based on Multi-view Geometric Neural Networks
Gaze and head pose estimation can play essential roles in various applications, such as human attention recognition and behavior analysis. Most deep neural network-based gaze estimation techniques use supervised regression, where features are extracted from eye images by neural networks to regress 3D gaze vectors. We instead apply the geometric features of the eyes to determine the gaze vectors of observers, relying on the concepts of 3D multiple view geometry. We develop an end-to-end CNN framework for gaze estimation using 3D geometric constraints under semi-supervised and unsupervised settings and compare the results.
We explore the mathematics behind Homography and Structure-from-Motion and extend them to the gaze estimation problem using the eye region landmarks. We demonstrate the necessity of 3D eye region landmarks for implementing the 3D geometry-based algorithms and address the problem of missing depth parameters in gaze estimation datasets.
We further explore the use of Convolutional Neural Networks (CNNs) to develop an end-to-end learning-based framework that takes in sequential eye images to estimate the relative gaze changes of observers. We use a depth network for monocular depth estimation of the eye region landmarks, which are further utilized by a pose network to estimate the relative gaze change using view synthesis constraints on the iris regions. We also explore CNN frameworks that estimate the relative changes in homography matrices between sequential eye images based on the eye region landmarks, so as to estimate the pose of the iris and hence the relative change in the observer's gaze.
We compare and analyze the results obtained from mathematical calculations and deep neural network-based methods, and we compare the performance of the proposed CNN scheme with state-of-the-art regression-based methods for gaze estimation.
Future work involves extending the end-to-end pipeline as an unsupervised framework for gaze estimation in the wild.
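The homography estimation between eye-region landmarks that the work above builds on can be sketched with the textbook Direct Linear Transform; this is the standard algorithm, not the author's code.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate a 3x3 homography from >= 4 point pairs via the Direct
    Linear Transform: stack two homogeneous equations per correspondence
    and take the right singular vector of the smallest singular value."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale
```

In practice the landmark coordinates would be normalized (Hartley's preconditioning) before the SVD for numerical stability; that step is omitted here for brevity.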
i3PosNet: Instrument Pose Estimation from X-Ray in temporal bone surgery
Purpose: Accurate estimation of the position and orientation (pose) of
surgical instruments is crucial for delicate minimally invasive temporal bone
surgery. Current techniques lack in accuracy and/or line-of-sight constraints
(conventional tracking systems) or expose the patient to prohibitive ionizing
radiation (intra-operative CT). A possible solution is to capture the
instrument with a c-arm at irregular intervals and recover the pose from the
image.
Methods: i3PosNet infers the position and orientation of instruments from
images using a pose estimation network. The framework operates on localized
patches and outputs pseudo-landmarks; the pose is then reconstructed from
these pseudo-landmarks by geometric considerations.
Results: We show that i3PosNet achieves errors below 0.05 mm. It outperforms
conventional image-registration-based approaches, reducing average and maximum
errors by at least two thirds. i3PosNet trained on synthetic images generalizes
to real X-rays without any further adaptation.
Conclusion: The translation of Deep Learning based methods to surgical
applications is difficult, because large representative datasets for training
and testing are not available. This work empirically shows sub-millimeter pose
estimation trained solely on synthetic training data.
Comment: Accepted at the International Journal of Computer Assisted Radiology and Surgery, pending publication
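Reconstructing pose from pseudo-landmarks "by geometric considerations" can be illustrated by fitting the instrument axis to predicted landmark points. This is a hypothetical 2-D sketch; the actual landmark layout and geometry are not taken from the paper.

```python
import math
import numpy as np

def pose_from_pseudo_landmarks(pts):
    """Fit an in-plane pose (centroid position + axis angle, mod pi) to a
    set of predicted pseudo-landmark points via the principal direction."""
    pts = np.asarray(pts, dtype=float)
    center = pts.mean(axis=0)
    # direction of largest variance of the landmark cloud = instrument axis
    _, _, Vt = np.linalg.svd(pts - center)
    axis = Vt[0]
    return center, math.atan2(axis[1], axis[0]) % math.pi
```

Taking the angle modulo pi resolves the sign ambiguity of the principal direction; a real instrument would disambiguate tip vs. tail from an ordered landmark, e.g. a dedicated tip point.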