Defining the Pose of any 3D Rigid Object and an Associated Distance
The pose of a rigid object is usually regarded as a rigid transformation,
described by a translation and a rotation. However, equating the pose space
with the space of rigid transformations is in general abusive, as it does not
account for objects with proper symmetries -- which are common among man-made
objects. In this article, we define pose as a distinguishable static state of an
object, and equate a pose with a set of rigid transformations. Based solely on
geometric considerations, we propose a frame-invariant metric on the space of
possible poses, valid for any physical rigid object, and requiring no arbitrary
tuning. This distance can be evaluated efficiently using a representation of
poses within a Euclidean space of at most 12 dimensions, depending on the
object's symmetries. This makes it possible to efficiently perform neighborhood
queries such as radius searches or k-nearest neighbor searches within a large
set of poses using off-the-shelf methods. Pose averaging considering this
metric can similarly be performed easily, using a projection function from the
Euclidean space onto the pose space. The practical value of those theoretical
developments is illustrated with an application of pose estimation of instances
of a 3D rigid object given an input depth map, via a Mean Shift procedure.
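
The idea of equating a pose with a set of rigid transformations can be sketched as follows. This is a minimal illustration under stated assumptions, not the metric defined in the article: it assumes a finite proper symmetry group supplied as a list of rotation matrices, and it uses a hypothetical `scale` weight to balance rotation against translation, whereas the abstract states that the proposed metric needs no arbitrary tuning. The helper names (`embed`, `pose_distance`) are illustrative only.

```python
import math

def rot_z(angle):
    """Rotation matrix (3x3, nested lists) about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def embed(R, t, scale=1.0):
    """Map a rigid transformation to a 12-D vector: translation, then scaled rotation.
    `scale` is a hypothetical weight, not a quantity taken from the article."""
    return list(t) + [scale * R[i][j] for i in range(3) for j in range(3)]

def pose_distance(pose_a, pose_b, symmetries, scale=1.0):
    """Smallest Euclidean distance between the embeddings of two poses,
    taken over all symmetry-equivalent versions of the second pose."""
    Ra, ta = pose_a
    Rb, tb = pose_b
    va = embed(Ra, ta, scale)
    return min(math.dist(va, embed(matmul(Rb, G), tb, scale)) for G in symmetries)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

For an object that is unchanged by a half turn about its z axis, passing `[IDENTITY, rot_z(math.pi)]` as `symmetries` makes two transformations that differ by that half turn count as the same pose, while passing `[IDENTITY]` alone keeps them distinct.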
Do-It-Yourself Single Camera 3D Pointer Input Device
We present a new algorithm for single camera 3D reconstruction, or 3D input
for human-computer interfaces, based on precise tracking of an elongated
object, such as a pen, having a pattern of colored bands. To configure the
system, the user provides no more than one labelled image of a handmade
pointer, measurements of its colored bands, and the camera's pinhole projection
matrix. Other systems are of much higher cost and complexity, requiring
combinations of multiple cameras, stereocameras, and pointers with sensors and
lights. Instead of relying on information from multiple devices, we examine our
single view more closely, integrating geometric and appearance constraints to
robustly track the pointer in the presence of occlusion and distractor objects.
By probing objects of known geometry with the pointer, we demonstrate
acceptable accuracy of 3D localization.
Comment: 8 pages, 6 figures, 2018 15th Conference on Computer and Robot Vision
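
The configuration inputs named in the abstract (band measurements and a pinhole projection matrix) suggest how a hypothesised pointer pose could be compared with an image. The sketch below is an assumption-laden illustration, not the paper's tracking algorithm: it places band boundaries along a straight pointer axis and projects them with a 3x4 matrix. The intrinsics, the offsets, and the function names are all hypothetical.

```python
def project(P, point):
    """Project a 3D point to pixel coordinates with a 3x4 pinhole matrix P."""
    X = list(point) + [1.0]  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * X[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)

def band_boundaries(tip, direction, offsets):
    """3D positions of band boundaries at the given distances from the tip,
    measured along the pointer axis (direction is assumed to be a unit vector)."""
    return [tuple(tip[i] + d * direction[i] for i in range(3)) for d in offsets]

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
P = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# Hypothetical pointer: tip 1 m in front of the camera, bands every 2 cm.
points = band_boundaries(tip=(0.0, 0.0, 1.0), direction=(1.0, 0.0, 0.0),
                         offsets=[0.0, 0.02, 0.04])
pixels = [project(P, p) for p in points]
```

A tracker could score a candidate pose by how well such projected boundaries line up with the colour transitions observed in the image.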
Boosted Random ferns for object detection
In this paper we introduce Boosted Random Ferns (BRFs) to rapidly build
discriminative classifiers for learning and detecting object categories. At the
core of our approach we use standard random ferns, but we introduce four main
innovations that let us bring ferns from the instance level to the category
level while retaining efficiency. First, we define binary features in the
histogram of oriented gradients domain (as opposed to the intensity domain),
allowing for a better representation of intra-class variability. Second, both
the positions where ferns are evaluated within the sliding window and the
locations of the binary features for each fern are not chosen completely at
random; instead, we use a boosting strategy to pick the most discriminative
combination of them. This is further enhanced by our third contribution, which
adapts the boosting strategy to enable sharing of binary features among
different ferns, yielding high recognition rates at a low computational cost.
Finally, we show that training can be performed online, for sequentially
arriving images. Overall, the resulting classifier can be trained very
efficiently and densely evaluated at all image locations in about 0.1 seconds,
and it provides detection rates similar to those of competing approaches that
require significantly more expensive and slower processing. We demonstrate the
effectiveness of our approach through thorough experimentation on publicly
available datasets, in which we compare against the state of the art on tasks
of both 2D detection and 3D multi-view estimation.
Peer Reviewed. Postprint (author's final draft)
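
How a single fern turns binary features into a table lookup can be sketched as below. This is a generic random-fern illustration, not the BRF training procedure: `features` stands in for a flattened gradient-histogram descriptor, the comparison `pairs` are fixed by hand here (in the paper they are selected by boosting), and `table` is a hypothetical per-fern score table.

```python
def fern_index(features, pairs):
    """Concatenate the outcomes of binary comparisons into an integer index.
    Each pair (i, j) contributes one bit: 1 if features[i] > features[j]."""
    index = 0
    for i, j in pairs:
        index = (index << 1) | int(features[i] > features[j])
    return index

def fern_score(features, pairs, table):
    """Look up the score stored for this fern's output.
    `table` must have 2 ** len(pairs) entries."""
    return table[fern_index(features, pairs)]

# Hypothetical descriptor values and comparison pairs.
features = [5, 1, 3, 7]
pairs = [(0, 1), (2, 3), (3, 0)]
index = fern_index(features, pairs)
```

Because each fern costs only a few comparisons and one lookup, many ferns can be evaluated at every sliding-window position, and sharing comparison pairs between ferns reduces the cost further.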
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
In this paper, we introduce a novel unsupervised domain adaptation technique
for the task of 3D keypoint prediction from a single depth scan or image. Our
key idea is to utilize the fact that predictions from different views of the
same or similar objects should be consistent with each other. Such view
consistency can provide effective regularization for keypoint prediction on
unlabeled instances. In addition, we introduce a geometric alignment term to
regularize predictions in the target domain. The resulting loss function can be
effectively optimized via alternating minimization. We demonstrate the
effectiveness of our approach on real datasets and present experimental results
showing that our approach is superior to state-of-the-art general-purpose
domain adaptation techniques.
Comment: ECCV 201
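
One way to picture a view-consistency term is sketched below. The abstract does not give the loss function, so this is an assumed form rather than the paper's: each view's keypoint predictions are rotated into a shared canonical frame using known rotations (an assumption), and the loss is the mean squared deviation from the per-keypoint consensus.

```python
def apply_rotation(R, p):
    """Rotate a 3D point p by a 3x3 matrix R (nested lists)."""
    return tuple(sum(R[i][k] * p[k] for k in range(3)) for i in range(3))

def view_consistency_loss(predictions, rotations_to_canonical):
    """Mean squared deviation of per-view keypoint predictions from their
    consensus, after mapping every view into a shared canonical frame.

    predictions: one list of 3D keypoints per view, same keypoint order.
    rotations_to_canonical: one 3x3 rotation per view (assumed known)."""
    aligned = [[apply_rotation(R, kp) for kp in view]
               for view, R in zip(predictions, rotations_to_canonical)]
    n_views, n_kp = len(aligned), len(aligned[0])
    total = 0.0
    for k in range(n_kp):
        # Consensus position of keypoint k across all views.
        mean = [sum(view[k][i] for view in aligned) / n_views for i in range(3)]
        total += sum((view[k][i] - mean[i]) ** 2
                     for view in aligned for i in range(3))
    return total / (n_views * n_kp)
```

The loss is zero when all views agree after alignment, so it can regularize predictions on unlabeled target-domain instances without any keypoint annotations.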
Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
Object detection and 6D pose estimation in the crowd (scenes with multiple
object instances, severe foreground occlusions and background distractors) has
become an important problem in many rapidly evolving technological areas such
as robotics and augmented reality. Single shot-based 6D pose estimators with
manually designed features are still unable to tackle the above challenges,
motivating the research towards unsupervised feature learning and
next-best-view estimation. In this work, we present a complete framework for
both single shot-based 6D object pose estimation and next-best-view prediction
based on Hough Forests, the state-of-the-art object pose estimator that
performs classification and regression jointly. Rather than using manually
designed features, we (a) propose an unsupervised feature learnt from
depth-invariant patches using a Sparse Autoencoder and (b) offer an extensive
evaluation of various state-of-the-art features. Furthermore, taking advantage
of the clustering performed in the leaf nodes of Hough Forests, we learn to
estimate the reduction of uncertainty in other views, formulating the problem
of selecting the next-best-view. To further improve pose estimation, we propose
an improved joint registration and hypotheses verification module as a final
refinement step to reject false detections. We provide two additional
challenging datasets inspired by realistic scenarios to extensively evaluate
the state of the art and our framework. One is related to domestic environments
and the other depicts a bin-picking scenario mostly found in industrial
settings. We show that our framework significantly outperforms the state of the
art on both public datasets and our own.
Comment: CVPR 2016 accepted paper, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
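
Selecting a next-best-view by expected reduction of uncertainty can be illustrated with a small entropy calculation. This is a generic sketch, not the paper's method: there the estimate is learnt from the clustering in Hough Forest leaf nodes, whereas here the predicted distributions over pose hypotheses for each candidate view are simply given as hypothetical inputs.

```python
import math

def entropy(probabilities):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

def next_best_view(current, predicted_by_view):
    """Return the candidate view with the largest predicted entropy reduction,
    together with that reduction.

    current: distribution over pose hypotheses from the current view.
    predicted_by_view: dict mapping a view name to the distribution expected
    after moving to that view (hypothetical inputs in this sketch)."""
    h_now = entropy(current)
    gains = {view: h_now - entropy(dist)
             for view, dist in predicted_by_view.items()}
    best = max(gains, key=gains.get)
    return best, gains[best]

# Hypothetical example: four equally likely hypotheses, two candidate views.
current = [0.25, 0.25, 0.25, 0.25]
candidates = {"left": [0.4, 0.3, 0.2, 0.1],
              "top": [0.9, 0.05, 0.03, 0.02]}
best_view, gain = next_best_view(current, candidates)
```

In this example the "top" view is chosen because its predicted distribution is the most concentrated on a single hypothesis.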