49,754 research outputs found
Appearance-Based Gaze Estimation in the Wild
Appearance-based gaze estimation is believed to work well in real-world
settings, but existing datasets have been collected under controlled laboratory
conditions and methods have been not evaluated across multiple datasets. In
this work we study appearance-based gaze estimation in the wild. We present the
MPIIGaze dataset that contains 213,659 images we collected from 15 participants
during natural everyday laptop use over more than three months. Our dataset is
significantly more variable than existing ones with respect to appearance and
illumination. We also present a method for in-the-wild appearance-based gaze
estimation using multimodal convolutional neural networks that significantly
outperforms state-of-the art methods in the most challenging cross-dataset
evaluation. We present an extensive evaluation of several state-of-the-art
image-based gaze estimation algorithms on three current datasets, including our
own. This evaluation provides clear insights and allows us to identify key
research challenges of gaze estimation in the wild
Hierarchical Object Parsing from Structured Noisy Point Clouds
Object parsing and segmentation from point clouds are challenging tasks
because the relevant data is available only as thin structures along object
boundaries or other features, and is corrupted by large amounts of noise. To
handle this kind of data, flexible shape models are desired that can accurately
follow the object boundaries. Popular models such as Active Shape and Active
Appearance models lack the necessary flexibility for this task, while recent
approaches such as the Recursive Compositional Models make model
simplifications in order to obtain computational guarantees. This paper
investigates a hierarchical Bayesian model of shape and appearance in a
generative setting. The input data is explained by an object parsing layer,
which is a deformation of a hidden PCA shape model with Gaussian prior. The
paper also introduces a novel efficient inference algorithm that uses informed
data-driven proposals to initialize local searches for the hidden variables.
Applied to the problem of object parsing from structured point clouds such as
edge detection images, the proposed approach obtains state of the art parsing
errors on two standard datasets without using any intensity information.Comment: 13 pages, 16 figure
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM for the past fifteen years
has yielded workable systems that found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were common at some time, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM; namely, in the issues of illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery
Perception-aware Path Planning
In this paper, we give a double twist to the problem of planning under
uncertainty. State-of-the-art planners seek to minimize the localization
uncertainty by only considering the geometric structure of the scene. In this
paper, we argue that motion planning for vision-controlled robots should be
perception aware in that the robot should also favor texture-rich areas to
minimize the localization uncertainty during a goal-reaching task. Thus, we
describe how to optimally incorporate the photometric information (i.e.,
texture) of the scene, in addition to the the geometric one, to compute the
uncertainty of vision-based localization during path planning. To avoid the
caveats of feature-based localization systems (i.e., dependence on feature type
and user-defined thresholds), we use dense, direct methods. This allows us to
compute the localization uncertainty directly from the intensity values of
every pixel in the image. We also describe how to compute trajectories online,
considering also scenarios with no prior knowledge about the map. The proposed
framework is general and can easily be adapted to different robotic platforms
and scenarios. The effectiveness of our approach is demonstrated with extensive
experiments in both simulated and real-world environments using a
vision-controlled micro aerial vehicle.Comment: 16 pages, 20 figures, revised version. Conditionally accepted for
IEEE Transactions on Robotic
- …