Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM over the past fifteen years
has yielded workable systems that have found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were once common, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies adopted in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM, namely illumination changes,
initialization, highly dynamic motion, poorly textured scenes, repetitive
textures, map maintenance, and failure recovery.
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in scenarios that are challenging for traditional cameras,
such as those requiring low latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
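As a rough illustration of the event stream described in the abstract above (time, location, and sign of each brightness change), the sketch below emits an event whenever a pixel's log-intensity has moved by at least a contrast threshold since that pixel last fired. This is an idealized textbook-style model, not code from the survey: the function name `events_from_frames`, the threshold value, and the use of discrete frames as a stand-in for continuous brightness are all assumptions made for illustration.

```python
import numpy as np

def events_from_frames(frames, timestamps, threshold=0.2, eps=1e-3):
    """Simulate events (t, x, y, polarity) from a sequence of intensity frames.

    Idealized model: a pixel fires when its log-intensity differs from the
    value stored at its last event by at least `threshold` (an assumed value).
    At most one event per pixel is emitted per input frame, which is a
    simplification of a real asynchronous sensor.
    """
    frames = np.asarray(frames, dtype=np.float64)
    ref = np.log(frames[0] + eps)            # per-pixel reference log-intensity
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_i = np.log(frame + eps)
        diff = log_i - ref
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((t, int(x), int(y), polarity))
            ref[y, x] = log_i[y, x]           # reset reference only where an event fired
    return events
```

Pixels whose brightness does not change produce no output at all, which is why the data rate of such a sensor depends on scene dynamics rather than on a fixed frame rate.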
Articulated Clinician Detection Using 3D Pictorial Structures on RGB-D Data
Reliable human pose estimation (HPE) is essential to many clinical
applications, such as surgical workflow analysis, radiation safety monitoring
and human-robot cooperation. Proposed methods for the operating room (OR) rely
either on foreground estimation using a multi-camera system, which is a
challenge in real ORs due to color similarities and frequent illumination
changes, or on wearable sensors or markers, which are invasive and therefore
difficult to introduce in the room. Instead, we propose a novel approach based
on Pictorial Structures (PS) and on RGB-D data, which can be easily deployed in
real ORs. We extend the PS framework in two ways. First, we build robust and
discriminative part detectors using both color and depth images. We also
present a novel descriptor for depth images, called histogram of depth
differences (HDD). Second, we extend PS to 3D by proposing 3D pairwise
constraints and a new method that makes exact inference tractable. Our approach
is evaluated for pose estimation and clinician detection on a challenging RGB-D
dataset recorded in a busy operating room during live surgeries. We conduct a
series of experiments to study the different part detectors in conjunction with
the various 2D or 3D pairwise constraints. Our comparisons demonstrate that 3D
PS with RGB-D part detectors significantly improves the results in a visually
challenging operating environment.
Comment: The supplementary video is available at https://youtu.be/iabbGSqRSg
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists of the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
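For readers unfamiliar with the "de-facto standard formulation" mentioned in the abstract above, it is commonly understood to be maximum a posteriori (MAP) estimation over a factor graph. The block below is a minimal sketch of that formulation under the usual assumption of independent, zero-mean Gaussian measurement noise; the symbols ($\mathcal{X}$ for the robot and map variables, $z_k$ for measurements, $h_k$ for measurement models, $\Omega_k$ for information matrices) are notation chosen here and do not appear in the abstract itself.

```latex
\begin{align}
\mathcal{X}^{\star}
  &= \operatorname*{argmax}_{\mathcal{X}} \; p(\mathcal{X} \mid Z)
   = \operatorname*{argmax}_{\mathcal{X}} \; p(\mathcal{X}) \prod_{k=1}^{m} p(z_k \mid \mathcal{X}_k) \\
  &= \operatorname*{argmin}_{\mathcal{X}} \; \sum_{k=0}^{m}
     \bigl\lVert h_k(\mathcal{X}_k) - z_k \bigr\rVert^{2}_{\Omega_k}
\end{align}
```

Here $\mathcal{X}_k$ is the subset of variables involved in the $k$-th measurement, and the prior $p(\mathcal{X})$ is absorbed into the sum as the $k = 0$ term. The second line follows from taking the negative logarithm of the Gaussian factors, which turns the product into a nonlinear least-squares problem.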
Driven to Distraction: Self-Supervised Distractor Learning for Robust Monocular Visual Odometry in Urban Environments
We present a self-supervised approach to ignoring "distractors" in camera
images for the purposes of robustly estimating vehicle motion in cluttered
urban environments. We leverage offline multi-session mapping approaches to
automatically generate a per-pixel ephemerality mask and depth map for each
input image, which we use to train a deep convolutional network. At run-time we
use the predicted ephemerality and depth as an input to a monocular visual
odometry (VO) pipeline, using either sparse features or dense photometric
matching. Our approach yields metric-scale VO using only a single camera and
can recover the correct egomotion even when 90% of the image is obscured by
dynamic, independently moving objects. We evaluate our robust VO methods on
more than 400km of driving from the Oxford RobotCar Dataset and demonstrate
reduced odometry drift and significantly improved egomotion estimation in the
presence of large moving vehicles in urban traffic.
Comment: International Conference on Robotics and Automation (ICRA), 2018.
Video summary: http://youtu.be/ebIrBn_nc-
Perception-aware Path Planning
In this paper, we give a double twist to the problem of planning under
uncertainty. State-of-the-art planners seek to minimize the localization
uncertainty by only considering the geometric structure of the scene. In this
paper, we argue that motion planning for vision-controlled robots should be
perception aware in that the robot should also favor texture-rich areas to
minimize the localization uncertainty during a goal-reaching task. Thus, we
describe how to optimally incorporate the photometric information (i.e.,
texture) of the scene, in addition to the geometric one, to compute the
uncertainty of vision-based localization during path planning. To avoid the
caveats of feature-based localization systems (i.e., dependence on feature type
and user-defined thresholds), we use dense, direct methods. This allows us to
compute the localization uncertainty directly from the intensity values of
every pixel in the image. We also describe how to compute trajectories online,
considering also scenarios with no prior knowledge about the map. The proposed
framework is general and can easily be adapted to different robotic platforms
and scenarios. The effectiveness of our approach is demonstrated with extensive
experiments in both simulated and real-world environments using a
vision-controlled micro aerial vehicle.
Comment: 16 pages, 20 figures, revised version. Conditionally accepted for
IEEE Transactions on Robotic
Human-centric light sensing and estimation from RGBD images: The invisible light switch
Lighting design in indoor environments is of primary importance for at least
two reasons: 1) people should perceive adequate light; 2) an effective
lighting design means considerable energy savings. We present the Invisible Light
Switch (ILS) to address both aspects. ILS dynamically adjusts the room
illumination level to save energy while keeping the light level perceived by
the users constant, so the energy saving is invisible to them. Our
proposed ILS leverages a radiosity model to estimate the light level
perceived by a person within an indoor environment, taking into account the
person's position and viewing frustum (head pose). ILS may therefore dim
those luminaires that are not seen by the user, resulting in an effective
energy saving, especially in large open offices (where light may otherwise be
ON everywhere for a single person). To quantify the system performance, we have
collected a new dataset where people wear luxmeter devices while working in
office rooms. The luxmeters measure the amount of light (in lux) reaching each
person's gaze, which we consider a proxy for their perceived illumination level.
Our initial results are promising: in a room with 8 LED luminaires, the energy
consumption in a day may be reduced from 18585 to 6206 watts with ILS
(currently needing 1560 watts for operations). While doing so, the
perceived lighting drops by just 200 lux, a value considered negligible
when the original illumination level is above 1200 lux, as is normally the case
in offices.
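To put the reported figures in proportion, the short sketch below computes the saving relative to the baseline. The abstract does not say whether the 6206 figure already includes the 1560 needed to run ILS, so both readings are computed; that ambiguity, and the helper name `saving`, are assumptions of this illustration rather than statements from the paper. Units are kept exactly as the abstract gives them.

```python
# Figures as reported in the abstract (units kept exactly as stated there).
BASELINE = 18585      # daily consumption without ILS
WITH_ILS = 6206       # daily consumption with ILS
ILS_OVERHEAD = 1560   # consumption of the ILS system itself

def saving(baseline, reduced, overhead=0):
    """Return (absolute saving, fractional saving) relative to the baseline."""
    absolute = baseline - (reduced + overhead)
    return absolute, absolute / baseline

# Reading 1: the overhead is already included in (or ignored by) the 6206 figure.
gross_abs, gross_frac = saving(BASELINE, WITH_ILS)
# Reading 2: the overhead must be added on top of the 6206 figure.
net_abs, net_frac = saving(BASELINE, WITH_ILS, ILS_OVERHEAD)

print(f"excluding overhead: {gross_abs} ({gross_frac:.1%})")
print(f"including overhead: {net_abs} ({net_frac:.1%})")
```

Under the first reading the saving is about two thirds of the baseline; under the second it is still well over half.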