Joint 3D Proposal Generation and Object Detection from View Aggregation
We present AVOD, an Aggregate View Object Detection network for autonomous
driving scenarios. The proposed neural network architecture uses LIDAR point
clouds and RGB images to generate features that are shared by two subnetworks:
a region proposal network (RPN) and a second-stage detector network. The
proposed RPN uses a novel architecture capable of performing multimodal feature
fusion on high-resolution feature maps to generate reliable 3D object proposals
for multiple object classes in road scenes. Using these proposals, the
second-stage detection network performs accurate oriented 3D bounding box regression
and category classification to predict the extents, orientation, and
classification of objects in 3D space. Our proposed architecture is shown to
produce state-of-the-art results on the KITTI 3D object detection benchmark
while running in real time with a low memory footprint, making it a suitable
candidate for deployment on autonomous vehicles. Code is at:
https://github.com/kujason/avod
Comment: For any inquiries contact aharakeh(at)uwaterloo(dot)c
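As a rough illustration of what the "extents and orientation" of an oriented 3D bounding box can mean, the sketch below computes the eight corners of a box from its center, its dimensions, and a yaw angle about the vertical axis. The parameterization and the name `box_corners` are assumptions made for illustration only; the abstract does not specify the box encoding used by AVOD.

```python
import math

def box_corners(center, dims, yaw):
    """Return the 8 corners of a box rotated by `yaw` about the z (up) axis.

    center: (x, y, z) of the box centroid (hypothetical convention).
    dims:   (length, width, height) along the box's own x, y, z axes.
    yaw:    rotation about the vertical axis, in radians.
    """
    cx, cy, cz = center
    length, width, height = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sz in (-1, 1):  # bottom face first, then top face
        for sx, sy in ((1, 1), (1, -1), (-1, -1), (-1, 1)):
            # Corner offset in the box's own frame.
            dx, dy, dz = sx * length / 2, sy * width / 2, sz * height / 2
            # Rotate the offset in the ground plane, then translate to the center.
            corners.append((cx + c * dx - s * dy, cy + s * dx + c * dy, cz + dz))
    return corners
```

With a yaw of zero the box is axis-aligned; a yaw of `math.pi / 2` swaps which ground-plane axis the box length runs along.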
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM over the past fifteen years
has yielded workable systems that have found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were once common, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM, namely illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery.
Aperture Supervision for Monocular Depth Estimation
We present a novel method to train machine learning algorithms to estimate
scene depths from a single image, by using the information provided by a
camera's aperture as supervision. Prior works use a depth sensor's outputs or
images of the same scene from alternate viewpoints as supervision, while our
method instead uses images from the same viewpoint taken with a varying camera
aperture. To enable learning algorithms to use aperture effects as supervision,
we introduce two differentiable aperture rendering functions that use the input
image and predicted depths to simulate the depth-of-field effects caused by
real camera apertures. We train a monocular depth estimation network end-to-end
to predict the scene depths that best explain these finite aperture images as
defocus-blurred renderings of the input all-in-focus image.
Comment: To appear at CVPR 2018 (updated to camera ready version
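To give a feel for how an all-in-focus image and a depth map can be turned into a defocus-blurred rendering, here is a heavily simplified forward-model sketch on a 1D scanline. It is not either of the paper's two rendering functions: the name `render_defocus`, the blur-radius formula (aperture times the disparity gap to the focal plane), and the box-shaped blur are all assumptions, occlusion is ignored, and the hard rounding of the radius means this version is not differentiable as written.

```python
def render_defocus(scanline, depths, focus_depth, aperture):
    """Blur a 1D all-in-focus scanline according to per-pixel depth.

    scanline:    list of pixel intensities (all-in-focus input).
    depths:      list of positive per-pixel depths, same length as scanline.
    focus_depth: depth of the focal plane (pixels at this depth stay sharp).
    aperture:    hypothetical scale factor mapping disparity gap to blur radius.
    """
    n = len(scanline)
    out = [0.0] * n
    weight = [0.0] * n
    for j in range(n):
        # Blur radius in pixels grows with the disparity gap to the focal plane.
        radius = aperture * abs(1.0 / depths[j] - 1.0 / focus_depth)
        r = int(round(radius))
        # Each source pixel spreads its intensity evenly over 2r + 1 pixels.
        share = 1.0 / (2 * r + 1)
        for i in range(max(0, j - r), min(n, j + r + 1)):
            out[i] += share * scanline[j]
            weight[i] += share
    # Normalize so that constant regions keep their intensity at the borders.
    return [o / w for o, w in zip(out, weight)]
```

In a training setup along the lines the abstract describes, the predicted depths would be the quantity being learned, and the rendered result would be compared against a real finite-aperture image; doing that would require a smooth blur kernel and an automatic-differentiation framework rather than this plain-Python loop.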
- …