Frustum PointNets for 3D Object Detection from RGB-D Data
In this work, we study 3D object detection from RGB-D data in both indoor and
outdoor scenes. While previous methods focus on images or 3D voxels, often
obscuring natural 3D patterns and invariances of 3D data, we directly operate
on raw point clouds by popping up RGB-D scans. However, a key challenge of this
approach is how to efficiently localize objects in point clouds of large-scale
scenes (region proposal). Instead of solely relying on 3D proposals, our method
leverages both mature 2D object detectors and advanced 3D deep learning for
object localization, achieving efficiency as well as high recall for even small
objects. Benefiting from learning directly on raw point clouds, our method can also precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on the KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while running in real time.

Comment: 15 pages, 12 figures, 14 tables
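The core "popping up" step the abstract mentions can be sketched as follows: a depth map is back-projected into a point cloud through the pinhole model, and a 2D detection box then selects the subset of points whose image projection falls inside it, forming the frustum. The function names, intrinsics, and box format here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pop_up_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels

def frustum_points(points, box2d, fx, fy, cx, cy):
    """Keep the 3D points whose projection lies inside box2d = (u1, v1, u2, v2)."""
    u = points[:, 0] * fx / points[:, 2] + cx
    v = points[:, 1] * fy / points[:, 2] + cy
    u1, v1, u2, v2 = box2d
    mask = (u >= u1) & (u < u2) & (v >= v1) & (v < v2)
    return points[mask]
```

Restricting the 3D search to such frustums is what lets a mature 2D detector do the heavy lifting for region proposal.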
Improving 3d pedestrian detection for wearable sensor data with 2d human pose
Collisions and safety are important concepts when dealing with urban designs such as shared spaces. As pedestrians (especially the elderly and disabled) are more vulnerable to accidents, an intelligent wearable mobility aid that helps avoid collisions is a promising direction of research for improving safety. Moreover, with the improvements in visualisation technologies and their capability to render 3D virtual content, AR devices could be used to realise virtual infrastructure and virtual traffic systems. Such devices (e.g., HoloLens) scan the environment using stereo and ToF (Time-of-Flight) sensors, which in principle can be used to detect surrounding objects, including dynamic agents such as pedestrians; this can serve as a basis for predicting collisions. To envision an AR device as a safety aid and demonstrate its 3D object detection capability (in particular, pedestrian detection), we propose an improvement to the 3D object detection framework Frustum PointNet that incorporates human pose, and apply it to data from an AR device. Using data from such a device in an indoor setting, we conducted a comparative study to investigate how high-level 2D human pose features in our approach help improve the detection performance of oriented 3D pedestrian instances over Frustum PointNet.
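One simple way to realise the idea of enriching frustum points with 2D pose cues, sketched below, is to project each 3D point into the image and tag it with the pose-keypoint confidence sampled at that pixel, appending it as an extra feature channel. The nearest-neighbour sampling, heatmap format, and function names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def attach_pose_feature(points, pose_heatmap, fx, fy, cx, cy):
    """Append a 2D pose-confidence channel to (N, 3) points -> (N, 4)."""
    # Project each 3D point into the image plane (pinhole model).
    u = np.round(points[:, 0] * fx / points[:, 2] + cx).astype(int)
    v = np.round(points[:, 1] * fy / points[:, 2] + cy).astype(int)
    h, w = pose_heatmap.shape
    u = np.clip(u, 0, w - 1)
    v = np.clip(v, 0, h - 1)
    # Sample the pose heatmap at the projected pixel (nearest neighbour).
    conf = pose_heatmap[v, u]
    return np.concatenate([points, conf[:, None]], axis=1)
```

Points lying on a detected skeleton would then carry high confidence, giving the downstream 3D network a strong cue for separating the pedestrian from background clutter.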
Frustum VoxNet for 3D object detection from RGB-D or Depth images
Recently, there has been a plethora of classification and detection systems
from RGB as well as 3D images. In this work, we describe a new 3D object
detection system from an RGB-D or depth-only point cloud. Our system first
detects objects in 2D (either RGB or pseudo-RGB constructed from depth). The
next step is to detect 3D objects within the 3D frustums these 2D detections
define. This is achieved by voxelizing parts of the frustums (since frustums
can be really large), instead of using the whole frustums as done in earlier
work. The main novelty of our system has to do with determining which parts (3D
proposals) of the frustums to voxelize, thus allowing us to provide high
resolution representations around the objects of interest. It also allows our
system to have reduced memory requirements. These 3D proposals are fed to an
efficient ResNet-based 3D Fully Convolutional Network (FCN). Our 3D detection
system is fast and can be integrated into a robotics platform. With respect to
systems that do not perform voxelization (such as PointNet), our methods can
operate without the requirement of subsampling of the datasets. We have also
introduced a pipelining approach that further improves the efficiency of our
system. Results on the SUN RGB-D dataset show that our system, which is based on a small network, can process 20 frames per second with detection results comparable to the state of the art, achieving a 2x speedup.

Comment: page 8, add Acknowledgement; page 10, add Supplementary Material. The paper was accepted by the 2020 Winter Conference on Applications of Computer Vision (WACV '20). The first arXiv version can be found here: arXiv:1910.0548
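The key idea above, voxelizing only a 3D proposal inside the frustum rather than the whole frustum, can be sketched as binning the points that fall inside the proposal box into a fixed-resolution occupancy grid. The grid size, box format, and function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def voxelize_proposal(points, box_min, box_max, grid=(32, 32, 32)):
    """Binary occupancy grid for the points inside [box_min, box_max)."""
    box_min = np.asarray(box_min, dtype=float)
    box_max = np.asarray(box_max, dtype=float)
    inside = np.all((points >= box_min) & (points < box_max), axis=1)
    pts = points[inside]
    vox = np.zeros(grid, dtype=np.uint8)
    if len(pts):
        # Normalize to [0, 1) within the box, then scale to grid indices.
        idx = ((pts - box_min) / (box_max - box_min) * np.array(grid)).astype(int)
        idx = np.clip(idx, 0, np.array(grid) - 1)
        vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return vox
```

Because the grid covers only the proposal, the same voxel budget yields a much finer resolution around the object than voxelizing the entire frustum would, which is also what keeps the memory footprint small.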
Object Detection Using LiDAR and Camera Fusion in Off-road Conditions
Since the boom in the autonomous vehicle industry, the need for precise environment perception and robust object detection methods has grown. While state-of-the-art approaches such as convolutional neural networks have made great progress in 2D object detection, the challenge remains to efficiently achieve the same level of performance in 3D. The reasons for this include the limitations of fusing multi-modal data and the cost of labelling different modalities for training such networks. Whether we use a stereo camera to perceive the scene's ranging information or time-of-flight ranging sensors such as LiDAR, the existing pipelines for object detection in point clouds have bottlenecks and latency issues that tend to affect detection accuracy at real-time speeds. Moreover, these existing methods are primarily implemented and tested on urban cityscapes. This thesis presents a fusion-based approach for detecting objects in 3D by projecting the proposed 2D regions of interest (object bounding boxes) or masks (semantically segmented images) onto point clouds, and applies outlier filtering techniques to extract the target object points in the projected regions of interest. Additionally, we compare it with human detection based on thermal image thresholding and filtering. Lastly, we performed rigorous benchmarks in off-road environments to identify potential bottlenecks and to find a combination of pipeline parameters that maximises the accuracy and performance of real-time object detection in 3D point clouds.
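The projection-and-filtering pipeline described above can be sketched in two steps: LiDAR points are projected into the camera image with a 3x4 projection matrix, the points that land inside a detected 2D box are kept, and a simple median-range outlier filter then removes background points swept up by the box. The matrix convention, box format, and threshold are illustrative assumptions, not the thesis' exact parameters.

```python
import numpy as np

def points_in_box(points, P, box2d):
    """Project (N, 3) LiDAR points with P (3x4); keep those inside box2d."""
    hom = np.hstack([points, np.ones((len(points), 1))])
    proj = hom @ P.T
    u = proj[:, 0] / proj[:, 2]
    v = proj[:, 1] / proj[:, 2]
    u1, v1, u2, v2 = box2d
    mask = (u >= u1) & (u < u2) & (v >= v1) & (v < v2) & (proj[:, 2] > 0)
    return points[mask]

def filter_outliers(points, max_dev=1.0):
    """Drop points whose range deviates from the median by more than max_dev."""
    r = np.linalg.norm(points, axis=1)
    return points[np.abs(r - np.median(r)) <= max_dev]
```

A median-based filter is one of the simplest choices here; the clustering methods the thesis compares (and the pipeline-parameter search it reports) serve the same purpose of separating the target object from background points inside the projected region.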
3D Object Detection Using Scale Invariant and Feature Reweighting Networks
3D object detection plays an important role in a large number of real-world
applications. It requires us to estimate the localizations and the orientations
of 3D objects in real scenes. In this paper, we present a new network
architecture which focuses on utilizing the front view images and frustum point
clouds to generate 3D detection results. On the one hand, a PointSIFT module is
utilized to improve the performance of 3D segmentation: it captures information from different orientations in space and is robust to shapes at different scales. On the other hand, our network uses a SENet module to emphasize informative features and suppress less informative ones. This module reweights channel features so that 3D bounding boxes are estimated more
effectively. Our method is evaluated on both the KITTI dataset for outdoor scenes and the SUN-RGBD dataset for indoor scenes. The experimental results show that our method achieves better performance than state-of-the-art methods, especially when point clouds are highly sparse.

Comment: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
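The SENet-style channel reweighting the abstract describes can be sketched as a squeeze-and-excitation gate: features are squeezed by global average pooling, passed through a small two-layer gate, and each channel is rescaled by its learned weight. The weight shapes and reduction ratio below are illustrative assumptions, and the weights would be learned in practice rather than given.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_reweight(feat, w1, w2):
    """feat: (N, C) per-point features; w1: (C, C//r); w2: (C//r, C)."""
    squeeze = feat.mean(axis=0)                        # global average pool -> (C,)
    gate = sigmoid(np.maximum(squeeze @ w1, 0) @ w2)   # excitation MLP -> (C,)
    return feat * gate                                 # channel-wise rescale
```

Channels whose gate value is near 1 pass through almost unchanged, while uninformative channels are attenuated, which is the reweighting effect credited with the more effective box estimation.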