
    Frustum PointNets for 3D Object Detection from RGB-D Data

    In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of the data, we operate directly on raw point clouds by popping up RGB-D scans. A key challenge of this approach, however, is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of relying solely on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall even for small objects. Benefiting from learning directly on raw point clouds, our method can also precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on the KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while retaining real-time capability. (Comment: 15 pages, 12 figures, 14 tables)
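    A minimal sketch of the "popping up" step the abstract describes, under assumed conventions: a depth map is back-projected into a 3D point cloud with the pinhole camera model, and the frustum for a 2D detection is cropped by keeping points whose projections fall inside the box. Array shapes, the intrinsics layout, and both function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map (metres) into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def frustum_crop(points, fx, fy, cx, cy, box):
    """Keep points whose image projection falls inside box = (u1, v1, u2, v2)."""
    u1, v1, u2, v2 = box
    pts = points[points[:, 2] > 0]          # drop invalid / zero-depth points
    u = pts[:, 0] * fx / pts[:, 2] + cx     # project back to pixel coordinates
    v = pts[:, 1] * fy / pts[:, 2] + cy
    inside = (u >= u1) & (u <= u2) & (v >= v1) & (v <= v2)
    return pts[inside]
```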

    Improving 3d pedestrian detection for wearable sensor data with 2d human pose

    Collisions and safety are important concepts in urban designs such as shared spaces. As pedestrians (especially elderly and disabled people) are more vulnerable to accidents, realising an intelligent mobility aid that avoids collisions is a direction of research that could improve safety through a wearable device. Moreover, with the improvements in visualisation technologies and their ability to render 3D virtual content, AR devices could be used to realise virtual infrastructure and virtual traffic systems. Such devices (e.g., the HoloLens) scan the environment using stereo and ToF (Time-of-Flight) sensors, which in principle can be used to detect surrounding objects, including dynamic agents such as pedestrians, and thus serve as a basis for predicting collisions. To envision an AR device as a safety aid and demonstrate its 3D object detection capability (in particular, pedestrian detection), we propose an improvement to the Frustum PointNet 3D object detection framework that incorporates human pose, and apply it to data from an AR device. Using data from such a device in an indoor setting, we conducted a comparative study to investigate how the high-level 2D human pose features in our approach improve the detection of oriented 3D pedestrian instances over Frustum PointNet.
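    One plausible way to inject high-level 2D pose features into a frustum-based pipeline, as the abstract suggests: annotate each frustum point with its normalised image-plane distances to the detected 2D keypoints, yielding extra per-point channels. This feature design, the function name, and the normalisation by box size are illustrative assumptions; the paper's actual fusion scheme may differ.

```python
import numpy as np

def pose_features(points_uv, keypoints_uv, box):
    """points_uv: (N, 2) pixel coords of projected frustum points.
    keypoints_uv: (K, 2) detected 2D human-pose keypoints.
    box: (u1, v1, u2, v2) person detection box, used for normalisation."""
    u1, v1, u2, v2 = box
    scale = np.array([max(u2 - u1, 1.0), max(v2 - v1, 1.0)])
    # (N, K) normalised distance from each point to each keypoint
    d = np.linalg.norm(
        (points_uv[:, None, :] - keypoints_uv[None, :, :]) / scale, axis=-1
    )
    return d  # concatenated to the per-point xyz features downstream
```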

    Frustum VoxNet for 3D object detection from RGB-D or Depth images

    Recently, there has been a plethora of classification and detection systems operating on RGB as well as 3D images. In this work, we describe a new 3D object detection system that works from an RGB-D or depth-only point cloud. Our system first detects objects in 2D (either RGB or pseudo-RGB constructed from depth). The next step is to detect 3D objects within the 3D frustums these 2D detections define. This is achieved by voxelizing parts of the frustums (since frustums can be very large), instead of using whole frustums as done in earlier work. The main novelty of our system lies in determining which parts (3D proposals) of the frustums to voxelize, which lets us provide high-resolution representations around the objects of interest and reduces the system's memory requirements. These 3D proposals are fed to an efficient ResNet-based 3D Fully Convolutional Network (FCN). Our 3D detection system is fast and can be integrated into a robotics platform. Unlike systems that do not perform voxelization (such as PointNet), our method can operate without subsampling the datasets. We have also introduced a pipelining approach that further improves the efficiency of our system. Results on the SUN RGB-D dataset show that our system, which is based on a small network, can process 20 frames per second with detection results comparable to the state of the art, a 2x speedup. (Comment: accepted to the 2020 Winter Conference on Applications of Computer Vision (WACV '20); the first arXiv version is arXiv:1910.0548)
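    A minimal sketch of the voxelization step the abstract highlights: only a 3D proposal region inside the frustum is discretised into a fixed-resolution occupancy grid, rather than the whole (possibly very large) frustum. The grid resolution, the binary occupancy encoding, and the function name are assumptions for illustration.

```python
import numpy as np

def voxelize_proposal(points, lo, hi, grid=(48, 48, 48)):
    """points: (N, 3); lo, hi: opposite corners of the 3D proposal box.
    Returns a binary occupancy grid of shape `grid`."""
    occ = np.zeros(grid, dtype=np.float32)
    span = np.maximum(np.asarray(hi, float) - np.asarray(lo, float), 1e-6)
    idx = ((points - np.asarray(lo, float)) / span * np.asarray(grid)).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid)), axis=1)
    i, j, k = idx[inside].T
    occ[i, j, k] = 1.0                       # mark occupied voxels
    return occ                               # fed to the ResNet-style 3D FCN
```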

    Object Detection Using LiDAR and Camera Fusion in Off-road Conditions

    Since the boom in the autonomous vehicle industry, the need for precise environment perception and robust object detection methods has grown. While the state of the art in 2D object detection is advancing with approaches such as convolutional neural networks, the challenge remains to efficiently achieve the same level of performance in 3D. The reasons for this include the limitations of fusing multi-modal data and the cost of labelling different modalities for training such networks. Whether a stereo camera is used to perceive the scene's ranging information or a time-of-flight ranging sensor such as LiDAR, existing pipelines for object detection in point clouds have bottlenecks and latency issues that affect detection accuracy at real-time speeds. Moreover, these existing methods are primarily implemented and tested on urban cityscapes. This thesis presents a fusion-based approach for detecting objects in 3D by projecting proposed 2D regions of interest (object bounding boxes) or masks (semantically segmented images) onto point clouds and applying outlier filtering techniques to isolate the target object points in the projected regions of interest. Additionally, we compare it with human detection based on thermal image thresholding and filtering. Lastly, we perform rigorous benchmarks in off-road environments to identify potential bottlenecks and to find a combination of pipeline parameters that maximises the accuracy and performance of real-time object detection in 3D point clouds.
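    A hedged sketch of the fusion approach described above: LiDAR points are projected into the image with a 3x4 camera projection matrix, points falling inside a 2D detection box are collected, and DBSCAN then separates the target object from background points swept up by the frustum. The matrix P, the eps and min_samples values, and both function names are placeholders; the thesis tunes such parameters experimentally.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def points_in_box(points, P, box):
    """points: (N, 3) LiDAR points; P: (3, 4) camera projection matrix."""
    u1, v1, u2, v2 = box
    homo = np.hstack([points, np.ones((len(points), 1))])
    uvw = homo @ P.T
    front = uvw[:, 2] > 0                    # keep points in front of the camera
    pts, uvw = points[front], uvw[front]
    uv = uvw[:, :2] / uvw[:, 2:3]            # perspective divide to pixels
    inside = (uv[:, 0] >= u1) & (uv[:, 0] <= u2) & \
             (uv[:, 1] >= v1) & (uv[:, 1] <= v2)
    return pts[inside]

def largest_cluster(points, eps=0.5, min_samples=10):
    """Keep the largest DBSCAN cluster as the detected object."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return points[:0]                    # everything was labelled noise
    return points[labels == np.bincount(valid).argmax()]
```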

    3D Object Detection Using Scale Invariant and Feature Reweighting Networks

    3D object detection plays an important role in a large number of real-world applications, requiring us to estimate the locations and orientations of 3D objects in real scenes. In this paper, we present a new network architecture that uses front-view images and frustum point clouds to generate 3D detection results. On the one hand, a PointSIFT module is used to improve the performance of 3D segmentation: it captures information from different orientations in space and is robust to shapes at different scales. On the other hand, our network extracts useful features and suppresses less informative ones with an SENet module, which reweights channel features and estimates 3D bounding boxes more effectively. Our method is evaluated on both the KITTI dataset for outdoor scenes and the SUN-RGBD dataset for indoor scenes. The experimental results show that our method achieves better performance than state-of-the-art methods, especially when point clouds are highly sparse. (Comment: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19))
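    A minimal PyTorch sketch of the squeeze-and-excitation (SENet-style) channel reweighting the abstract describes: global pooling over points "squeezes" each channel to a scalar, a small bottleneck MLP "excites" it into per-channel gates, and the features are rescaled. The layer sizes and reduction ratio are illustrative assumptions; the paper's exact module may differ.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):               # x: (batch, channels, num_points)
        s = x.mean(dim=2)               # squeeze: global average over points
        w = self.fc(s).unsqueeze(2)     # excite: per-channel gates in (0, 1)
        return x * w                    # reweight the point features
```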