1,214 research outputs found

Frustum PointNets for 3D Object Detection from RGB-D Data

Authors: Guibas Leonidas J.; Liu Wei; Qi Charles R.; Su Hao; Wu Chenxia
Publication date: 12/04/2018

In this work, we study 3D object detection from RGB-D data in both indoor and outdoor scenes. While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, we directly operate on raw point clouds by popping up RGB-D scans. However, a key challenge of this approach is how to efficiently localize objects in point clouds of large-scale scenes (region proposal). Instead of solely relying on 3D proposals, our method leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall even for small objects. Benefiting from learning directly on raw point clouds, our method is also able to precisely estimate 3D bounding boxes even under strong occlusion or with very sparse points. Evaluated on the KITTI and SUN RGB-D 3D detection benchmarks, our method outperforms the state of the art by remarkable margins while having real-time capability. Comment: 15 pages, 12 figures, 14 tables
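The idea of "popping up" an RGB-D scan inside a 2D detection can be illustrated with a minimal sketch. It assumes a simple pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and a depth map in metres where 0 marks a missing reading; the function name `frustum_points` and the box convention are hypothetical and are not taken from the paper, whose actual pipeline may differ.

```python
def frustum_points(depth, box, fx, fy, cx, cy):
    """Back-project the depth pixels inside a 2D box into 3D camera coordinates.

    depth: list of rows of depth values in metres (0 = no reading; assumption).
    box:   (u_min, v_min, u_max, v_max), max bounds exclusive (assumption).
    Returns a list of (x, y, z) points forming the frustum point cloud.
    """
    u_min, v_min, u_max, v_max = box
    points = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u]
            if z <= 0:
                continue  # skip pixels without a valid depth reading
            # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    toy_depth = [[2.0, 2.0], [0.0, 4.0]]
    print(frustum_points(toy_depth, (0, 0, 2, 2), fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

Only the points returned for a given 2D box would then be passed to a 3D network, which is what keeps the search space small compared with proposing boxes over the whole scene.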

arXiv.org e-Print Archive

Crossref

SeGAN: Segmenting and Generating the Invisible

Authors: Ehsani Kiana; Farhadi Ali; Mottaghi Roozbeh
Publication date: 07/05/2018

Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation. In this paper, we study the challenging problem of completing the appearance of occluded objects. Doing so requires knowing which pixels to paint (segmenting the invisible parts of objects) and what color to paint them (generating the invisible parts). Our proposed novel solution, SeGAN, jointly optimizes for both segmentation and generation of the invisible parts of objects. Our experimental results show that: (a) SeGAN can learn to generate the appearance of the occluded parts of objects; (b) SeGAN outperforms state-of-the-art segmentation baselines for the invisible parts of objects; (c) trained on synthetic photorealistic images, SeGAN can reliably segment natural images; (d) by reasoning about occluder-occludee relations, our method can infer depth layering. Comment: Accepted to CVPR18 as spotlight

arXiv.org e-Print Archive

Crossref

Semantic amodal video segmentation using a synthetic dataset

Author: Hui Kexin
Publication date: 01/12/2018

In this work, we provide tools for annotating both object category and shot transitions for a new semantic modal instance-level object segmentation dataset. This new dataset provides ample opportunities to train models for instance-level segmentation, both modal and amodal. Moreover, we present results for instance-level segmentation using ResNet-based DeepLab, a state-of-the-art semantic image segmentation model. We also develop a new semantic amodal instance-level video segmentation model based on DeepLab for the aforementioned dataset. Our model for amodal segmentation operates on a per-frame basis and is guided by the modal mask, estimated from the current frame and from previous frames, that delineates the object of interest. We demonstrate the efficacy of the proposed model on the new dataset.
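The distinction between modal and amodal masks can be made concrete with a small sketch. It assumes the usual reading of the terms: a modal mask covers only the visible pixels of an object, while an amodal mask covers its full extent including occluded pixels. The helper names `invisible_mask` and `occlusion_ratio` are hypothetical and are not part of the thesis or of DeepLab.

```python
def invisible_mask(modal, amodal):
    """Return the occluded region: pixels in the amodal mask but not the modal one.

    Both masks are lists of rows holding 0/1 values with identical shapes.
    """
    return [
        [1 if a and not m else 0 for m, a in zip(modal_row, amodal_row)]
        for modal_row, amodal_row in zip(modal, amodal)
    ]


def occlusion_ratio(modal, amodal):
    """Fraction of the object's full (amodal) area that is hidden from view."""
    hidden = sum(sum(row) for row in invisible_mask(modal, amodal))
    full = sum(sum(1 for a in row if a) for row in amodal)
    return hidden / full if full else 0.0


if __name__ == "__main__":
    modal = [[1, 0], [0, 0]]
    amodal = [[1, 1], [0, 1]]
    print(invisible_mask(modal, amodal))
    print(occlusion_ratio(modal, amodal))
```

A model guided by the modal mask, as described above, would take the visible region as input and predict the larger amodal region; the difference between the two is the part it has to infer.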

Illinois Digital Environment for Access to Learning and Scholarship Repository

Panoptic Segmentation

Authors: Dollár Piotr; Girshick Ross; He Kaiming; Kirillov Alexander; Rother Carsten
Publication date: 10/04/2019

We propose and study a task we name panoptic segmentation (PS). Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step toward real-world vision systems. While early work in computer vision addressed related image/scene parsing tasks, these are not currently popular, possibly due to lack of appropriate metrics or associated recognition challenges. To address this, we propose a novel panoptic quality (PQ) metric that captures performance for all classes (stuff and things) in an interpretable and unified manner. Using the proposed metric, we perform a rigorous study of both human and machine performance for PS on three existing datasets, revealing interesting insights about the task. The aim of our work is to revive the interest of the community in a more unified view of image segmentation. Comment: accepted to CVPR 201
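The abstract does not spell out the PQ formula. The sketch below follows the commonly cited definition, stated here as an assumption: a predicted and a ground-truth segment match when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by |TP| + 0.5|FP| + 0.5|FN|. Segments are represented as sets of pixel indices for a single class, and the function name `panoptic_quality` is hypothetical.

```python
def panoptic_quality(pred_segments, gt_segments):
    """Compute PQ for one class from lists of segments (sets of pixel indices).

    Assumes segments within each list do not overlap, so the IoU > 0.5 rule
    yields at most one match per segment.
    """
    matched_gt = set()
    iou_sum = 0.0
    true_pos = 0
    for pred in pred_segments:
        for j, gt in enumerate(gt_segments):
            if j in matched_gt:
                continue
            union = len(pred | gt)
            iou = len(pred & gt) / union if union else 0.0
            if iou > 0.5:
                matched_gt.add(j)
                iou_sum += iou
                true_pos += 1
                break
    false_pos = len(pred_segments) - true_pos  # unmatched predictions
    false_neg = len(gt_segments) - true_pos    # unmatched ground truth
    denom = true_pos + 0.5 * false_pos + 0.5 * false_neg
    return iou_sum / denom if denom else 0.0


if __name__ == "__main__":
    gt = [{0, 1, 2, 3}, {10, 11}]
    pred = [{0, 1, 2}, {20}]
    print(panoptic_quality(pred, gt))
```

In a full evaluation the per-class values would be averaged over all stuff and thing classes; that averaging step is omitted here.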

arXiv.org e-Print Archive

Crossref