Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
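The panoramic views in datasets such as Matterport3D are commonly handled in an equirectangular layout. As an illustrative sketch (my own, not code from the paper, and assuming an equirectangular panorama), the mapping from a pixel to its unit viewing ray, which underlies tasks like keypoint matching and view overlap prediction across panoramas, could look like:

```python
import numpy as np

def equirect_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular panorama to a unit ray direction."""
    lon = (u / width) * 2 * np.pi - np.pi    # longitude in [-pi, pi)
    lat = np.pi / 2 - (v / height) * np.pi   # latitude in [-pi/2, pi/2]
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])

# The panorama center looks straight ahead along +z.
center = equirect_ray(512, 256, 1024, 512)
```

The convention here (y up, +z forward) is an assumption; dataset-specific code may use a different axis order.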
Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View
We present Im2Pano3D, a convolutional neural network that generates a dense
prediction of 3D structure and a probability distribution of semantic labels
for a full 360 panoramic view of an indoor scene when given only a partial
observation (<= 50%) in the form of an RGB-D image. To make this possible,
Im2Pano3D leverages strong contextual priors learned from large-scale synthetic
and real-world indoor scenes. To ease the prediction of 3D structure, we
propose to parameterize 3D surfaces with their plane equations and train the
model to predict these parameters directly. To provide meaningful training
supervision, we use multiple loss functions that consider both pixel level
accuracy and global context consistency. Experiments demonstrate that
Im2Pano3D is able to predict the semantics and 3D structure of the unobserved
scene with more than 56% pixel accuracy and less than 0.52m average distance
error, significantly better than alternative approaches.
Comment: Video summary: https://youtu.be/Au3GmktK-S
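The plane-equation parameterization can be made concrete with a small sketch (my own illustration, not the authors' code): each pixel's surface is encoded by its unit normal n and plane offset d = n·p, where p is the pixel's 3D point, so the network predicts a 4-channel map instead of raw depth.

```python
import numpy as np

def plane_parameters(points, normals):
    """Encode per-pixel surfaces as plane equations n . x = d.

    points:  (H, W, 3) per-pixel 3D coordinates
    normals: (H, W, 3) per-pixel unit normals
    returns: (H, W, 4) maps holding (nx, ny, nz, d) with d = n . p
    """
    d = np.sum(points * normals, axis=-1, keepdims=True)
    return np.concatenate([normals, d], axis=-1)

# A flat wall at z = 2 facing the camera yields the same plane at every pixel,
# which is why this encoding is piecewise constant and easier to regress than depth.
wall_points = np.tile(np.array([0.0, 0.0, 2.0]), (2, 2, 1))
wall_normals = np.tile(np.array([0.0, 0.0, 1.0]), (2, 2, 1))
params = plane_parameters(wall_points, wall_normals)
```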
Behind every domain there is a shift: adapting distortion-aware vision transformers for panoramic semantic segmentation
In this paper, we address panoramic semantic segmentation, which is under-explored due to two critical challenges: (1) image distortions and object deformations on panoramas; (2) lack of semantic annotations in the 360° imagery. To tackle these problems, first, we propose the upgraded Transformer for Panoramic Semantic Segmentation, i.e., Trans4PASS+, equipped with Deformable Patch Embedding (DPE) and Deformable MLP (DMLPv2) modules for handling object deformations and image distortions whenever (before or after adaptation) and wherever (shallow or deep levels). Second, we enhance the Mutual Prototypical Adaptation (MPA) strategy via pseudo-label rectification for unsupervised domain adaptive panoramic segmentation. Third, aside from Pinhole-to-Panoramic (Pin2Pan) adaptation, we create a new dataset (SynPASS) with 9,080 panoramic images, facilitating a Synthetic-to-Real (Syn2Real) adaptation scheme in 360° imagery. Extensive experiments covering indoor and outdoor scenarios are conducted, each investigated under both the Pin2Pan and Syn2Real regimens. Trans4PASS+ achieves state-of-the-art performance on four domain adaptive panoramic semantic segmentation benchmarks. Code is available at https://github.com/jamycheung/Trans4PASS
Comment: Extended version of CVPR 2022 paper arXiv:2203.01452.
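The core idea behind Deformable Patch Embedding can be sketched as follows (a rough, hypothetical illustration of offset-based patch sampling, not the released Trans4PASS code): a regular grid of patch centers is perturbed by learned per-patch offsets, letting patches shift toward distorted content, with the results clipped to the image bounds.

```python
import numpy as np

def deformable_patch_centers(h, w, patch, offsets):
    """Shift a regular grid of patch centers by learned offsets.

    h, w:    image height and width
    patch:   patch size of the regular grid
    offsets: (H_p, W_p, 2) learned (dy, dx) offsets, one pair per patch
    returns: (H_p, W_p, 2) shifted patch centers, clipped to the image
    """
    ys = np.arange(patch // 2, h, patch)
    xs = np.arange(patch // 2, w, patch)
    grid = np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1).astype(float)
    shifted = grid + offsets
    shifted[..., 0] = np.clip(shifted[..., 0], 0, h - 1)
    shifted[..., 1] = np.clip(shifted[..., 1], 0, w - 1)
    return shifted

# Zero offsets recover the rigid grid of a standard patch embedding;
# nonzero offsets let each patch sample deformed regions of the panorama.
rigid = deformable_patch_centers(8, 8, 4, np.zeros((2, 2, 2)))
```

In the actual model the offsets are produced by a learned layer and the features are sampled at the shifted locations; this sketch only shows the geometric part.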