Depth map compression via 3D region-based representation
In 3D video, view synthesis is used to create new virtual views between
encoded camera views. Errors in the coding of the depth maps introduce
geometry inconsistencies in synthesized views. In this paper, a new 3D plane
representation of the scene is presented that improves the performance of
current standard video codecs in the view synthesis domain. Two image segmentation
algorithms are proposed for generating a color partition and a depth partition.
Using both partitions, depth maps are segmented into regions free of sharp
discontinuities, without having to explicitly signal all depth edges. The
resulting regions are represented using a planar model in the 3D world scene.
This 3D representation allows an efficient encoding while preserving the 3D
characteristics of the scene. The 3D planes also open up the possibility of
coding multiview images with a single, unified representation.
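The core operation described above, approximating each segmented depth region by a plane in the scene, can be sketched as a least-squares fit. This is a minimal illustration, not the paper's code; the function name and the simple z = ax + by + c parameterization are assumptions for the sketch.

```python
# Least-squares fit of a plane z = a*x + b*y + c to one region's depth samples.
# Each region of the depth-map partition would be summarized by one such plane.

def fit_plane(points):
    """points: list of (x, y, z) depth samples from one segmented region."""
    # Accumulate the 3x3 normal equations A^T A p = A^T z for p = (a, b, c).
    sxx = sxy = sx = syy = sy = n = 0.0
    sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x; sxy += x * y; sx += x
        syy += y * y; sy += y; n += 1
        sxz += x * z; syz += y * z; sz += z
    M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    v = [sxz, syz, sz]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    for i in range(3):
        piv = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        v[i], v[piv] = v[piv], v[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for c in range(i, 3):
                M[r][c] -= f * M[i][c]
            v[r] -= f * v[i]
    p = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        p[i] = (v[i] - sum(M[i][c] * p[c] for c in range(i + 1, 3))) / M[i][i]
    return tuple(p)  # (a, b, c)
```

Signaling only three plane coefficients per region, instead of per-pixel depth, is what makes the representation cheap to encode.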
Geodesic Distance Histogram Feature for Video Segmentation
This paper proposes a geodesic-distance-based feature that encodes global
information for improved video segmentation algorithms. The feature is a joint
histogram of intensity and geodesic distances, where the geodesic distances are
computed as the shortest paths between superpixels via their boundaries. We
also incorporate adaptive voting weights and spatial pyramid configurations to
include spatial information into the geodesic histogram feature and show that
this further improves results. The feature is generic and can be used as part
of various algorithms. In experiments, we test the geodesic histogram feature
by incorporating it into two existing video segmentation frameworks. This leads
to significantly better performance on 3D video segmentation benchmarks on two
datasets.
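The feature described above can be sketched as follows: geodesic distances are shortest paths over a superpixel adjacency graph, and each superpixel contributes a count to a joint (intensity, distance) bin. This is a hedged illustration under assumed data structures; the graph encoding, bin counts, and distance cap are illustrative, not the paper's.

```python
import heapq
from collections import defaultdict

def geodesic_distances(adj, src):
    """adj: {node: [(neighbor, boundary_cost), ...]}; Dijkstra from src."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def joint_histogram(adj, intensity, src, n_int=4, n_dist=4, d_max=10.0):
    """Joint histogram over (intensity bin, geodesic-distance bin)."""
    dist = geodesic_distances(adj, src)
    hist = defaultdict(int)
    for node, d in dist.items():
        ib = min(int(intensity[node] * n_int), n_int - 1)  # intensity in [0, 1)
        db = min(int(d / d_max * n_dist), n_dist - 1)
        hist[(ib, db)] += 1
    return dict(hist)
```

Because distances accumulate boundary costs along the whole path, the histogram captures global layout rather than purely local appearance.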
3D video performance segmentation
We present a novel approach that achieves segmentation of subject body parts in 3D videos. 3D video consists of a free-viewpoint video of real-world subjects in motion immersed in a virtual world. Each 3D video frame is composed of one or several 3D models. A topology dictionary is used to cluster 3D video sequences with respect to the model topology and shape. The topology is characterized using Reeb graph-based descriptors, and no prior explicit model of the subject shape is necessary to perform the clustering process. In this framework, the dictionary consists of a set of training input poses with a priori segmentation and labels. As a consequence, all identified frames of 3D video sequences can be automatically segmented. Finally, motion flows computed between consecutive frames are used to transfer segmented region labels to unidentified frames. Our method allows us to perform robust body part segmentation and tracking in 3D cinema sequences. Index Terms — 3D video, topology dictionary, shape matching, body segmentation
Making a Case for 3D Convolutions for Object Segmentation in Videos
The task of object segmentation in videos is usually accomplished by
processing appearance and motion information separately using standard 2D
convolutional networks, followed by a learned fusion of the two sources of
information. On the other hand, 3D convolutional networks have been
successfully applied to video classification tasks, but have not been
leveraged as effectively for problems involving dense per-pixel interpretation
of videos, and lag behind their 2D convolutional counterparts in terms of
performance. In this work, we show that 3D
CNNs can be effectively applied to dense video prediction tasks such as salient
object segmentation. We propose a simple yet effective encoder-decoder network
architecture consisting entirely of 3D convolutions that can be trained
end-to-end using a standard cross-entropy loss. To this end, we leverage an
efficient 3D encoder and propose a 3D decoder architecture that comprises
novel 3D Global Convolution layers and 3D Refinement modules. Our approach
outperforms the existing state of the art by a large margin on the DAVIS'16
Unsupervised, FBMS and ViSal dataset benchmarks, in addition to being faster,
showing that our architecture can efficiently learn expressive spatio-temporal
features and produce high-quality video segmentation masks. Our code and models
will be made publicly available. Comment: BMVC '2
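The key primitive in the abstract above is the 3D convolution, whose kernel mixes information across time as well as space. The following is an illustrative naive single-channel version with 'valid' padding, not the paper's implementation (real networks use learned multi-channel kernels and GPU kernels):

```python
# Naive single-channel 3D convolution over a [T][H][W] volume of floats,
# demonstrating how one kernel aggregates a spatio-temporal neighborhood.

def conv3d(video, kernel):
    """video: [T][H][W] floats; kernel: [t][h][w] floats; 'valid' output."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        frame = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            s += video[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(s)
            frame.append(row)
        out.append(frame)
    return out
```

Because each output value sums over adjacent frames as well as adjacent pixels, no separate motion branch or fusion stage is needed to capture temporal change.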
4DContrast: Contrastive Learning with Dynamic Correspondences for 3D Scene Understanding
We present a new approach to instill 4D dynamic object priors into learned 3D
representations by unsupervised pre-training. We observe that dynamic movement
of an object through an environment provides important cues about its
objectness, and thus propose to imbue learned 3D representations with such
dynamic understanding, which can then be effectively transferred to improve
performance in downstream 3D semantic scene understanding tasks. We propose a
new data augmentation scheme leveraging synthetic 3D shapes moving in static 3D
environments, and employ contrastive learning under 3D-4D constraints that
encode 4D invariances into the learned 3D representations. Experiments
demonstrate that our unsupervised representation learning results in
improvement in downstream 3D semantic segmentation, object detection, and
instance segmentation tasks, and moreover, notably improves performance in
data-scarce scenarios. Comment: Accepted by ECCV 2022. Video: https://youtu.be/qhGhWZmJq3
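Contrastive objectives of the kind described above are commonly instantiated as an InfoNCE-style loss: an anchor embedding is pulled toward a positive (here, another view or time step of the same object) and pushed from negatives. This is a hedged sketch of that generic loss, not the paper's exact 3D-4D formulation; the temperature value is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    """-log( exp(s(a,p)/tau) / (exp(s(a,p)/tau) + sum_n exp(s(a,n)/tau)) )."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Minimizing this loss makes embeddings of the same object under different 4D configurations similar, which is the invariance the pre-training encodes.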
A Unified Framework for Mutual Improvement of SLAM and Semantic Segmentation
This paper presents a novel framework for simultaneously implementing
localization and segmentation, which are two of the most important vision-based
tasks for robotics. While the goals and techniques used for these two tasks
were previously considered distinct, we show that by making use of the
intermediate results of the two modules, their performance can be enhanced at
the same time. Our framework is able to handle both the instantaneous motion
and long-term changes of instances in localization with the help of the
segmentation result, which also benefits from the refined 3D pose information.
We conduct experiments on various datasets, and show that our framework
effectively improves the precision and robustness of the two tasks and
outperforms existing localization and segmentation algorithms. Comment: 7 pages, 5 figures. This work has been accepted by ICRA 2019. The demo
video can be found at https://youtu.be/Bkt53dAehj
Instance Neural Radiance Field
This paper presents one of the first learning-based NeRF 3D instance
segmentation pipelines, dubbed Instance Neural Radiance Field, or Instance
NeRF. Taking a NeRF pretrained from multi-view RGB images as input, Instance
NeRF can learn 3D instance segmentation of a given scene, represented as an
instance field component of the NeRF model. To this end, we adopt a 3D
proposal-based mask prediction network on the sampled volumetric features from
NeRF, which generates discrete 3D instance masks. The coarse 3D mask prediction
is then projected to image space to match 2D segmentation masks from different
views generated by existing panoptic segmentation models, which are used to
supervise the training of the instance field. Notably, beyond generating
consistent 2D segmentation maps from novel views, Instance NeRF can query
instance information at any 3D point, which greatly enhances NeRF object
segmentation and manipulation. Our method is also one of the first to achieve
such results without ground-truth instance information during inference.
Evaluated on synthetic and real-world NeRF datasets with complex indoor
scenes, Instance NeRF surpasses previous NeRF segmentation works and
competitive 2D segmentation methods in segmentation performance on unseen
views. See the demo video at https://youtu.be/wW9Bme73coI
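The supervision step described above, matching a projected 3D instance mask to 2D panoptic masks from each view, amounts to an intersection-over-union assignment. The sketch below is illustrative, not the paper's code; masks are represented as pixel-coordinate sets and the 0.5 threshold is a hypothetical choice.

```python
# Greedy best-IoU matching of projected 3D instance masks to 2D panoptic masks.

def iou(a, b):
    """Intersection-over-union of two pixel-coordinate sets."""
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0

def match_masks(projected, panoptic, thresh=0.5):
    """projected: {3d_id: set of (row, col)}; panoptic: {2d_id: set}."""
    matches = {}
    for pid, pmask in projected.items():
        best_id, best_iou = None, thresh
        for qid, qmask in panoptic.items():
            score = iou(pmask, qmask)
            if score > best_iou:
                best_id, best_iou = qid, score
        if best_id is not None:
            matches[pid] = best_id
    return matches
```

Each matched 2D mask then serves as a training target for the corresponding 3D instance, which is how 2D panoptic predictions can supervise the instance field without ground-truth 3D labels.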