Depth map compression via 3D region-based representation
In 3D video, view synthesis is used to create new virtual views between
encoded camera views. Errors in the coding of the depth maps introduce
geometry inconsistencies in synthesized views. In this paper, a new 3D plane
representation of the scene is presented that improves the performance of
current standard video codecs in the view synthesis domain. Two image segmentation
algorithms are proposed for generating a color partition and a depth partition.
Using both partitions, depth maps are segmented into regions free of sharp
discontinuities, without having to explicitly signal all depth edges. The
resulting regions are represented using a planar model in the 3D world scene.
This 3D representation allows an efficient encoding while preserving the 3D
characteristics of the scene. The 3D planes also open up the possibility of
coding multiview images with a single, unified representation.
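The core operation described above, approximating each segmented depth region by a plane in the scene, can be sketched as a least-squares fit. This is a minimal illustration, not the paper's code; the function name and the simple z = ax + by + c parameterization are assumptions for the sketch.

```python
# Least-squares fit of a plane z = a*x + b*y + c to one region's depth samples.
# Each region of the depth-map partition would be summarized by one such plane.

def fit_plane(points):
    """points: list of (x, y, z) depth samples from one segmented region."""
    # Accumulate the 3x3 normal equations A^T A p = A^T z for p = (a, b, c).
    sxx = sxy = sx = syy = sy = n = 0.0
    sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x; sxy += x * y; sx += x
        syy += y * y; sy += y; n += 1
        sxz += x * z; syz += y * z; sz += z
    M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    v = [sxz, syz, sz]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    for i in range(3):
        piv = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        v[i], v[piv] = v[piv], v[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for c in range(i, 3):
                M[r][c] -= f * M[i][c]
            v[r] -= f * v[i]
    p = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        p[i] = (v[i] - sum(M[i][c] * p[c] for c in range(i + 1, 3))) / M[i][i]
    return tuple(p)  # (a, b, c)
```

Signaling only three plane coefficients per region, instead of per-pixel depth, is what makes the representation cheap to encode.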
Geodesic Distance Histogram Feature for Video Segmentation
This paper proposes a geodesic-distance-based feature that encodes global
information for improved video segmentation algorithms. The feature is a joint
histogram of intensity and geodesic distances, where the geodesic distances are
computed as the shortest paths between superpixels via their boundaries. We
also incorporate adaptive voting weights and spatial pyramid configurations to
include spatial information into the geodesic histogram feature and show that
this further improves results. The feature is generic and can be used as part
of various algorithms. In experiments, we test the geodesic histogram feature
by incorporating it into two existing video segmentation frameworks. This leads
to significantly better performance on 3D video segmentation benchmarks on two
datasets.
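The feature described above can be sketched as follows: geodesic distances are shortest paths over a superpixel adjacency graph, and each superpixel contributes a count to a joint (intensity, distance) bin. This is a hedged illustration under assumed data structures; the graph encoding, bin counts, and distance cap are illustrative, not the paper's.

```python
import heapq
from collections import defaultdict

def geodesic_distances(adj, src):
    """adj: {node: [(neighbor, boundary_cost), ...]}; Dijkstra from src."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def joint_histogram(adj, intensity, src, n_int=4, n_dist=4, d_max=10.0):
    """Joint histogram over (intensity bin, geodesic-distance bin)."""
    dist = geodesic_distances(adj, src)
    hist = defaultdict(int)
    for node, d in dist.items():
        ib = min(int(intensity[node] * n_int), n_int - 1)  # intensity in [0, 1)
        db = min(int(d / d_max * n_dist), n_dist - 1)
        hist[(ib, db)] += 1
    return dict(hist)
```

Because distances accumulate boundary costs along the whole path, the histogram captures global layout rather than purely local appearance.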
3D video performance segmentation
We present a novel approach that achieves segmentation of subject body parts in 3D videos. 3D video consists of a free-viewpoint video of real-world subjects in motion immersed in a virtual world. Each 3D video frame is composed of one or several 3D models. A topology dictionary is used to cluster 3D video sequences with respect to the model topology and shape. The topology is characterized using Reeb graph-based descriptors, and no prior explicit model of the subject shape is necessary to perform the clustering process. In this framework, the dictionary consists of a set of training input poses with a priori segmentation and labels. As a consequence, all identified frames of 3D video sequences can be automatically segmented. Finally, motion flows computed between consecutive frames are used to transfer segmented region labels to unidentified frames. Our method allows us to perform robust body part segmentation and tracking in 3D cinema sequences. Index Terms — 3D video, topology dictionary, shape matching, body segmentation
Making a Case for 3D Convolutions for Object Segmentation in Videos
The task of object segmentation in videos is usually accomplished by
processing appearance and motion information separately using standard 2D
convolutional networks, followed by a learned fusion of the two sources of
information. On the other hand, 3D convolutional networks have been
successfully applied to video classification tasks, but have not been
leveraged as effectively for problems involving dense per-pixel interpretation
of videos, and lag behind their 2D convolutional counterparts in terms of
performance. In this work, we show that 3D
CNNs can be effectively applied to dense video prediction tasks such as salient
object segmentation. We propose a simple yet effective encoder-decoder network
architecture consisting entirely of 3D convolutions that can be trained
end-to-end using a standard cross-entropy loss. To this end, we leverage an
efficient 3D encoder and propose a 3D decoder architecture that comprises
novel 3D Global Convolution layers and 3D Refinement modules. Our approach
outperforms the existing state of the art by a large margin on the DAVIS'16
Unsupervised, FBMS and ViSal dataset benchmarks, in addition to being faster,
showing that our architecture can efficiently learn expressive spatio-temporal
features and produce high-quality video segmentation masks. Our code and models
will be made publicly available. Comment: BMVC '2
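The key primitive in the abstract above is the 3D convolution, whose kernel mixes information across time as well as space. The following is an illustrative naive single-channel version with 'valid' padding, not the paper's implementation (real networks use learned multi-channel kernels and GPU kernels):

```python
# Naive single-channel 3D convolution over a [T][H][W] volume of floats,
# demonstrating how one kernel aggregates a spatio-temporal neighborhood.

def conv3d(video, kernel):
    """video: [T][H][W] floats; kernel: [t][h][w] floats; 'valid' output."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        frame = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            s += video[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(s)
            frame.append(row)
        out.append(frame)
    return out
```

Because each output value sums over adjacent frames as well as adjacent pixels, no separate motion branch or fusion stage is needed to capture temporal change.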
4DContrast: Contrastive Learning with Dynamic Correspondences for 3D Scene Understanding
We present a new approach to instill 4D dynamic object priors into learned 3D
representations by unsupervised pre-training. We observe that dynamic movement
of an object through an environment provides important cues about its
objectness, and thus propose to imbue learned 3D representations with such
dynamic understanding, which can then be effectively transferred to improve
performance in downstream 3D semantic scene understanding tasks. We propose a
new data augmentation scheme leveraging synthetic 3D shapes moving in static 3D
environments, and employ contrastive learning under 3D-4D constraints that
encode 4D invariances into the learned 3D representations. Experiments
demonstrate that our unsupervised representation learning results in
improvement in downstream 3D semantic segmentation, object detection, and
instance segmentation tasks, and moreover, notably improves performance in
data-scarce scenarios. Comment: Accepted by ECCV 2022. Video: https://youtu.be/qhGhWZmJq3
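Contrastive objectives of the kind described above are commonly instantiated as an InfoNCE-style loss: an anchor embedding is pulled toward a positive (here, another view or time step of the same object) and pushed from negatives. This is a hedged sketch of that generic loss, not the paper's exact 3D-4D formulation; the temperature value is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    """-log( exp(s(a,p)/tau) / (exp(s(a,p)/tau) + sum_n exp(s(a,n)/tau)) )."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Minimizing this loss makes embeddings of the same object under different 4D configurations similar, which is the invariance the pre-training encodes.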
A Unified Framework for Mutual Improvement of SLAM and Semantic Segmentation
This paper presents a novel framework for simultaneously implementing
localization and segmentation, which are two of the most important vision-based
tasks for robotics. While the goals and techniques used for these two tasks
were previously considered distinct, we show that by making use of the
intermediate results of the two modules, their performance can be enhanced at
the same time. Our framework is able to handle both the instantaneous motion
and long-term changes of instances in localization with the help of the
segmentation result, which also benefits from the refined 3D pose information.
We conduct experiments on various datasets, and show that our framework
effectively improves the precision and robustness of the two tasks and
outperforms existing localization and segmentation algorithms. Comment: 7 pages, 5 figures. This work has been accepted by ICRA 2019. The demo
video can be found at https://youtu.be/Bkt53dAehj
Instance Neural Radiance Field
This paper presents one of the first learning-based NeRF 3D instance
segmentation pipelines, dubbed Instance Neural Radiance Field, or Instance
NeRF. Taking a NeRF pretrained from multi-view RGB images as input, Instance
NeRF can learn 3D instance segmentation of a given scene, represented as an
instance field component of the NeRF model. To this end, we adopt a 3D
proposal-based mask prediction network on the sampled volumetric features from
NeRF, which generates discrete 3D instance masks. The coarse 3D mask prediction
is then projected to image space to match 2D segmentation masks from different
views generated by existing panoptic segmentation models, which are used to
supervise the training of the instance field. Notably, beyond generating
consistent 2D segmentation maps from novel views, Instance NeRF can query
instance information at any 3D point, which greatly enhances NeRF object
segmentation and manipulation. Our method is also one of the first to achieve
such results without ground-truth instance information during inference.
Evaluated on synthetic and real-world NeRF datasets with complex indoor
scenes, Instance NeRF surpasses previous NeRF segmentation works and
competitive 2D segmentation methods in segmentation performance on unseen
views. See the demo video at https://youtu.be/wW9Bme73coI
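The supervision step described above, matching a projected 3D instance mask to 2D panoptic masks from each view, amounts to an intersection-over-union assignment. The sketch below is illustrative, not the paper's code; masks are represented as pixel-coordinate sets and the 0.5 threshold is a hypothetical choice.

```python
# Greedy best-IoU matching of projected 3D instance masks to 2D panoptic masks.

def iou(a, b):
    """Intersection-over-union of two pixel-coordinate sets."""
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0

def match_masks(projected, panoptic, thresh=0.5):
    """projected: {3d_id: set of (row, col)}; panoptic: {2d_id: set}."""
    matches = {}
    for pid, pmask in projected.items():
        best_id, best_iou = None, thresh
        for qid, qmask in panoptic.items():
            score = iou(pmask, qmask)
            if score > best_iou:
                best_id, best_iou = qid, score
        if best_id is not None:
            matches[pid] = best_id
    return matches
```

Each matched 2D mask then serves as a training target for the corresponding 3D instance, which is how 2D panoptic predictions can supervise the instance field without ground-truth 3D labels.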