3D Indoor Instance Segmentation in an Open-World
Existing 3D instance segmentation methods typically assume that all semantic
classes to be segmented would be available during training and only seen
categories are segmented at inference. We argue that such a closed-world
assumption is restrictive and explore for the first time 3D indoor instance
segmentation in an open-world setting, where the model is allowed to
distinguish a set of known classes as well as identify an unknown object as
unknown, and later incrementally learn the semantic category of the
unknown when the corresponding category labels are available. To this end, we
introduce an open-world 3D indoor instance segmentation method, where an
auto-labeling scheme is employed to produce pseudo-labels during training and
induce separation between known and unknown category labels. We further
improve pseudo-label quality at inference by adjusting the unknown class
probability based on the objectness score distribution. We also introduce
carefully curated open-world splits leveraging realistic scenarios based on
the inherent object distribution, region-based indoor scene exploration, and
the randomness of open-world classes. Extensive experiments reveal the
efficacy of the proposed contributions, leading to promising open-world 3D
instance segmentation performance.
Comment: Accepted at NeurIPS 202
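The auto-labeling step is only described at a high level above. As a rough illustration, assuming that "unknown" pseudo-labels are taken from predicted masks that overlap no known ground-truth instance and are ranked by objectness (a common open-world heuristic, not necessarily this paper's exact rule), the selection could look like the sketch below. The function names and the `top_k` and `iou_thresh` parameters are hypothetical.

```python
import numpy as np

def iou_matrix(masks_a, masks_b):
    """Pairwise IoU between boolean point masks of shapes (A, N) and (B, N)."""
    a = masks_a.astype(np.float64)
    b = masks_b.astype(np.float64)
    inter = a @ b.T
    union = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - inter
    # Avoid division by zero for empty masks; those pairs get IoU 0.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

def auto_label_unknowns(pred_masks, objectness, gt_masks, top_k=2, iou_thresh=0.5):
    """Return indices of predicted masks to pseudo-label as "unknown" (hypothetical rule).

    A proposal is a candidate if its best IoU with every known ground-truth
    instance is below iou_thresh; the top_k candidates by objectness are kept.
    """
    if len(gt_masks):
        max_iou = iou_matrix(pred_masks, gt_masks).max(axis=1)
    else:
        max_iou = np.zeros(len(pred_masks))
    candidates = np.flatnonzero(max_iou < iou_thresh)
    # Sort candidates by descending objectness and keep the first top_k.
    order = candidates[np.argsort(-objectness[candidates], kind="stable")]
    return order[:top_k]
```

With one known instance covering points 0 and 1, the proposal that matches it is skipped, and the two unmatched proposals with the highest objectness are returned. How the paper then adjusts the unknown-class probability at inference is not specified in the abstract and is not modeled here.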
Analogy-Forming Transformers for Few-Shot 3D Parsing
We present Analogical Networks, a model that encodes domain knowledge
explicitly, in a collection of structured labelled 3D scenes, in addition to
implicitly, as model parameters, and segments 3D object scenes with analogical
reasoning: instead of mapping a scene to part segments directly, our model
first retrieves related scenes from memory and their corresponding part
structures, and then predicts analogous part structures for the input scene,
via an end-to-end learnable modulation mechanism. By conditioning on more than
one retrieved memory, the model predicts compositions of structures that mix
and match parts across the retrieved memories. One-shot, few-shot or many-shot
learning are treated uniformly in Analogical Networks, by conditioning on the
appropriate set of memories, whether taken from a single, few or many memory
exemplars, and inferring analogous parses. We show Analogical Networks are
competitive with state-of-the-art 3D segmentation transformers in many-shot
settings, and outperform them, as well as existing paradigms of meta-learning
and few-shot learning, in few-shot settings. Analogical Networks successfully
segment instances of novel object categories simply by expanding their memory,
without any weight updates. Our code and models are publicly available in the
project webpage: http://analogicalnets.github.io/.
Comment: ICLR 202
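The retrieval step can be pictured as a nearest-neighbor lookup over a memory of labelled scenes. This minimal sketch assumes each memory is summarized by a single embedding vector and compared with cosine similarity; the names, the embeddings, and the similarity choice are assumptions, and the learnable modulation that turns retrieved part structures into a parse is not shown. It does show why a novel category needs no weight update: adding a memory is just appending a row.

```python
import numpy as np

def retrieve_memories(query, memory_keys, k=2):
    """Return indices of the k memories whose embeddings are most similar to query."""
    q = query / np.linalg.norm(query)
    keys = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    sims = keys @ q  # cosine similarity of each memory to the query
    return np.argsort(-sims, kind="stable")[:k]

# Hypothetical memory bank: one embedding and one category name per labelled scene.
memory_keys = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
memory_names = ["chair", "table", "lamp"]

# A novel category is added by appending a memory; no parameters change.
memory_keys = np.vstack([memory_keys, [[0.6, 0.8, 0.0]]])
memory_names.append("bed")

top = retrieve_memories(np.array([0.6, 0.8, 0.0]), memory_keys, k=2)
print([memory_names[i] for i in top])  # prints ['bed', 'table']
```

Retrieving k > 1 memories corresponds to the few-shot and compositional settings described above, where the parse may mix parts from several retrieved scenes.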
End-to-End Supervised Multilabel Contrastive Learning
Multilabel representation learning is recognized as a challenging problem
that can be associated with either label dependencies between object categories
or data-related issues such as the inherent imbalance of positive/negative
samples. Recent advances address these challenges from model- and data-centric
viewpoints. In the model-centric view, label correlations are captured by
external model designs (e.g., graph CNNs) that incorporate an inductive bias
for training. However, these methods lack an end-to-end training framework,
leading to high computational complexity. In contrast, the data-centric view
considers the realistic nature of the dataset to improve classification, while
ignoring label dependencies. In this paper, we propose a new end-to-end
training framework -- dubbed KMCL (Kernel-based Multilabel Contrastive
Learning) -- to address the shortcomings of both model- and data-centric
designs. KMCL first transforms the embedded features into a mixture of
exponential kernels in a Gaussian RKHS. This is followed by an objective loss
comprising (a) a reconstruction loss to reconstruct the kernel representation,
(b) an asymmetric classification loss to address the inherent imbalance
problem, and (c) a contrastive loss to capture label correlation. KMCL models
the uncertainty of the feature encoder while maintaining a low computational
footprint. Extensive experiments on image classification tasks show
consistent improvements of KMCL over state-of-the-art (SOTA) methods. A
PyTorch implementation is available at
https://github.com/mahdihosseini/KMCL
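The abstract does not give the formula for the asymmetric classification term (b). As a hedged illustration only, the sketch below uses the asymmetric binary cross-entropy of the Asymmetric Loss (ASL) of Ridnik et al., which applies separate focusing exponents to positive and negative labels and shifts negative probabilities by a margin so that very easy negatives contribute nothing. This is a swapped-in, commonly used form, not necessarily KMCL's exact term; the parameter defaults are illustrative.

```python
import numpy as np

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """ASL-style multilabel loss (illustrative), summed over labels and averaged over the batch.

    logits, targets: arrays of shape (batch, num_labels); targets are 0/1.
    """
    p = 1.0 / (1.0 + np.exp(-logits))  # per-label sigmoid probability
    # Shifted probability for negatives: easy negatives (p < margin) become exactly 0.
    p_neg = np.clip(p - margin, 0.0, None)
    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(np.clip(p, eps, None))
    loss_neg = (1.0 - targets) * p_neg ** gamma_neg * np.log(np.clip(1.0 - p_neg, eps, None))
    return -(loss_pos + loss_neg).sum(axis=-1).mean()
```

With `gamma_pos = gamma_neg = margin = 0` this reduces to ordinary binary cross-entropy; raising `gamma_neg` down-weights the many easy negatives, which is one way to counter the positive/negative imbalance mentioned above.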
A Review of Panoptic Segmentation for Mobile Mapping Point Clouds
3D point cloud panoptic segmentation is the combined task of (i) assigning each
point to a semantic class and (ii) separating the points in each class into
object instances. Recently there has been an increased interest in such
comprehensive 3D scene understanding, building on the rapid advances of
semantic segmentation due to the advent of deep 3D neural networks. Yet, to
date there is very little work on panoptic segmentation of outdoor
mobile-mapping data, and no systematic comparisons. The present paper tries to
close that gap. It reviews the building blocks needed to assemble a panoptic
segmentation pipeline and the related literature. Moreover, a modular pipeline
is set up to perform comprehensive, systematic experiments to assess the state
of panoptic segmentation in the context of street mapping. As a byproduct, we
also provide the first public dataset for that task, by extending the NPM3D
dataset to include instance labels. That dataset and our source code are
publicly available. We discuss which modifications are needed to adapt current
panoptic segmentation methods to outdoor scenes and large objects. Our study
finds that for mobile mapping data, KPConv performs best but is slower, while
PointNet++ is fastest but performs significantly worse. Sparse CNNs are in
between. Regardless of the backbone, instance segmentation by clustering
embedding features is better than using shifted coordinates.
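The two instance-grouping strategies compared above differ mainly in what gets clustered. As a minimal sketch, assuming a simple radius-based connected-components grouping (similar in spirit to the breadth-first clustering used by several point-cloud instance methods, not necessarily the study's exact algorithm), the same routine can be run either on shifted coordinates (each point's xyz plus a predicted offset toward its instance center) or on learned per-point embeddings. The function name and `radius` are hypothetical, and in practice clustering would typically be run separately within each semantic class.

```python
import numpy as np
from collections import deque

def radius_cluster(features, radius):
    """Label points by connected components of the graph linking points closer than radius."""
    n = len(features)
    # Brute-force pairwise distances; fine for a sketch, too slow for real scans.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    adjacency = dists < radius
    labels = np.full(n, -1, dtype=int)
    next_label = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:  # breadth-first expansion over unlabelled neighbors
            i = queue.popleft()
            for j in np.flatnonzero(adjacency[i] & (labels == -1)):
                labels[j] = next_label
                queue.append(j)
        next_label += 1
    return labels

# Shifted coordinates: two objects whose points are pulled toward their centers.
xyz = np.array([[0.0, 0, 0], [1.0, 0, 0], [5.0, 0, 0], [6.0, 0, 0]])
offsets = np.array([[0.5, 0, 0], [-0.5, 0, 0], [0.5, 0, 0], [-0.5, 0, 0]])
print(radius_cluster(xyz + offsets, radius=0.3))  # prints [0 0 1 1]

# Embedding features: the same routine applied to hypothetical 2D embeddings.
emb = np.array([[0.0, 1.0], [0.1, 1.0], [1.0, 0.0], [1.0, 0.1]])
print(radius_cluster(emb, radius=0.5))  # prints [0 0 1 1]
```

Without the predicted offsets, the raw coordinates at the same radius split into four singleton clusters, which is why the offset or embedding quality, not the grouping routine, largely determines instance quality.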