Progressively Dual Prior Guided Few-shot Semantic Segmentation
Few-shot semantic segmentation aims to segment query images given only a few annotated support samples. Current few-shot segmentation methods mainly leverage foreground information without fully exploiting the rich background information, which can lead to false activation of foreground-like background regions and poor adaptability to dramatic scene changes between support and query images. Meanwhile, the lack of a detail-mining mechanism can cause coarse parsing results that miss semantic components or edge areas, since prototypes have limited ability to cope with large variations in object appearance. To tackle these problems, we propose a progressively dual prior guided few-shot semantic segmentation network. Specifically, a dual prior mask generation (DPMG) module is first designed to suppress false activation in a foreground-background comparison manner by treating the background as auxiliary refinement information. With the dual prior masks refining the location of the foreground area, we further propose a progressive semantic detail enrichment (PSDE) module, which forces the parsing model to capture hidden semantic details by iteratively erasing the high-confidence foreground region and activating details in the remaining region through a hierarchical structure. The collaboration of DPMG and PSDE forms a novel few-shot segmentation network that can be learned in an end-to-end manner. Comprehensive experiments on PASCAL-5i and MS COCO demonstrate that our proposed algorithm achieves strong performance.
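
As a rough illustration of the foreground-background comparison idea behind DPMG, the PyTorch sketch below builds a foreground and a background prototype from the support image and compares both against the query features; every name and design choice here (masked average pooling, cosine similarity, softmax normalization) is an assumption for illustration, not the authors' implementation.

import torch
import torch.nn.functional as F

def masked_avg_pool(feat, mask):
    # feat: (B, C, H, W); mask: (B, 1, H, W) with values in [0, 1]
    mask = F.interpolate(mask, size=feat.shape[-2:], mode="bilinear", align_corners=False)
    return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)  # (B, C)

def dual_prior_masks(support_feat, support_mask, query_feat):
    # Hypothetical sketch: foreground and background prototypes from the support image.
    fg_proto = masked_avg_pool(support_feat, support_mask)         # (B, C)
    bg_proto = masked_avg_pool(support_feat, 1.0 - support_mask)   # (B, C)
    q = F.normalize(query_feat, dim=1)                             # (B, C, H, W)
    fg_sim = (q * F.normalize(fg_proto, dim=1)[..., None, None]).sum(1, keepdim=True)
    bg_sim = (q * F.normalize(bg_proto, dim=1)[..., None, None]).sum(1, keepdim=True)
    # Comparing foreground against background similarity suppresses
    # foreground-like background regions in the foreground prior.
    fg_prior = torch.softmax(torch.cat([fg_sim, bg_sim], dim=1), dim=1)[:, :1]
    return fg_prior, 1.0 - fg_prior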
MMPL-Net: Multi-modal prototype learning for one-shot RGB-D segmentation
For one-shot segmentation, prototype learning is extensively used. However, using only one RGB prototype to represent all information in the support image may lead to ambiguities. To this end, we propose a one-shot segmentation network based on multi-modal prototype learning that uses depth information to complement RGB information. Specifically, we propose a multi-modal fusion and refinement block (MFRB) and multi-modal prototype learning block (MPLB). MFRB fuses RGB and depth features to generate multi-modal features and refined depth features, which are used by MPLB to generate multi-modal information prototypes, depth information prototypes, and global information prototypes. Furthermore, we introduce self-attention to capture global context information in RGB and depth images. By integrating self-attention, MFRB, and MPLB, we propose the multi-modal prototype learning network (MMPL-Net), which adapts to the ambiguity of visual information in the scene. Finally, we construct a one-shot RGB-D segmentation dataset called OSS-RGB-D-5i. Experiments using OSS-RGB-D-5i show that our proposed method outperforms several state-of-the-art techniques with fewer labeled images and generalizes well to previously unseen objects.
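
The sketch below illustrates, under assumptions, how RGB, depth, and fused prototypes could be pooled from support features in the spirit of MPLB; the simple 1x1-convolution fusion stands in for MFRB, and all class and variable names are hypothetical rather than the paper's released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalPrototypes(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Stand-in for MFRB: fuse RGB and depth features with a 1x1 convolution.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    @staticmethod
    def masked_avg_pool(feat, mask):
        mask = F.interpolate(mask, size=feat.shape[-2:], mode="bilinear", align_corners=False)
        return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)

    def forward(self, rgb_feat, depth_feat, support_mask):
        # One prototype per modality plus a fused (multi-modal) prototype.
        fused_feat = self.fuse(torch.cat([rgb_feat, depth_feat], dim=1))
        rgb_proto = self.masked_avg_pool(rgb_feat, support_mask)
        depth_proto = self.masked_avg_pool(depth_feat, support_mask)
        fused_proto = self.masked_avg_pool(fused_feat, support_mask)
        return rgb_proto, depth_proto, fused_proto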
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have shown promise for generalization and interpretability. However, existing approaches suffer from structural limitations and optimization difficulties in common environments with multiple dynamic objects. In this paper, we present a novel self-supervised learning framework, called Multi-level Abstraction Object-oriented Predictor (MAOP), which employs a three-level learning architecture that enables efficient object-based dynamics learning from raw visual observations. We also design a spatio-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability. Our results show that MAOP significantly outperforms previous methods in sample efficiency and generalization to novel environments when learning environment models. We also demonstrate that the learned dynamics models enable efficient planning in unseen environments, comparable to planning with true environment models. In addition, MAOP learns semantically and visually interpretable disentangled representations.

Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020
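
For a rough sense of the object-oriented, action-conditioned setting (not MAOP's actual three-level architecture), the following PyTorch sketch predicts per-object state updates from pairwise object relations and the action; all module names, shapes, and the additive-update design are illustrative assumptions.

import torch
import torch.nn as nn

class ObjectDynamics(nn.Module):
    def __init__(self, obj_dim, action_dim, hidden=64):
        super().__init__()
        # Pairwise relation network and an action-conditioned per-object update.
        self.relation = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.update = nn.Sequential(nn.Linear(obj_dim + hidden + action_dim, hidden),
                                    nn.ReLU(), nn.Linear(hidden, obj_dim))

    def forward(self, objects, action):
        # objects: (B, N, obj_dim) per-object states; action: (B, action_dim)
        B, N, D = objects.shape
        src = objects.unsqueeze(2).expand(B, N, N, D)
        dst = objects.unsqueeze(1).expand(B, N, N, D)
        # Aggregate pairwise relations for each object, then apply the
        # action-conditioned update as a residual on the current state.
        rel = self.relation(torch.cat([src, dst], dim=-1)).sum(dim=2)  # (B, N, hidden)
        act = action.unsqueeze(1).expand(B, N, -1)
        return objects + self.update(torch.cat([objects, rel, act], dim=-1))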