2 research outputs found
One-Shot Weakly Supervised Video Object Segmentation
Conventional few-shot object segmentation methods learn object segmentation
from a few labelled support images with strongly labelled segmentation masks.
Recent work has shown to perform on par with weaker levels of supervision in
terms of scribbles and bounding boxes. However, there has been limited
attention given to the problem of few-shot object segmentation with image-level
supervision. We propose a novel multi-modal interaction module for few-shot
object segmentation that utilizes a co-attention mechanism using both visual
and word embeddings. It enables our model to achieve 5.1% improvement over
previously proposed image-level few-shot object segmentation. Our method
compares relatively close to the state of the art methods that use strong
supervision, while ours use the least possible supervision. We further propose
a novel setup for few-shot weakly supervised video object segmentation(VOS)
that relies on image-level labels for the first frame. The proposed setup uses
weak annotation unlike semi-supervised VOS setting that utilizes strongly
labelled segmentation masks. The setup evaluates the effectiveness of
generalizing to novel classes in the VOS setting. The setup splits the VOS data
into multiple folds with different categories per fold. It provides a potential
setup to evaluate how few-shot object segmentation methods can benefit from
additional object poses, or object interactions that is not available in static
frames as in PASCAL-5i benchmark
Repurposing GANs for One-shot Semantic Part Segmentation
While GANs have shown success in realistic image generation, the idea of
using GANs for other tasks unrelated to synthesis is underexplored. Do GANs
learn meaningful structural parts of objects during their attempt to reproduce
those objects? In this work, we test this hypothesis and propose a simple and
effective approach based on GANs for semantic part segmentation that requires
as few as one label example along with an unlabeled dataset. Our key idea is to
leverage a trained GAN to extract pixel-wise representation from the input
image and use it as feature vectors for a segmentation network. Our experiments
demonstrate that GANs representation is "readily discriminative" and produces
surprisingly good results that are comparable to those from supervised
baselines trained with significantly more labels. We believe this novel
repurposing of GANs underlies a new class of unsupervised representation
learning that is applicable to many other tasks. More results are available at
https://repurposegans.github.io/.Comment: CVPR 2021 (Oral