Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion
Given a group of images, co-salient object detection (CoSOD) aims to
highlight the common salient object in each image. There are two factors
closely related to the success of this task: consensus extraction and
the dispersion of consensus to each image. Most previous works represent the
group consensus using local features, while we instead utilize a hierarchical
Transformer module to extract semantic-level consensus. The module can therefore
obtain a more comprehensive representation of the common object category, and
exclude interference from other objects that share local similarities with the
target object. In addition, we propose a Transformer-based dispersion module
that takes into account the variation of the co-salient object in different
scenes. It distributes the consensus to the image feature maps in an
image-specific way while making full use of interactions within the group.
These two modules are integrated with a ViT encoder and an FPN-like decoder to
form an end-to-end trainable network, without additional branches or auxiliary
losses. The proposed method is evaluated on three commonly used CoSOD datasets
and achieves state-of-the-art performance.
Comment: Accepted by ACM MM 202
Distributed Low-rank Subspace Segmentation
Vision problems ranging from image clustering to motion segmentation to
semi-supervised learning can naturally be framed as subspace segmentation
problems, in which one aims to recover multiple low-dimensional subspaces from
noisy and corrupted input data. Low-Rank Representation (LRR), a convex
formulation of the subspace segmentation problem, is provably and empirically
accurate on small problems but does not scale to the massive sizes of modern
vision datasets. Moreover, past work aimed at scaling up low-rank matrix
factorization is not applicable to LRR given its non-decomposable constraints.
In this work, we propose a novel divide-and-conquer algorithm for large-scale
subspace segmentation that can cope with LRR's non-decomposable constraints and
maintains LRR's strong recovery guarantees. This has immediate implications for
the scalability of subspace segmentation, which we demonstrate on a benchmark
face recognition dataset and in simulations. We then introduce novel
applications of LRR-based subspace segmentation to large-scale semi-supervised
learning for multimedia event detection, concept detection, and image tagging.
In each case, we obtain state-of-the-art results and order-of-magnitude
speedups.
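For orientation, the convex LRR program that the abstract refers to is commonly written as below. This is a sketch of the standard formulation from the LRR literature, not text from this listing; the symbols X (data matrix whose columns are data points), Z (low-rank representation), E (column-sparse corruption), and the trade-off weight \lambda are notation assumed here.

```latex
\[
  \min_{Z,\,E} \; \|Z\|_{*} + \lambda \,\|E\|_{2,1}
  \quad \text{subject to} \quad X = XZ + E
\]
% \|Z\|_{*}   : nuclear norm of Z (sum of its singular values)
% \|E\|_{2,1} : sum of the Euclidean norms of the columns of E
```

Because X appears on both sides of the constraint, the problem does not separate into independent pieces in the way a plain low-rank factorization does, which is one way to read the abstract's remark about non-decomposable constraints.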