Fine-grained Domain Adaptive Crowd Counting via Point-derived Segmentation
Due to domain shift, a large performance drop is usually observed when a
trained crowd counting model is deployed in the wild. While existing
domain-adaptive crowd counting methods achieve promising results, they
typically regard each crowd image as a whole and reduce domain discrepancies in
a holistic manner, thus limiting further improvement of domain adaptation
performance. To this end, we propose to untangle domain-invariant crowd
and domain-specific background from crowd images and design a
fine-grained domain adaptation method for crowd counting. Specifically, to
disentangle crowd from background, we propose to learn crowd segmentation from
point-level crowd counting annotations in a weakly-supervised manner. Based on
the derived segmentation, we design a crowd-aware domain adaptation mechanism
consisting of two crowd-aware adaptation modules, i.e., Crowd Region Transfer
(CRT) and Crowd Density Alignment (CDA). The CRT module is designed to guide
the transfer of crowd features across domains, free from background distractions. The CDA
module is dedicated to regularising target-domain crowd density generation using the
target domain's own crowd density distribution. Our method consistently outperforms
previous approaches in widely used adaptation scenarios.
Comment: 10 pages, 5 figures, and 9 tables
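The abstract does not spell out how point annotations become segmentation supervision. As a rough illustration only, the sketch below assumes a fixed-radius disc around each annotated head as a pseudo crowd mask, and shows how such a mask could restrict a feature statistic to crowd pixels, in the spirit of separating crowd from background. The names `points_to_mask` and `masked_mean` and the `radius` value are hypothetical; the paper learns segmentation in a weakly-supervised manner, which this sketch does not reproduce.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) head annotation in pixel coordinates


def points_to_mask(points: Sequence[Point], height: int, width: int,
                   radius: float = 4.0) -> List[List[int]]:
    """Hypothetical pseudo-mask: mark pixels within `radius` of a point as crowd (1)."""
    mask = [[0] * width for _ in range(height)]
    r2 = radius * radius
    for px, py in points:
        # Only scan the bounding box of the disc around this point.
        y0, y1 = max(0, int(py - radius)), min(height - 1, int(py + radius))
        x0, x1 = max(0, int(px - radius)), min(width - 1, int(px + radius))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if (x - px) ** 2 + (y - py) ** 2 <= r2:
                    mask[y][x] = 1
    return mask


def masked_mean(feature: List[List[float]], mask: List[List[int]]) -> float:
    """Average a single-channel feature map over crowd pixels only (0.0 if none)."""
    total, count = 0.0, 0
    for frow, mrow in zip(feature, mask):
        for f, m in zip(frow, mrow):
            if m:
                total += f
                count += 1
    return total / count if count else 0.0
```

For example, `points_to_mask([(2, 2)], 5, 5, radius=1)` marks the centre pixel and its four direct neighbours, so background pixels never contribute to `masked_mean`.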
Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes
To alleviate the heavy annotation burden of training a reliable crowd
counting model, and to make the model more practical and accurate by letting
it benefit from more data, this paper presents a new semi-supervised
method based on the mean teacher framework. When labeled data are scarce,
the model is prone to overfitting local patches. In such
contexts, the conventional approach of solely improving the accuracy of local
patch predictions through unlabeled data proves inadequate. Consequently, we
propose a more nuanced approach: fostering the model's intrinsic 'subitizing'
capability. This ability allows the model to accurately estimate the count in
regions by leveraging its understanding of the crowd scenes, mirroring the
human cognitive process. To achieve this goal, we apply masking on unlabeled
data, guiding the model to make predictions for these masked patches based on
the holistic cues. Furthermore, to aid feature learning, we
incorporate a fine-grained density classification task. Our method is general
and applicable to most existing crowd counting methods, as it imposes no
strict structural or loss constraints. In addition, we observe that the model
trained with our framework exhibits a 'subitizing'-like behavior. It accurately
predicts low-density regions with only a 'glance', while incorporating local
details to predict high-density regions. Our method achieves
state-of-the-art performance, surpassing previous approaches by a large margin
on challenging benchmarks such as ShanghaiTech A and UCF-QNRF. The code is
available at: https://github.com/cha15yq/MRC-Crowd
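The abstract combines a mean teacher framework with masking of unlabeled images. A minimal sketch under stated assumptions: the teacher is an exponential moving average (EMA) of the student, random square patches of an unlabeled image are zeroed, and the student's predicted count on each masked patch is pulled toward the teacher's count for the unmasked image. Parameters are flat lists of floats and density maps are nested lists; the helper names, patch size, masking ratio, and squared-error loss are hypothetical choices, not taken from the MRC-Crowd code. The fine-grained density classification task is not modeled.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]  # 2D map as nested lists (rows of floats)
Cell = Tuple[int, int]    # top-left (row, col) of a square patch


def ema_update(teacher: List[float], student: List[float],
               alpha: float = 0.99) -> List[float]:
    """Mean-teacher rule: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def mask_patches(image: Grid, patch: int, ratio: float,
                 rng: random.Random) -> Tuple[Grid, List[Cell]]:
    """Zero out a random fraction `ratio` of non-overlapping patch x patch blocks."""
    h, w = len(image), len(image[0])
    cells = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    chosen = rng.sample(cells, int(len(cells) * ratio))
    masked = [row[:] for row in image]  # copy so the original image stays intact
    for r, c in chosen:
        for y in range(r, min(r + patch, h)):
            for x in range(c, min(c + patch, w)):
                masked[y][x] = 0.0
    return masked, chosen


def patch_count(density: Grid, cell: Cell, patch: int) -> float:
    """Integrate a density map over one patch to get its predicted count."""
    r, c = cell
    return sum(density[y][x]
               for y in range(r, min(r + patch, len(density)))
               for x in range(c, min(c + patch, len(density[0]))))


def masked_count_loss(student_density: Grid, teacher_density: Grid,
                      cells: List[Cell], patch: int) -> float:
    """Mean squared difference of student vs. teacher counts on masked patches."""
    if not cells:
        return 0.0
    errors = [(patch_count(student_density, cell, patch)
               - patch_count(teacher_density, cell, patch)) ** 2 for cell in cells]
    return sum(errors) / len(errors)
```

In a training loop built on this sketch, the student would see the masked image while the teacher sees the full unlabeled image, and `ema_update` would run after each student update.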
Counting with Focus for Free
This paper aims to count arbitrary objects in images. The leading counting
approaches start from point annotations per object from which they construct
density maps. Then, their training objective transforms input images to density
maps through deep convolutional networks. We posit that point annotations
can serve more supervision purposes than just constructing density maps. We
introduce ways to repurpose the points for free. First, we propose supervised
focus from segmentation, where points are converted into binary maps. The
binary maps are combined with a network branch and accompanying loss function
to focus on areas of interest. Second, we propose supervised focus from global
density, where the ratio of point annotations to image pixels is used in
another branch to regularize the overall density estimation. To assist both the
density estimation and the focus from segmentation, we also introduce an
improved kernel size estimator for the point annotations. Experiments on six
datasets show that all our contributions reduce the counting error, regardless
of the base network, resulting in state-of-the-art accuracy using only a single
network. Finally, we are the first to count on WIDER FACE, allowing us to show
the benefits of our approach in handling varying object scales and crowding
levels. Code is available at
https://github.com/shizenglin/Counting-with-Focus-for-Free
Comment: ICCV, 201
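The abstract describes three uses of the same points: a binary focus map, a global density ratio, and a kernel size per point. The sketch below illustrates these under stated assumptions. For kernel size it uses the common geometry-adaptive estimate (β times the mean distance to the k nearest annotated neighbours), which is a swapped-in baseline and not the paper's improved estimator; `k`, `beta`, the fallback `default`, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) annotation in pixel coordinates


def knn_sigma(points: Sequence[Point], k: int = 3, beta: float = 0.3,
              default: float = 15.0) -> List[float]:
    """Geometry-adaptive width per point: beta * mean distance to k nearest neighbours.

    Falls back to `default` when a point has no neighbours (single annotation).
    """
    sigmas = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(points) if j != i)
        nearest = dists[:k]
        sigmas.append(beta * sum(nearest) / len(nearest) if nearest else default)
    return sigmas


def focus_map(points: Sequence[Point], sigmas: Sequence[float],
              height: int, width: int) -> List[List[int]]:
    """Binary map: 1 for pixels within sigma of some annotated point, else 0."""
    mask = [[0] * width for _ in range(height)]
    for (px, py), s in zip(points, sigmas):
        for y in range(height):
            for x in range(width):
                if (x - px) ** 2 + (y - py) ** 2 <= s * s:
                    mask[y][x] = 1
    return mask


def global_density(points: Sequence[Point], height: int, width: int) -> float:
    """Ratio of point annotations to image pixels, usable as a global target."""
    return len(points) / (height * width)
```

Tying the focus-map radius to the estimated kernel width is one simple way to let both the density map and the binary map adapt to object scale; the paper's own coupling may differ.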
SIMCO: SIMilarity-based object COunting
We present SIMCO, the first agnostic multi-class object counting approach.
SIMCO starts by detecting foreground objects through a novel Mask R-CNN-based
architecture trained beforehand (just once) on a brand-new synthetic 2D shape
dataset, InShape; the idea is to highlight every object resembling a primitive
2D shape (circle, square, rectangle, etc.). Each object detected is described
by a low-dimensional embedding, obtained from a novel similarity-based head
branch; the latter implements a triplet loss, encouraging similar objects
(same 2D shape, color, and scale) to map close together. Subsequently, SIMCO uses this
embedding for clustering, so that different types of objects can emerge and be
counted, making SIMCO the very first multi-class unsupervised counter.
Experiments show that SIMCO provides state-of-the-art scores on counting
benchmarks and that it can also help in many challenging image understanding
tasks.
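Two mechanisms in the abstract lend themselves to a small sketch: the triplet loss that pulls embeddings of similar objects together, and counting by clustering the embeddings. The code below assumes Euclidean distance and the standard hinge form of the triplet loss, and substitutes a simple greedy distance-threshold clustering for whatever clustering SIMCO actually uses (the abstract does not say). `triplet_loss`, `count_by_cluster`, `margin`, and `threshold` are hypothetical names and values.

```python
import math
from typing import Dict, List, Sequence

Vec = Sequence[float]


def triplet_loss(anchor: Vec, positive: Vec, negative: Vec,
                 margin: float = 1.0) -> float:
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin), Euclidean d."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def count_by_cluster(embeddings: List[Vec], threshold: float) -> Dict[int, int]:
    """Greedy clustering: join the first cluster whose centroid is within `threshold`.

    Returns {cluster_id: number_of_objects}; each cluster stands for one object type.
    """
    centroids: List[List[float]] = []
    counts: List[int] = []
    for e in embeddings:
        for i, c in enumerate(centroids):
            if math.dist(e, c) <= threshold:
                n = counts[i]
                # Running mean keeps the centroid at the average of its members.
                centroids[i] = [(ci * n + ei) / (n + 1) for ci, ei in zip(c, e)]
                counts[i] += 1
                break
        else:
            # No cluster is close enough: start a new object type.
            centroids.append(list(e))
            counts.append(1)
    return dict(enumerate(counts))
```

Greedy assignment is order-dependent; it is used here only because it is short, and a method such as k-means or density-based clustering would be a more robust choice when the number of object types is unknown.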