Counting with Focus for Free
This paper aims to count arbitrary objects in images. The leading counting
approaches start from point annotations per object from which they construct
density maps. Then, their training objective transforms input images to density
maps through deep convolutional networks. We posit that the point annotations
serve more supervision purposes than just constructing density maps. We
introduce ways to repurpose the points for free. First, we propose supervised
focus from segmentation, where points are converted into binary maps. The
binary maps are combined with a network branch and accompanying loss function
to focus on areas of interest. Second, we propose supervised focus from global
density, where the ratio of point annotations to image pixels is used in
another branch to regularize the overall density estimation. To assist both the
density estimation and the focus from segmentation, we also introduce an
improved kernel size estimator for the point annotations. Experiments on six
datasets show that all our contributions reduce the counting error, regardless
of the base network, resulting in state-of-the-art accuracy using only a single
network. Finally, we are the first to count on WIDER FACE, allowing us to show
the benefits of our approach in handling varying object scales and crowding
levels. Code is available at
https://github.com/shizenglin/Counting-with-Focus-for-Free
Comment: ICCV 2019
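The two supervision signals described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the fixed Gaussian sigma (the paper uses an improved kernel size estimator), and the single-pixel binary map are all simplifying assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_and_focus_maps(points, shape, sigma=4.0):
    """Build a Gaussian density map and a binary focus map from point annotations.

    `points` is a list of (row, col) object centers; `shape` is (H, W).
    The density map integrates to the object count; the binary map marks
    annotated locations for a segmentation-style focus branch.
    Illustrative sketch: a fixed sigma stands in for the paper's
    per-point kernel size estimator.
    """
    density = np.zeros(shape, dtype=np.float32)
    binary = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        density[r, c] += 1.0
        binary[r, c] = 1.0
    # Smoothing a sum of deltas yields a density map whose integral
    # equals the object count (reflect padding conserves mass).
    density = gaussian_filter(density, sigma=sigma)
    return density, binary

def global_density(points, shape):
    """Ratio of point annotations to image pixels, the target of the
    global-density regularization branch."""
    return len(points) / float(shape[0] * shape[1])
```

For a 64x64 image with two annotated points, the density map sums to roughly 2 and the global density is 2/4096.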
Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes
To alleviate the heavy annotation burden of training a reliable crowd
counting model, and to make the model more practical and accurate by letting
it benefit from more data, this paper presents a new semi-supervised method
based on the mean teacher framework. When labeled data is scarce, the model
is prone to overfitting local patches. In such
contexts, the conventional approach of solely improving the accuracy of local
patch predictions through unlabeled data proves inadequate. Consequently, we
propose a more nuanced approach: fostering the model's intrinsic 'subitizing'
capability. This ability allows the model to accurately estimate the count in
regions by leveraging its understanding of the crowd scenes, mirroring the
human cognitive process. To achieve this goal, we apply masking on unlabeled
data, guiding the model to make predictions for these masked patches based on
the holistic cues. Furthermore, to aid feature learning, we
incorporate a fine-grained density classification task. Our method is general
and applicable to most existing crowd counting methods as it doesn't have
strict structural or loss constraints. In addition, we observe that the model
trained with our framework exhibits a 'subitizing'-like behavior. It accurately
predicts low-density regions with only a 'glance', while incorporating local
details to predict high-density regions. Our method achieves
state-of-the-art performance, surpassing previous approaches by a large margin
on challenging benchmarks such as ShanghaiTech A and UCF-QNRF. The code is
available at: https://github.com/cha15yq/MRC-Crowd
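The patch-masking step described above can be sketched as follows. This is an illustrative assumption-laden sketch, not the released MRC-Crowd code: the function name, the 16-pixel patch size, and the 50% masking ratio are placeholders. The idea is that the student network must predict density for zeroed-out patches from the surrounding holistic context, while the teacher sees the full image.

```python
import numpy as np

def mask_patches(image, patch=16, ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping patches.

    Returns the masked image and a boolean grid marking which patches
    were kept. Predictions for masked patches must rely on holistic
    scene cues, encouraging a 'subitizing'-like behavior.
    """
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    masked = image.copy()
    keep = np.ones((h // patch, w // patch), dtype=bool)
    n_mask = int(keep.size * ratio)
    idx = rng.choice(keep.size, size=n_mask, replace=False)
    keep.flat[idx] = False
    for i in range(h // patch):
        for j in range(w // patch):
            if not keep[i, j]:
                masked[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0
    return masked, keep
```

A consistency loss would then compare the student's density prediction on the masked image against the teacher's prediction on the original, restricted to the masked patches.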
TreeFormer: a Semi-Supervised Transformer-based Framework for Tree Counting from a Single High Resolution Image
Automatic tree density estimation and counting using single aerial and
satellite images is a challenging task in photogrammetry and remote sensing,
yet has an important role in forest management. In this paper, we propose the
first semi-supervised transformer-based framework for tree counting, which
reduces the expensive tree annotations for remote sensing images. Our method,
termed TreeFormer, first develops a pyramid tree representation module based
on transformer blocks to extract multi-scale features during the encoding
stage. Contextual attention-based feature fusion and tree density regressor
modules are further designed to utilize the robust features from the encoder to
estimate tree density maps in the decoder. Moreover, we propose a pyramid
learning strategy that includes local tree density consistency and local tree
count ranking losses to incorporate unlabeled images into the training process.
Finally, the tree counter token is introduced to regulate the network by
computing the global tree counts for both labeled and unlabeled images. Our
model was evaluated on two benchmark tree counting datasets, Jiangsu and
Yosemite, as well as KCL-London, a new dataset we created. Our
TreeFormer outperforms state-of-the-art semi-supervised methods under the
same setting and exceeds fully-supervised methods using the same number of
labeled images. The codes and datasets are available at
https://github.com/HAAClassic/TreeFormer
Comment: Accepted in IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
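The two unsupervised losses in the pyramid learning strategy can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the TreeFormer code: the function names and the hinge form with zero margin are placeholders. The ranking loss encodes the constraint that a crop contained within a larger region cannot hold more trees than the region itself; the consistency loss asks predictions for the same local region under different views to agree.

```python
import numpy as np

def count_ranking_loss(count_full, count_crop, margin=0.0):
    """Hinge-style local count ranking loss for unlabeled images:
    penalize predictions where a sub-crop's count exceeds the count
    of the full region that contains it."""
    return max(0.0, count_crop - count_full + margin)

def density_consistency_loss(d1, d2):
    """Local tree density consistency: a simple L2 distance between two
    density-map predictions for the same region."""
    return float(np.mean((d1 - d2) ** 2))
```

During training, these terms would be summed with the supervised density loss on labeled images, with the tree counter token supplying the global counts being ranked.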