Feature-aware Adaptation and Structured Density Alignment for Crowd Counting in Video Surveillance
With the development of deep neural networks, the performance of crowd
counting and pixel-wise density estimation is continually improving.
Despite this, there are still two challenging problems in this field: 1)
current supervised learning needs a large amount of training data, but
collecting and annotating it is difficult; 2) existing methods cannot
generalize well to unseen domains. A recently released synthetic crowd
dataset alleviates these two problems. However, the domain gap between the
real-world data and synthetic images decreases the models' performance. To
reduce the gap, in this paper, we propose a domain-adaptation-style crowd
counting method, which can effectively adapt the model from synthetic data to
the specific real-world scenes. It consists of Multi-level Feature-aware
Adaptation (MFA) and Structured Density map Alignment (SDA). To be specific,
MFA encourages the model to extract domain-invariant features from multiple layers.
SDA ensures that the network outputs fine density maps with a reasonable
distribution on the real domain. Finally, we evaluate the proposed method on
four mainstream surveillance crowd datasets: ShanghaiTech Part B,
WorldExpo'10, Mall, and UCSD. Extensive experiments show that our approach
outperforms state-of-the-art methods on the same cross-domain counting
problem.
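As a brief illustration of the pixel-wise density estimation mentioned in this abstract: a density map assigns each pixel a fractional share of a person, so the estimated count is the sum over all pixels. The sketch below assumes a density map stored as a nested list; the function name `count_from_density` and the toy values are hypothetical and not taken from the paper.

```python
def count_from_density(density_map):
    """Return the estimated crowd count: the sum of all density values."""
    return sum(sum(row) for row in density_map)


# Toy 3x4 density map (hypothetical values, not from any dataset).
toy_density = [
    [0.0, 0.25, 0.25, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.25, 0.25, 0.0, 0.0],
]

estimated_count = count_from_density(toy_density)
print(estimated_count)
```

In practice the map would be a network output on a full image, but the counting step stays the same summation.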
A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View
Drone footage can be used for dynamic traffic monitoring, object
detection and tracking, and other vision tasks. The variability of the shooting
location adds some intractable challenges to these missions, such as varying
scale, unstable exposure, and scene migration. In this paper, we strive to
tackle the above challenges and automatically understand the crowd from the
visual data collected from drones. First, to alleviate the background noise
generated in cross-scene testing, a double-stream crowd counting model is
proposed, which extracts optical flow and frame difference information as an
additional branch. In addition, to improve the model's generalization across
different scales and times, we randomly combine a variety of data transformation
methods to simulate some unseen environments. To tackle the crowd density
estimation problem in extremely dark environments, we introduce synthetic data
generated by the game Grand Theft Auto V (GTAV). Experimental results show the
effectiveness of the virtual data. Our method wins the challenge with a mean
absolute error (MAE) of 12.70. Moreover, a comprehensive ablation study is
conducted to explore each component's contribution.
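Two quantities named in this abstract can be sketched in a few lines: the frame difference that feeds the additional branch, and the mean absolute error (MAE) used to report the result. This is a minimal sketch assuming grayscale frames stored as nested lists and per-image counts stored as flat lists; the function names and toy values are hypothetical and this is not the authors' implementation.

```python
def frame_difference(prev_frame, curr_frame):
    """Absolute per-pixel difference between two equally sized grayscale frames."""
    return [
        [abs(curr - prev) for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def mean_absolute_error(predicted_counts, true_counts):
    """MAE over a set of images: the mean of |predicted - true| counts."""
    if len(predicted_counts) != len(true_counts) or not true_counts:
        raise ValueError("count lists must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted_counts, true_counts))
    return total / len(true_counts)


# Toy frames (hypothetical): only one pixel changes between them.
prev = [[10, 10], [10, 10]]
curr = [[10, 40], [10, 10]]
motion = frame_difference(prev, curr)

# Toy counts (hypothetical): absolute errors are 2, 5, 1 and 0.
mae = mean_absolute_error([12, 20, 31, 8], [10, 25, 30, 8])
print(motion, mae)
```

The frame difference is nonzero only where the scene changes, which is why it can help separate a moving crowd from static background.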
Neuron Linear Transformation: Modeling the Domain Shift for Crowd Counting
Cross-domain crowd counting (CDCC) is a hot topic due to its importance in
public safety. The purpose of CDCC is to alleviate the domain shift between the
source and target domains. Recent methods typically attempt to extract
domain-invariant features via image translation and adversarial learning. When
it comes to specific tasks, we find that domain shifts are reflected in
differences between model parameters. To describe the domain gap directly at the
parameter level, we propose a Neuron Linear Transformation (NLT) method,
which uses domain factor and bias weights to learn the domain shift.
Specifically, for a given neuron of a source model, NLT uses a small amount of labeled
target data to learn domain shift parameters. Finally, the target neuron is
generated via a linear transformation. Extensive experiments and analysis on
six real-world datasets validate that NLT achieves top performance compared
with other domain adaptation methods. An ablation study also shows that NLT
is robust and more effective than supervised training and fine-tuning. Code is
available at: https://github.com/taohan10200/NLT.
Comment: accepted by IEEE T-NNL
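The linear transformation described in this abstract can be illustrated with a small sketch. It assumes one scalar domain factor and one scalar domain bias per neuron, applied to that neuron's source weights as `factor * w + bias`; this granularity, the function names, and the toy numbers are all assumptions for illustration, and the linked repository should be consulted for the actual method.

```python
def transform_neuron(source_weights, domain_factor, domain_bias):
    """Map one source neuron's weights to target weights: factor * w + bias."""
    return [domain_factor * w + domain_bias for w in source_weights]


def transform_layer(source_layer, factors, biases):
    """Apply a per-neuron linear transformation to every neuron in a layer."""
    if not (len(source_layer) == len(factors) == len(biases)):
        raise ValueError("need one factor and one bias per neuron")
    return [
        transform_neuron(weights, factor, bias)
        for weights, factor, bias in zip(source_layer, factors, biases)
    ]


# Toy source layer with two neurons (hypothetical weights).
source_layer = [[1.0, -2.0], [0.5, 0.0]]

# Hypothetical domain shift parameters; in the abstract's setting these would
# be learned from a few labeled target samples while source weights stay fixed.
factors = [2.0, 1.0]
biases = [0.5, 0.0]

target_layer = transform_layer(source_layer, factors, biases)
print(target_layer)
```

With a factor of 1 and a bias of 0 the target neuron equals the source neuron, so under this assumption the shift parameters encode only how the target domain departs from the source.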