Does Banking Concentration Lead to Banking Stability in the CEE Countries?
Department of Russian and East European Studies, Faculty of Social Sciences
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
We focus on the weakly-supervised audio-visual video parsing task (AVVP),
which aims to identify and locate all the events in audio/visual modalities.
Previous works only concentrate on video-level overall label denoising across
modalities, but overlook the segment-level label noise, where adjacent video
segments (i.e., 1-second video clips) may contain different events. However,
recognizing events in the segment is challenging because its label could be any
combination of events that occur in the video. To address this issue, we
consider tackling AVVP from the language perspective, since language could
freely describe how various events appear in each segment beyond fixed labels.
Specifically, we design language prompts to describe all cases of event
appearance for each video. Then, the similarity between language prompts and
segments is calculated, where the event of the most similar prompt is regarded
as the segment-level label. In addition, to deal with the mislabeled segments,
we propose to perform dynamic re-weighting on the unreliable segments to adjust
their labels. Experiments show that our simple yet effective approach
outperforms state-of-the-art methods by a large margin.
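The segment-labeling step described in the abstract (compare each segment with prompts covering the possible event combinations, then take the events of the most similar prompt) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy 2-D vectors in `EVENT_VECS`, the helper names `event_combinations`, `embed_prompt`, and `label_segments`, and the use of cosine similarity. The abstract does not specify the encoders, the similarity measure, or whether an empty "no event" case is included.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings: one 2-D vector per event (stand-ins for a real encoder).
EVENT_VECS = {"dog": [1.0, 0.0], "speech": [0.0, 1.0]}

def event_combinations(video_events):
    """Enumerate every non-empty combination of the video-level events."""
    return [combo
            for r in range(1, len(video_events) + 1)
            for combo in combinations(video_events, r)]

def embed_prompt(events):
    """Toy prompt embedding: sum of the event vectors (assumption, not the paper's encoder)."""
    dim = len(next(iter(EVENT_VECS.values())))
    return [sum(EVENT_VECS[e][i] for e in events) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def label_segments(segment_feats, video_events):
    """Assign each segment the event combination whose prompt is most similar."""
    prompts = {combo: embed_prompt(combo) for combo in event_combinations(video_events)}
    return [max(prompts, key=lambda combo: cosine(seg, prompts[combo]))
            for seg in segment_feats]

# Three 1-second segments of a video weakly labeled with {dog, speech}.
segments = [[0.9, 0.1], [0.1, 0.8], [0.7, 0.6]]
segment_labels = label_segments(segments, ["dog", "speech"])
print(segment_labels)
```

In this toy setting the first segment is closest to the "dog" prompt, the second to "speech", and the third to the combined prompt. The dynamic re-weighting of unreliable segments mentioned in the abstract is not sketched, since the abstract gives no detail on it.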
Focal Inverse Distance Transform Maps for Crowd Localization and Counting in Dense Crowd
In this paper, we propose a novel map for dense crowd localization and crowd
counting. Most crowd counting methods utilize convolutional neural networks (CNNs)
to regress a density map, achieving significant progress recently. However,
these regression-based methods are often unable to provide a precise location
for each person, for two main reasons: 1) the density map consists
of a series of blurry Gaussian blobs, 2) severe overlaps exist in the dense
region of the density map. To tackle this issue, we propose a novel Focal
Inverse Distance Transform (FIDT) map for crowd localization and counting.
Compared with density maps, FIDT maps accurately describe people's
locations, without overlap between nearby heads in dense regions. We
simultaneously implement crowd localization and counting by regressing the FIDT
map. Extensive experiments demonstrate that the proposed method outperforms
state-of-the-art localization-based methods in crowd localization tasks,
achieving very competitive performance compared with the regression-based
methods in counting tasks. In addition, the proposed method is strongly
robust to negative samples and extremely dense scenes, which further
verifies the effectiveness of the FIDT map. The code and models are available
at https://github.com/dk-liang/FIDTM.
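To make the idea of an inverse distance transform map concrete, the following is a minimal pure-Python sketch. The abstract does not state the formula, so the form used here, 1 / (d^(alpha*d + beta) + c) with d the distance to the nearest annotated head, and the constants `alpha`, `beta`, and `c` are assumptions for illustration. The function names `fidt_map` and `local_maxima` are likewise hypothetical. The linked repository is the place to check the authors' actual definition.

```python
import math

def fidt_map(height, width, heads, alpha=0.02, beta=0.75, c=1.0):
    """Build an inverse-distance map: 1.0 at each head, decaying with distance.

    heads: list of (x, y) head positions. The formula and constants are assumed.
    """
    if not heads:
        # Negative sample (no people): the map is all zeros.
        return [[0.0] * width for _ in range(height)]
    fmap = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Euclidean distance to the nearest annotated head.
            d = min(math.hypot(x - hx, y - hy) for hx, hy in heads)
            fmap[y][x] = 1.0 / (d ** (alpha * d + beta) + c)
    return fmap

def local_maxima(fmap, threshold=0.9):
    """Return (x, y) pixels above threshold that exceed all 8 neighbours."""
    h, w = len(fmap), len(fmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = fmap[y][x]
            if v < threshold:
                continue
            neighbours = [fmap[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))
                          if (ny, nx) != (y, x)]
            if all(v > n for n in neighbours):
                peaks.append((x, y))
    return peaks

# Two heads only two pixels apart still produce two separate peaks.
fmap = fidt_map(5, 7, heads=[(2, 2), (4, 2)])
peaks = local_maxima(fmap)
print(peaks, "count =", len(peaks))
```

Under these assumptions each head is a sharp peak of value 1.0, and the pixel between the two heads drops to 0.5, so localization (peak positions) and counting (number of peaks) both come from the same map. A Gaussian density map would instead blur such close heads into one blob.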