1,206 research outputs found

Does Banking Concentration Lead to Banking Stability in the CEE Countries?

Author: Yu Yingying
Publication venue: Univerzita Karlova, Fakulta sociálních věd
Publication date: 01/01/2014
Field of study

Department of Russian and East European Studies, Faculty of Social Sciences

CU Digital Repository

Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective

Authors: Du Bo, Fan Yingying, Lin Yutian, Wu Yu
Publication date: 01/06/2023

We focus on the weakly-supervised audio-visual video parsing task (AVVP), which aims to identify and locate all the events in audio/visual modalities. Previous works concentrate only on video-level label denoising across modalities, but overlook the segment-level label noise, where adjacent video segments (i.e., 1-second video clips) may contain different events. However, recognizing events in a segment is challenging because its label could be any combination of events that occur in the video. To address this issue, we consider tackling AVVP from the language perspective, since language can freely describe how various events appear in each segment beyond fixed labels. Specifically, we design language prompts to describe all cases of event appearance for each video. Then, the similarity between language prompts and segments is calculated, where the event of the most similar prompt is regarded as the segment-level label. In addition, to deal with the mislabeled segments, we propose to perform dynamic re-weighting on the unreliable segments to adjust their labels. Experiments show that our simple yet effective approach outperforms state-of-the-art methods by a large margin.
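
The prompt-matching step described in this abstract can be illustrated with a minimal sketch: each segment takes the events of its most similar prompt. The use of cosine similarity, the function name `label_segments`, and the toy embeddings are assumptions made for illustration. The abstract does not name the encoders or the similarity measure, and the dynamic re-weighting step is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def label_segments(segment_embs, prompt_embs, prompt_events):
    """For each segment, return the event set of its most similar prompt."""
    labels = []
    for seg in segment_embs:
        sims = [cosine(seg, p) for p in prompt_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(prompt_events[best])
    return labels

# Toy, made-up embeddings (hypothetical): two prompts, three 1-second segments.
prompt_embs = [[1.0, 0.0], [0.0, 1.0]]
prompt_events = [{"dog"}, {"speech"}]
segment_embs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
segment_labels = label_segments(segment_embs, prompt_embs, prompt_events)
```

In practice the embeddings would come from pretrained audio, visual, and text encoders; here they are hand-written so the matching logic is easy to follow.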

arXiv.org e-Print Archive

Focal Inverse Distance Transform Maps for Crowd Localization and Counting in Dense Crowd

Authors: Liang Dingkang, Xu Wei, Zhou Yu, Zhu Yingying
Publication date: 18/03/2021

In this paper, we propose a novel map for dense crowd localization and crowd counting. Most crowd counting methods utilize convolutional neural networks (CNNs) to regress a density map, achieving significant progress recently. However, these regression-based methods are often unable to provide a precise location for each person, for two crucial reasons: 1) the density map consists of a series of blurry Gaussian blobs, 2) severe overlaps exist in the dense region of the density map. To tackle this issue, we propose a novel Focal Inverse Distance Transform (FIDT) map for crowd localization and counting. Compared with the density maps, the FIDT maps accurately describe people's locations, without overlap between nearby heads in dense regions. We simultaneously implement crowd localization and counting by regressing the FIDT map. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art localization-based methods in crowd localization tasks, achieving very competitive performance compared with the regression-based methods in counting tasks. In addition, the proposed method presents strong robustness for the negative samples and extremely dense scenes, which further verifies the effectiveness of the FIDT map. The code and models are available at https://github.com/dk-liang/FIDTM.
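
To illustrate why a distance-based map keeps nearby heads separate, the sketch below builds a plain inverse distance map, with value 1 / (d + c) at distance d from the nearest annotated head. This is a simplification and not the authors' focal formulation, which the abstract does not define. The function names, the constant `c`, the threshold, and the toy annotations are all assumptions.

```python
import math

def inverse_distance_map(height, width, heads, c=1.0):
    """Return a height x width map with value 1 / (d + c), where d is the
    Euclidean distance from each pixel to the nearest head in `heads`
    (assumed non-empty)."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            d = min(math.hypot(y - hy, x - hx) for hy, hx in heads)
            row.append(1.0 / (d + c))
        grid.append(row)
    return grid

def peaks(grid, threshold=0.99):
    """Return (row, col) positions whose map value is at least `threshold`."""
    return [(y, x) for y, row in enumerate(grid)
            for x, v in enumerate(row) if v >= threshold]

# Hypothetical annotations as (row, col); the first two heads are adjacent.
heads = [(1, 1), (1, 2), (3, 4)]
idt_map = inverse_distance_map(5, 6, heads)
found = peaks(idt_map)   # localization: one peak per head
count = len(found)       # counting: number of peaks
```

Because the map reaches its maximum only at the annotated pixels, the two adjacent heads remain distinct peaks, whereas overlapping Gaussian blobs in a density map would tend to merge them.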

arXiv.org e-Print Archive