695 research outputs found
CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition
Audio-visual person recognition (AVPR) has received extensive attention.
However, most datasets used for AVPR research so far are collected in
constrained environments, and thus cannot reflect the true performance of AVPR
systems in real-world scenarios. To meet the request for research on AVPR in
unconstrained conditions, this paper presents a multi-genre AVPR dataset
collected `in the wild', named CN-Celeb-AV. This dataset contains more than
419k video segments from 1,136 persons from public media. In particular, we put
more emphasis on two real-world complexities: (1) data in multiple genres; (2)
segments with partial information. A comprehensive study was conducted to
compare CN-Celeb-AV with two popular public AVPR benchmark datasets, and the
results demonstrated that CN-Celeb-AV is more in line with real-world scenarios
and can be regarded as a new benchmark dataset for AVPR research. The dataset
also involves a development set that can be used to boost the performance of
AVPR systems in real-life situations. The dataset is free for researchers and
can be downloaded from http://cnceleb.org/.Comment: INTERSPEECH 202
Identity Verification Using Speech and Face Information
This article first provides an review of important concepts in the field of information fusion, followed by a review of important milestones in audio–visual person identification and verification. Several recent adaptive and nonadaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on speech and face information, are then evaluated in clean and noisy audio conditions on a common database; it is shown that in clean conditions most of the nonadaptive approaches provide similar performance and in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions; compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions
- …