On the Use of Speech and Face Information for Identity Verification
This report first provides a review of important concepts in the field of information fusion, followed by a review of important milestones in audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant) based on speech and face information are then evaluated in clean and noisy audio conditions on a common database. It is shown that in clean conditions most of the non-adaptive approaches provide similar performance, while in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or rely on restrictive assumptions. A new category of classifiers is then introduced, in which the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions. Compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions. This report is an extended and revised version of IDIAP-RR 02-33
Information Fusion and Person Verification Using Speech & Face Information
This report provides an overview of important concepts in the field of information fusion, followed by a review of literature pertaining to audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant) based on audio and visual information are evaluated in clean and noisy conditions on a common database using a text-independent setup. It is shown that in clean conditions all the non-adaptive approaches provide similar performance, while in noisy conditions they exhibit a deterioration in performance. It is also shown that current adaptive approaches are either inadequate or rely on restrictive assumptions. A new category of classifiers is then introduced, in which the decision surface is fixed but constructed to take into account the effects of noisy conditions, providing a good trade-off between performance in clean and noisy conditions
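The non-adaptive fusion these abstracts evaluate can be illustrated with a minimal sketch: a fixed weighted sum of the two classifiers' opinions compared against a fixed decision threshold. This is a generic illustration, not the specific classifiers from the reports; the function name, weight, and threshold values are illustrative assumptions.

```python
def fuse_scores(speech_score, face_score, w_speech=0.5, threshold=0.5):
    """Non-adaptive weighted-sum fusion of two classifier opinions.

    Each score is a classifier's opinion in [0, 1] that the claimant is
    the true identity. The fused opinion is compared against a fixed
    decision boundary to accept or reject the claim.
    """
    fused = w_speech * speech_score + (1.0 - w_speech) * face_score
    return "accept" if fused >= threshold else "reject"

# Clean conditions: both modalities support the true claimant.
print(fuse_scores(0.9, 0.8))  # → accept

# Noisy audio degrades the speech opinion; with a fixed boundary the
# fused decision can flip, which is the mismatch problem the reports
# address by constructing the boundary with noise effects in mind.
print(fuse_scores(0.1, 0.3))  # → reject
```

With fixed weights, a mismatch between training (clean) and testing (noisy) conditions shifts the distribution of fused opinions across the boundary, which is why the reports construct the fixed decision surface to anticipate those shifts rather than adapting it at test time.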
Deep Residual-Dense Lattice Network for Speech Enhancement
Convolutional neural networks (CNNs) with residual links (ResNets) and causal
dilated convolutional units have been the network of choice for deep learning
approaches to speech enhancement. While residual links improve gradient flow
during training, feature diminution of shallow layer outputs can occur due to
repetitive summations with deeper layer outputs. One strategy to improve
feature re-usage is to fuse both ResNets and densely connected CNNs
(DenseNets). DenseNets, however, over-allocate parameters for feature re-usage.
Motivated by this, we propose the residual-dense lattice network (RDL-Net),
which is a new CNN for speech enhancement that employs both residual and dense
aggregations without over-allocating parameters for feature re-usage. This is
managed through the topology of the RDL blocks, which limit the number of
outputs used for dense aggregations. Our extensive experimental investigation
shows that RDL-Nets are able to achieve a higher speech enhancement performance
than CNNs that employ residual and/or dense aggregations. RDL-Nets also use
substantially fewer parameters and have a lower computational requirement.
Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep
learning approaches to speech enhancement.
Comment: 8 pages, Accepted by AAAI-202
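The contrast the abstract draws between residual and dense aggregation can be sketched with plain Python lists standing in for feature maps. This is an illustrative toy, not the RDL-Net architecture itself; `layer`, `residual_stack`, `dense_stack`, and the `max_reuse` cap are all assumed names, with `max_reuse` serving only as a rough analogue of how an RDL block limits the number of outputs used for dense aggregation.

```python
def layer(x):
    # Stand-in for a convolutional layer: here, simply a scaled copy.
    return [2 * v for v in x]

def residual_stack(x, depth):
    """Residual aggregation: each layer's output is summed with its input.

    Repeated summation with deeper-layer outputs can diminish the
    relative contribution of shallow-layer features (the "feature
    diminution" the abstract mentions).
    """
    for _ in range(depth):
        x = [a + b for a, b in zip(x, layer(x))]
    return x

def dense_stack(x, depth, max_reuse=None):
    """Dense aggregation: each layer consumes the concatenation of
    earlier outputs, re-using features explicitly but growing the
    input width (and parameter count) with depth.

    Capping how many previous outputs are concatenated (max_reuse)
    is a rough analogue of limiting dense aggregation per block.
    """
    feats = [x]
    for _ in range(depth):
        window = feats if max_reuse is None else feats[-max_reuse:]
        concat = [v for f in window for v in f]
        feats.append(layer(concat))
    return feats[-1]
```

In the uncapped dense case the concatenated input widens at every layer, which is the parameter over-allocation the abstract attributes to DenseNets; capping the re-used outputs keeps the width bounded while retaining explicit feature re-use.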