
    Adaptive Multi-Modal Person Verification System


    On the Use of Speech and Face Information for Identity Verification

    This report first provides a review of important concepts in the field of information fusion, followed by a review of important milestones in audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on speech and face information, are then evaluated in clean and noisy audio conditions on a common database. It is shown that in clean conditions most of the non-adaptive approaches provide similar performance, while in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions. Compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions. This report is an extended and revised version of IDIAP-RR 02-33.
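    As a plain illustration of score-level fusion with a fixed decision boundary, a minimal sketch follows. It is not the classifier construction proposed in the report; the weighted-sum rule, weights, threshold, and example scores are hypothetical values chosen only to show how an accept/reject decision is reached from two modality opinions.

```python
# Minimal sketch of score-level fusion with a fixed decision boundary.
# NOT the report's proposed classifier: the weighted sum, the weights,
# and the threshold are hypothetical choices for illustration only.

def fuse_and_verify(speech_score: float, face_score: float,
                    w_speech: float = 0.5, w_face: float = 0.5,
                    threshold: float = 0.0) -> bool:
    """Fuse per-modality opinions with a fixed weighted sum and compare the
    result against a fixed threshold (accept the claimant if it is cleared)."""
    fused = w_speech * speech_score + w_face * face_score
    return fused >= threshold  # True -> accept, False -> reject


# Example: a degraded speech opinion offset by a confident face opinion.
print(fuse_and_verify(speech_score=-0.2, face_score=0.6))  # True
```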

    Information Fusion and Person Verification Using Speech & Face Information

    This report provides an overview of important concepts in the field of information fusion, followed by a review of literature pertaining to audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on audio and visual information, are evaluated in clean and noisy conditions on a common database using a text-independent setup. It is shown that in clean conditions all the non-adaptive approaches provide similar performance, while in noisy conditions their performance deteriorates. It is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision surface is fixed but constructed to take into account the effects of noisy conditions, providing a good trade-off between performance in clean and noisy conditions.

    Deep Residual-Dense Lattice Network for Speech Enhancement

    Convolutional neural networks (CNNs) with residual links (ResNets) and causal dilated convolutional units have been the network of choice for deep learning approaches to speech enhancement. While residual links improve gradient flow during training, feature diminution of shallow layer outputs can occur due to repetitive summations with deeper layer outputs. One strategy to improve feature re-usage is to fuse both ResNets and densely connected CNNs (DenseNets). DenseNets, however, over-allocate parameters for feature re-usage. Motivated by this, we propose the residual-dense lattice network (RDL-Net), a new CNN for speech enhancement that employs both residual and dense aggregations without over-allocating parameters for feature re-usage. This is managed through the topology of the RDL blocks, which limits the number of outputs used for dense aggregations. Our extensive experimental investigation shows that RDL-Nets achieve higher speech enhancement performance than CNNs that employ residual and/or dense aggregations, while using substantially fewer parameters and requiring less computation. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement. (8 pages; accepted by AAAI-2020.)
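    As a rough sketch of the idea described above (combining additive residual links with concatenative dense links while capping how many earlier outputs feed each dense aggregation), the following PyTorch block is illustrative only: it is not the RDL-block topology from the paper, and the channel count, kernel width, and cap `max_dense_inputs` are assumed values.

```python
# Hypothetical sketch of a block mixing residual (additive) and dense
# (concatenative) aggregations with a cap on dense inputs; not the paper's RDL block.
import torch
import torch.nn as nn


class ResidualDenseBlock(nn.Module):
    def __init__(self, channels: int = 64, n_layers: int = 4, max_dense_inputs: int = 2):
        super().__init__()
        self.max_dense_inputs = max_dense_inputs
        self.convs = nn.ModuleList()
        for i in range(n_layers):
            # Layer i concatenates at most `max_dense_inputs` of the most recent
            # outputs (counting the block input), which caps channel growth and
            # hence the parameters spent on feature re-usage.
            in_ch = channels * min(i + 1, max_dense_inputs)
            self.convs.append(nn.Conv1d(in_ch, channels, kernel_size=3, padding=1))
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]
        for conv in self.convs:
            dense_in = torch.cat(outputs[-self.max_dense_inputs:], dim=1)
            outputs.append(self.act(conv(dense_in)))
        # Residual (additive) link across the whole block preserves gradient flow.
        return x + outputs[-1]


# Example: a batch of 8 feature sequences with 64 channels and 100 frames.
block = ResidualDenseBlock(channels=64)
y = block(torch.randn(8, 64, 100))
print(y.shape)  # torch.Size([8, 64, 100])
```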