5 research outputs found

    An investigation of likelihood normalization for robust ASR

    Noise-robust automatic speech recognition (ASR) systems rely on feature and/or model compensation. Existing compensation techniques typically operate on the features or on the parameters of the acoustic models themselves. By contrast, a number of normalization techniques have been defined in the field of speaker verification that operate on the resulting log-likelihood scores. In this paper, we provide a theoretical motivation for likelihood normalization due to the so-called "hubness" phenomenon and we evaluate the benefit of several normalization techniques on ASR accuracy for the 2nd CHiME Challenge task. We show that symmetric normalization (S-norm) reduces the relative error rate by 43% alone and by 10% after feature and model compensation.

    The Effect Of Acoustic Variability On Automatic Speaker Recognition Systems

    This thesis examines the influence of acoustic variability on automatic speaker recognition systems (ASRs) with three aims. i. To measure ASR performance under 5 commonly encountered acoustic conditions; ii. To contribute towards ASR system development with the provision of new research data; iii. To assess ASR suitability for forensic speaker comparison (FSC) application and investigative/pre-forensic use. The thesis begins with a literature review and explanation of relevant technical terms. Five categories of research experiments then examine ASR performance, reflective of conditions influencing speech quantity (inhibitors) and speech quality (contaminants), acknowledging quality often influences quantity. Experiments pertain to: net speech duration, signal to noise ratio (SNR), reverberation, frequency bandwidth and transcoding (codecs). The ASR system is placed under scrutiny with examination of settings and optimum conditions (e.g. matched/unmatched test audio and speaker models). Output is examined in relation to baseline performance and metrics assist in informing if ASRs should be applied to suboptimal audio recordings. Results indicate that modern ASRs are relatively resilient to low and moderate levels of the acoustic contaminants and inhibitors examined, whilst remaining sensitive to higher levels. The thesis provides discussion on issues such as the complexity and fragility of the speech signal path, speaker variability, difficulty in measuring conditions and mitigation (thresholds and settings). The application of ASRs to casework is discussed with recommendations, acknowledging the different modes of operation (e.g. investigative usage) and current UK limitations regarding presenting ASR output as evidence in criminal trials. 
In summary, and in the context of acoustic variability, the thesis recommends that ASRs could be applied to pre-forensic cases, accepting that extraneous issues requiring governance endure, such as validation of method (ASR standardisation) and population data selection. However, ASRs remain unsuitable for broad forensic application, with many acoustic conditions causing irrecoverable speech data loss that contributes to high error rates.

    Towards Engineering Reliable Keystroke Biometrics Systems

    In this thesis, we argue that most of the work in the literature on behavioural-based biometric systems using AI and machine learning is immature and unreliable. Our analysis and experimental results show that designing reliable behavioural-based biometric systems requires a systematic and complicated process. We first discuss the limitations of existing work and of the use of conventional machine learning methods. We use the biometric zoos theory to demonstrate the challenge of designing reliable behavioural-based biometric systems. Then, we outline the common problems in engineering reliable biometric systems. In particular, we focus on the need for novelty detection machine learning models and adaptive machine learning algorithms. We provide a systematic approach to designing and building reliable behavioural-based biometric systems. In our study, we apply the proposed approach to keystroke dynamics. Keystroke dynamics is a behavioural-based biometric that identifies individuals by measuring their unique typing behaviours on physical or soft keyboards. Our study shows that it is possible to design reliable behavioural-based biometrics and address the gaps in the literature.

    Uncertainty propagation for noise robust speaker recognition: the case of NIST-SRE

    Uncertainty propagation is an established approach to handle noisy and reverberant conditions in automatic speech recognition (ASR), but it has been little studied for speaker recognition so far. Yu et al. recently proposed to propagate uncertainty to the Baum-Welch (BW) statistics without changing the posterior probability of each mixture component. They obtained good results on a small dataset (YOHO) but little improvement on the NIST-SRE dataset, despite the use of oracle uncertainty estimates. In this paper, we propose to modify the computation of the posterior probability of each mixture component in order to obtain unbiased BW statistics. We show that our approach improves the accuracy of BW statistics on the Wall Street Journal (WSJ) corpus, but again yields little or no improvement on NIST-SRE. We provide a theoretical explanation for this that opens the way for more efficient exploitation of uncertainty on NIST-SRE and other large datasets in the future.

    Classifying Galaxy Images Using Improved Residual Networks

    The field of astronomy has made tremendous progress in recent years thanks to advancements in technology and the development of sophisticated algorithms. One area of interest for astronomers is the classification of galaxy morphology, which involves categorizing galaxies based on their visual appearance. However, with the sheer number of galaxy images available, it would be a daunting task to manually classify them all. To address this challenge, this study proposes a novel Residual Neural Network (ResNet) model, called ResNet_Var, that can automatically classify galaxy images. The Galaxy Zoo 2 dataset is used in this research, which contains over 28,000 images for the five-class classification task and over 25,000 images for the seven-class classification task. To evaluate the effectiveness of the ResNet_Var model, various metrics such as accuracy, precision, recall, and F1 score were calculated. The results were impressive, with the ResNet_Var model outperforming other popular networks such as VGG16, VGG19, Inception, and ResNet50. Specifically, the overall classification accuracy of the ResNet_Var model was 95.35% for the five-class classification task and 93.54% for the seven-class classification task. The potential applications of the ResNet_Var model are vast. With such a high accuracy rate, the ResNet_Var model is well-suited for large-scale galaxy classification in optical space surveys. By automating the classification process, astronomers can quickly and accurately categorize galaxy images according to their morphology. This, in turn, can help advance our understanding of galaxy formation and evolution, as well as provide valuable insights into the properties of dark matter and the nature of the universe.