9 research outputs found

    Hybrid Method for Digits Recognition using Fixed-Frame Scores and Derived Pitch

    This paper presents a frame normalization procedure based on traditional dynamic time warping (DTW) using LPC coefficients. The redefined method, called DTW frame-fixing (DTW-FF), normalizes the word frames of the input against the reference frames. The study is motivated by a limitation of neural networks, which require a fixed number of input nodes when processing multiple inputs in parallel. To address this, the research aims to reduce the computation and complexity of the neural network by reducing the number of inputs to it. A dynamic warping process is used in which the local distance scores along the warping path are fixed and collected so that every utterance yields scores for an equal number of frames. The paper also studies pitch as a contributing feature for speech recognition. Results show good performance, with an improvement when pitch is used along with the DTW-FF feature. The convergence rate of steepest gradient descent is also compared with that of the conjugate gradient method; the convergence rate improves when the conjugate gradient method is introduced into the back-propagation algorithm.
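
    The frame-fixing idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how the local distance scores are collected, so averaging the scores aligned to each reference frame is an assumption, as are the Euclidean distance and the function names `dtw_path` and `fixed_frame_scores`.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(inp, ref):
    """Classic DTW with steps (1,0), (0,1), (1,1); returns (total_cost, path)."""
    n, m = len(inp), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(inp[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


def fixed_frame_scores(inp, ref):
    """One score per reference frame (assumption: mean local distance
    of all input frames aligned to that reference frame)."""
    _, path = dtw_path(inp, ref)
    buckets = [[] for _ in ref]
    for i, j in path:
        buckets[j].append(euclidean(inp[i], ref[j]))
    return [sum(b) / len(b) for b in buckets]


# Toy one-dimensional "frames"; real inputs would be LPC coefficient vectors.
reference = [[0.0], [1.0], [2.0]]
utterance = [[0.0], [0.0], [1.0], [2.0]]
scores = fixed_frame_scores(utterance, reference)
```

    Whatever the length of the input utterance, `scores` has one entry per reference frame, so it can feed a network with a fixed number of input nodes.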

    An articulatory feature-based tandem approach and factored observation modeling

    The so-called tandem approach, where the posteriors of a multilayer perceptron (MLP) classifier are used as features in an automatic speech recognition (ASR) system, has proven to be a very effective method. Most tandem approaches to date have relied on MLPs trained for phone classification, and appended the posterior features to some standard feature hidden Markov model (HMM). In this paper, we develop an alternative tandem approach based on MLPs trained for articulatory feature (AF) classification. We also develop a factored observation model for characterizing the posterior and standard features at the HMM outputs, allowing for separate hidden mixture and state-tying structures for each factor. In experiments on a subset of Switchboard, we show that the AF-based tandem approach is as effective as the phone-based approach, and that the factored observation model significantly outperforms the simple feature concatenation approach while using fewer parameters.

    A comparison of acoustic and linguistics methodologies for Alzheimer’s dementia recognition

    In the light of the current COVID-19 pandemic, the need for remote digital health assessment tools is greater than ever. This statement is especially pertinent for elderly and vulnerable populations. In this regard, the INTERSPEECH 2020 Alzheimer’s Dementia Recognition through Spontaneous Speech (ADReSS) Challenge offers competitors the opportunity to develop speech and language-based systems for the task of Alzheimer’s Dementia (AD) recognition. The challenge data consists of speech recordings and their transcripts; the work presented herein is an assessment of different contemporary approaches on these modalities. Specifically, we compared a hierarchical neural network with an attention mechanism trained on linguistic features with three acoustic-based systems: (i) Bag-of-Audio-Words (BoAW) quantising different low-level descriptors, (ii) a Siamese Network trained on log-Mel spectrograms, and (iii) a Convolutional Neural Network (CNN) end-to-end system trained on raw waveforms. Key results indicate the strength of the linguistic approach over the acoustic systems. Our strongest test-set result was achieved using a late fusion combination of BoAW, End-to-End CNN, and hierarchical-attention networks, which outperformed the challenge baseline in both the classification and regression tasks.

    Entropy based classifier combination for sentence segmentation

    We describe recent extensions to our previous work, where we explored the use of individual classifiers, namely, boosting and maximum entropy models, for sentence segmentation. In this paper we extend the set of classification methods with the support vector machine (SVM). We propose a new dynamic entropy-based classifier combination approach to combine these classifiers, and compare it with the traditional classifier combination techniques, namely, voting, linear regression and logistic regression. Furthermore, we also investigate the combination of hidden event language models with the output of the proposed classifier combination, and the output of individual classifiers. Experimental studies conducted on the Mandarin TDT4 broadcast news database show that the SVM classifier as an individual classifier improves over our previous best system. However, the proposed entropy-based classifier combination approach shows the best improvement in F-Measure of 1% absolute, and the voting approach shows the best reduction in NIST error rate of 2.7% absolute when compared to the previous best system. Index Terms: sentence segmentation, classifier combination, entropy, lexical and prosodic features, hidden event language model
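
    The abstract does not define the dynamic entropy-based combination, so the following Python sketch shows only one plausible reading: each classifier's boundary posterior is weighted, per decision, by one minus its binary entropy, so that confident classifiers count more. The weighting rule and the function names are assumptions made for illustration, not the authors' method.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli posterior p; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_weighted_combination(posteriors):
    """Combine per-classifier boundary posteriors for one candidate boundary.

    Assumption: weight = 1 - entropy, recomputed for every decision,
    which is what makes the combination "dynamic" in this sketch.
    """
    weights = [1.0 - binary_entropy(p) for p in posteriors]
    total = sum(weights)
    if total == 0.0:
        # Every classifier is maximally uncertain: fall back to a plain mean.
        return sum(posteriors) / len(posteriors)
    return sum(w * p for w, p in zip(weights, posteriors)) / total


# Hypothetical posteriors from boosting, maximum entropy, and SVM classifiers.
combined = entropy_weighted_combination([0.55, 0.90, 0.80])
is_boundary = combined >= 0.5
```

    Unlike voting or a fixed linear regression, the weights here change from one candidate boundary to the next, depending on how certain each classifier is at that point.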

    On Joint Optimization of Automatic Speaker Verification and Anti-spoofing in the Embedding Space

    Biometric systems are exposed to spoofing attacks which may compromise their security, and voice biometrics based on automatic speaker verification (ASV) is no exception. To increase the robustness against such attacks, anti-spoofing systems have been proposed for the detection of replay, synthesis and voice conversion-based attacks. However, most proposed anti-spoofing techniques are loosely integrated with the ASV system. In this work, we develop a new integration neural network which jointly processes the embeddings extracted from ASV and anti-spoofing systems in order to detect both zero-effort impostors and spoofing attacks. Moreover, we propose a new loss function based on the minimization of the area under the expected (AUE) performance and spoofability curve (EPSC), which allows us to optimize the integration neural network on the desired operating range in which the biometric system is expected to work. To evaluate our proposals, experiments were carried out on the recent ASVspoof 2019 corpus, including both logical access (LA) and physical access (PA) scenarios. The experimental results show that our proposal clearly outperforms some well-known techniques based on the integration at the score- and embedding-level. Specifically, our proposal achieves up to 23.62% and 22.03% relative equal error rate (EER) improvement over the best performing baseline in the LA and PA scenarios, respectively, as well as relative gains of 27.62% and 29.15% on the AUE metric. Spanish Ministry of Science and Innovation Project No. PID2019-104206GB-I00/AEI/10.13039/50110001103
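
    To make the reported metric concrete, the Python sketch below shows one common way to estimate an equal error rate from two score lists and to express a relative improvement. It is a generic illustration with made-up scores, not the evaluation code used with ASVspoof 2019; the threshold sweep and the function names are assumptions.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false acceptance and false rejection are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def relative_improvement(baseline, proposed):
    """Relative reduction, in percent, of an error metric over a baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Made-up scores: genuine speakers vs. rejected trials (impostors or spoofs).
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, nontargets)
gain = relative_improvement(10.0, 7.5)
```

    With these toy scores the threshold 0.6 rejects one of four genuine trials and accepts one of four non-target trials, so both error rates meet at 25%.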