5,484 research outputs found

    An HMM-based speech recognition IC.

Han Wei. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 60-61). Abstracts in English and Chinese.

Contents:
Front matter: Abstract (in English and Chinese), Acknowledgements, Contents, List of Figures, List of Tables
Chapter 1 Introduction
  1.1 Speech Recognition
  1.2 ASIC Design with HDLs
Chapter 2 Theory of HMM-Based Speech Recognition
  2.1 Speaker-Dependent and Speaker-Independent
  2.2 Frame and Feature Vector
  2.3 Hidden Markov Model
    2.3.1 Markov Model
    2.3.2 Hidden Markov Model
    2.3.3 Elements of an HMM
    2.3.4 Types of HMMs
    2.3.5 Continuous Observation Densities in HMMs
    2.3.6 Three Basic Problems for HMMs
  2.4 Probability Evaluation
    2.4.1 The Viterbi Algorithm
    2.4.2 Alternative Viterbi Implementation
Chapter 3 HMM-Based Isolated Word Recognizer Design Methodology
  3.1 Speech Recognition Based on a Single Mixture
  3.2 Speech Recognition Based on Double Mixtures
Chapter 4 VLSI Implementation of the Speech Recognizer
  4.1 The System Requirements
  4.2 Implementation of a Speech Recognizer with a Single-Mixture HMM
  4.3 Implementation of a Speech Recognizer with a Double-Mixture HMM
  4.4 Extended Usage in Higher-Order Mixture HMMs
  4.5 Pipelining and the System Timing
Chapter 5 Simulation and IC Testing
  5.1 Simulation Results
  5.2 Testing
Chapter 6 Discussion and Conclusion
References
Appendix I Verilog Code of the Double-Mixture HMM-Based Speech Recognition IC (RTL Level): Subtracter, Multiplier, Core_Adder, Register for X, Subtractor and Comparator, Shifter, Look-Up Table, Register for Constants, Register for Scores, Final Score Register, Controller, Top
Appendix II Chip Microphotograph
Appendix III Pin Assignment of the Speech Recognition IC
Appendix IV The Testing Board of the IC
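The thesis itself is not reproduced here, but the scoring loop such an IC implements is standard. Below is a minimal, illustrative Python sketch of log-domain Viterbi scoring for a left-to-right word HMM with single-Gaussian (single-mixture) state emissions; all names are ours, not the thesis code. In the double-mixture case, the per-state Gaussian would be replaced by the best (or log-sum) of two components. Working in the log domain turns multiplications into additions, which suits a hardware datapath built from adders, subtracters and shifters like the one listed in Appendix I.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log-likelihood of frame x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def viterbi_score(frames, log_trans, means, variances):
    """Log-domain Viterbi score of a frame sequence against one word HMM.

    frames:           (T, D) feature vectors, one per frame
    log_trans:        (N, N) log transition probabilities
    means, variances: (N, D) per-state Gaussian parameters
    """
    T, N = len(frames), len(means)
    delta = np.full(N, -np.inf)
    delta[0] = log_gauss(frames[0], means[0], variances[0])  # start in state 0
    for t in range(1, T):
        # max over predecessors replaces the sum of the forward algorithm
        delta = np.max(delta[:, None] + log_trans, axis=0) + np.array(
            [log_gauss(frames[t], means[j], variances[j]) for j in range(N)]
        )
    return delta[-1]  # score of ending in the final (rightmost) state
```

An isolated-word recognizer runs this score once per word model and picks the word whose HMM gives the highest final value.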

    Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification

Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from deep neural networks (DNNs) with those from the conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content-verification method that improves system security without much computational overhead. Our experiments on the RSR2015 Part-3 digit-prompted task show that the DNN-based alignment performs on par with the HMM alignment. The results also demonstrate the effectiveness of the proposed Kullback-Leibler (KL) divergence based scoring in rejecting speech with incorrect pass-phrases.
Comment: accepted by APSIPA ASC 201
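The abstract does not quote the paper's exact scoring formula, so the Python sketch below only illustrates the general idea of KL-divergence-based content verification: summarize each utterance's frame alignment as a state-occupancy distribution and reject when it diverges too far from the profile expected for the prompted digits. The function names, the occupancy numbers and the threshold are all hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """D_KL(p || q) between two categorical distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def accept_content(test_occupancy, reference_occupancy, threshold):
    """Accept the utterance only if its state-occupancy profile stays
    close to the profile expected for the prompted digit string."""
    return kl_divergence(test_occupancy, reference_occupancy) < threshold

# Hypothetical usage: occupancy histograms over HMM states for one digit
ref = [0.25, 0.40, 0.35]    # expected alignment profile for the prompt
test = [0.02, 0.08, 0.90]   # observed profile for the test utterance
print(accept_content(test, ref, threshold=0.5))  # False: likely wrong pass-phrase
```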

    Automatic speech recognition with deep neural networks for impaired speech

The final publication is available at https://link.springer.com/chapter/10.1007%2F978-3-319-49169-1_10
Automatic speech recognition has reached almost human performance in some controlled scenarios. However, recognition of impaired speech is a difficult task for two main reasons: data is (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models in terms of word error rate. A DNN improves the word error rate by 13% for subjects with dysarthria with respect to the best classical architecture; this improvement is larger than that given by other deep neural networks such as CNNs, TDNNs and LSTMs. All experiments were done with the Kaldi speech recognition toolkit, for which we adapted several recipes to deal with dysarthric speech and to work on the TORGO database. These recipes are publicly available.
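The whole comparison rests on word error rate. For reference, here is the standard Levenshtein-distance computation of WER in Python; this is a textbook sketch of the metric, not Kaldi's own scoring script.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with the usual Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1] / len(ref)

print(word_error_rate("open the door", "open a door"))  # 1 substitution -> 0.333...
```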

    Towards Personalized Synthesized Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction

When individuals lose the ability to produce their own speech, due to degenerative diseases such as motor neurone disease (MND) or Parkinson's, they lose not only a functional means of communication but also a display of their individual and group identity. In order to build personalized synthetic voices, attempts have been made to capture the voice before it is lost, using a process known as voice banking. But for some patients, speech deterioration coincides with or quickly follows diagnosis. Using HMM-based speech synthesis, it is now possible to build personalized synthetic voices from minimal recordings and even from disordered speech. The power of this approach is that the patient's recordings can be used to adapt existing voice models pre-trained on many speakers. When the speech has begun to deteriorate, the adapted voice model can be further modified to compensate for the disordered characteristics found in the patient's speech. The University of Edinburgh has initiated a project for voice banking and reconstruction based on this speech synthesis technology. At the current stage of the project, more than fifteen patients with MND have been recorded and five of them have received a reconstructed voice. In this paper, we present an overview of the project as well as subjective assessments of the reconstructed voices and feedback from patients and their families.
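The abstract's key technical point is adapting voice models pre-trained on many speakers using only minimal patient data. As a rough illustration of that idea (not the project's actual pipeline), the Python sketch below estimates one shared affine transform from a few paired state means and applies it to every mean of the average-voice model, in the spirit of MLLR-style adaptation; a real system would weight the estimate by state occupancies and adapt variances as well. All names are ours.

```python
import numpy as np

def estimate_mean_transform(avg_means, target_means):
    """Least-squares estimate of an affine transform (W, b) such that
    W @ mu + b maps average-voice state means toward the target speaker.
    avg_means, target_means: (N, D) paired Gaussian mean vectors."""
    N, D = avg_means.shape
    X = np.hstack([avg_means, np.ones((N, 1))])   # append bias column
    # Solve X @ A ~= target_means in the least-squares sense
    A, *_ = np.linalg.lstsq(X, target_means, rcond=None)
    return A[:D].T, A[D]                          # W is (D, D), b is (D,)

def adapt_means(all_means, W, b):
    """Apply the shared transform to every state mean of the voice model,
    including states never observed in the patient's recordings."""
    return all_means @ W.T + b
```

Because the transform is shared across states, even a handful of recordings can move the entire model toward the patient's voice, which is what makes adaptation viable when little intelligible speech remains.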