
    Acoustic-Phonetic Features for the Automatic Classification of Stop Consonants

    In this paper, the acoustic-phonetic characteristics of American English stop consonants are investigated. Features studied in the literature are evaluated for their information content, and new features are proposed. A statistically guided, knowledge-based, acoustic-phonetic system for the automatic classification of stops in speaker-independent continuous speech is proposed. The system uses a new auditory-based front end and incorporates new algorithms for the extraction and manipulation of the acoustic-phonetic features that proved richest in information content. Recognition experiments are performed using hard-decision algorithms on stops extracted from the TIMIT database continuous speech of 60 speakers (not used in the design process) from seven different dialects of American English. An accuracy of 96% is obtained for voicing detection, 90% for place-of-articulation detection, and 86% for the overall classification of stops.
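    As an illustration of the hard-decision style of classification the abstract describes, a two-stage rule (voicing first, then place of articulation) might look like the sketch below. The features and thresholds used here (voice-onset time in ms, burst spectral centroid in Hz) are illustrative assumptions, not the paper's actual feature set.

```python
# Hedged sketch of two-stage, hard-decision stop classification.
# Feature choices and cutoffs are illustrative, not from the paper.

def classify_stop(vot_ms: float, burst_centroid_hz: float) -> tuple[str, str]:
    """Return (voicing, place) decisions from two acoustic-phonetic features."""
    # Voicing: a short voice-onset time suggests a voiced stop (b, d, g).
    voicing = "voiced" if vot_ms < 30.0 else "voiceless"
    # Place: coarse burst-spectrum heuristic (labial low, alveolar high,
    # velar in between) -- purely illustrative cutoffs.
    if burst_centroid_hz < 1500.0:
        place = "labial"
    elif burst_centroid_hz > 3500.0:
        place = "alveolar"
    else:
        place = "velar"
    return voicing, place

print(classify_stop(15.0, 1200.0))   # a /b/-like token
print(classify_stop(60.0, 4000.0))   # a /t/-like token
```

    A real system of this kind would extract such features automatically from the waveform; the point of the sketch is only the cascade of hard decisions.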

    Robust Auditory-Based Speech Processing Using the Average Localized Synchrony Detection

    In this paper, a new auditory-based speech processing system built on the biologically rooted property of average localized synchrony detection (ALSD) is proposed. The system detects periodicity in the speech signal at Bark-scaled frequencies while reducing spurious peaks in the response and sensitivity to implementation mismatches, and hence presents a consistent and robust representation of the formants. The system is evaluated for its formant-extraction ability and compared with other auditory-based and traditional systems on vowel and consonant recognition, using clean speech from the TIMIT database and speech in noise. The results illustrate the advantage of the ALSD system in extracting formants while suppressing spurious peaks. They also indicate the superiority of synchrony measures over the mean-rate measure in the presence of noise.
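    As a generic illustration of synchrony detection (not the actual ALSD model of the paper), periodicity at a probe frequency can be measured by correlating the signal with itself at that frequency's period; a periodic signal scores near 1 at its own frequency and lower elsewhere.

```python
# Hedged sketch: a toy localized-synchrony measure via normalized
# autocorrelation at the probe frequency's period. Illustrative only;
# the ALSD model itself is considerably more elaborate.
import math

def synchrony(signal: list[float], fs: float, freq: float) -> float:
    """Normalized autocorrelation of `signal` at lag = fs / freq samples."""
    lag = round(fs / freq)
    pairs = range(len(signal) - lag)
    num = sum(signal[i] * signal[i + lag] for i in pairs)
    den = sum(signal[i] * signal[i] for i in pairs)
    return num / den if den else 0.0

fs = 8000.0
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
print(synchrony(tone, fs, 200.0))  # near 1.0 at the tone's own frequency
print(synchrony(tone, fs, 300.0))  # much lower off-frequency
```

    A filterbank version would apply this measure per Bark-scaled channel, which is the sense in which the detection is "localized".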

    Digital Microphone Array - Design, Implementation and Speech Recognition Experiments

    The instrumented meeting room of the future will help meetings be more efficient and productive. One of the basic components of the instrumented meeting room is the speech recording device, in most cases a microphone array. The two basic requirements for this microphone array are portability and cost-efficiency, neither of which is provided by current commercially available arrays. This will change in the near future thanks to the availability of new digital MEMS microphones. This dissertation reports on the first successful implementation of a digital MEMS microphone array. The array was designed, implemented, tested, evaluated and successfully compared with an existing analogue microphone array using a state-of-the-art ASR system and adaptation algorithms. The newly built digital MEMS microphone array compares well with the analogue array on the word error rate achieved by an automatic speech recognition system, and is highly portable and economical.
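    Whatever hardware captures the channels, the most basic array-processing step is delay-and-sum beamforming: align the channels on the talker's direction and average them, so speech adds coherently while diffuse noise does not. A minimal sketch, assuming integer sample delays and a toy two-microphone example (the dissertation's actual processing chain is not specified here):

```python
# Hedged sketch: delay-and-sum beamforming with integer sample delays.
# Geometry and delays are toy values for illustration.

def delay_and_sum(channels: list[list[float]], delays: list[int]) -> list[float]:
    """Align each channel by its integer sample delay, then average."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]

# Two mics hear the same pulse; the second mic hears it one sample later.
mic1 = [0.0, 1.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 0.0]
out = delay_and_sum([mic1, mic2], delays=[0, 1])
print(out)  # the aligned pulses add coherently
```

    Real arrays use fractional delays derived from the array geometry and an estimated source direction, but the averaging principle is the same.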

    Automatic translation of formal data specifications to voice data-input applications.

    This thesis introduces a complete solution for the automatic translation of formal data specifications to voice data-input applications. The objective of the research is to automatically generate applications for entering data through speech from specifications of the structure of the data. The formal data specifications are XML DTDs. A new formalism called Grammar-DTD (G-DTD) is introduced as an extended DTD that contains grammars describing the valid values of the DTD elements and attributes. G-DTDs facilitate the automatic generation of VoiceXML applications that correspond to the original DTD structure. The development of the automatic application generator included identifying constraints on the G-DTD to ensure a feasible translation, using predicate calculus to build a knowledge base of inference rules that describes the mapping procedure, and writing an algorithm for the automatic translation based on the inference rules. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2006 .H355. Source: Masters Abstracts International, Volume: 45-01, page: 0354. Thesis (M.Sc.)--University of Windsor (Canada), 2006.
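    The heart of such a translation can be sketched as a mapping from data elements to VoiceXML fields that prompt for and collect each value. This is a minimal illustration under strong assumptions (a flat element list; the function name and generated markup are hypothetical); the thesis's G-DTD grammars and inference rules are not reproduced.

```python
# Hedged sketch: one VoiceXML <field> per data element.
# The markup shape is a minimal VoiceXML form, not the thesis's generator.

def dtd_elements_to_vxml(elements: list[str]) -> str:
    """Emit a minimal VoiceXML form with one field per data element."""
    fields = "\n".join(
        f'    <field name="{e}">\n'
        f'      <prompt>Please say the {e}.</prompt>\n'
        f'    </field>'
        for e in elements
    )
    return (
        '<vxml version="2.0">\n'
        '  <form id="data_entry">\n'
        f'{fields}\n'
        '  </form>\n'
        '</vxml>'
    )

print(dtd_elements_to_vxml(["name", "age"]))
```

    In the thesis's setting, the grammars attached to each G-DTD element would additionally constrain what spoken values each field accepts.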

    A comparison of features for large population speaker identification

    Bibliography: leaves 95-104. Speech recognition systems all have one criterion in common: they perform better in a controlled environment using clean speech. Though performance can be excellent for clean speech, even exceeding human capabilities, systems fail when presented with speech data from more realistic environments such as telephone channels. The difference between using a recognizer in clean and in noisy environments is extreme, and this is one of the major obstacles to producing commercial recognition systems for use in normal environments. It is this lack of performance of speaker recognition systems over telephone channels that this work addresses. The human auditory system is a speech recognizer with excellent performance, especially in noisy environments. Since humans are better than any machine at ignoring noise, auditory-based methods are promising approaches, as they attempt to model the working of the human auditory system. These methods have been shown to outperform more conventional signal-processing schemes for speech recognition, speech coding, word-recognition and phone-classification tasks. Since speaker identification has received a lot of attention in speech processing because of its awaiting real-world applications, it is attractive to evaluate its performance using auditory models as features. Firstly, this study aims at improving the results for speaker identification. The improvements were made through the use of parameterized feature sets together with the application of cepstral mean removal for channel equalization. The study is then extended to compare an auditory-based model, the Ensemble Interval Histogram (EIH), with mel-scale features, which were shown to perform almost error-free on clean speech. The previous studies showing EIH to be more robust to noise were conducted on speaker-dependent, small-population, isolated-word tasks, and are here extended to speaker-independent, larger-population, continuous speech. This study investigates whether the EIH representation is more resistant to telephone noise than the mel-cepstrum, as was shown in the previous studies, when it is applied for the first time to the speaker identification task using a state-of-the-art Gaussian mixture model system.
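    Cepstral mean removal, the channel-equalization step mentioned above, is simple enough to sketch exactly: a stationary channel adds a roughly constant offset to every cepstral frame, so subtracting the per-utterance mean cancels it. The frame values below are toy numbers.

```python
# Cepstral mean removal (CMN): subtract the per-coefficient mean across
# all frames of an utterance, cancelling a constant channel offset.

def cepstral_mean_removal(frames: list[list[float]]) -> list[list[float]]:
    """Subtract the per-coefficient mean, computed over all frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]

clean = [[1.0, 2.0], [3.0, 4.0]]
channel = [0.5, -0.5]                      # constant channel offset
noisy = [[c + o for c, o in zip(f, channel)] for f in clean]
# After CMN, clean and channel-distorted frames match exactly.
print(cepstral_mean_removal(noisy) == cepstral_mean_removal(clean))  # True
```

    This is why CMN helps specifically with telephone-channel mismatch: the handset and line act approximately as a fixed linear filter, which is additive in the cepstral domain.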

    Multiple Approaches to Robust Speech Recognition

    This paper compares several different approaches to robust speech recognition. We review CMU's ongoing research in the use of acoustical pre-processing to achieve robust speech recognition, including the first evaluation of pre-processing in the context of the DARPA standard ATIS domain for spoken language systems. We also describe and compare the effectiveness of three complementary methods of signal processing for robust speech recognition: acoustical pre-processing, microphone array processing, and the use of physiologically-motivated models of peripheral signal processing. Recognition error rates are presented using these three approaches in isolation and in combination with each other for the speaker-independent continuous alphanumeric census speech recognition task.

    2. ACOUSTICAL PRE-PROCESSING. We have found that two major factors degrading the performance of speech recognition systems using desktop microphones in normal office environments are additive noise and unknown linear filtering. We showed in [2, 6] that simultaneous joint compensation for the effects of additive noise and linear filtering is needed to achieve maximal robustness with respect to acoustical differences between the training and testing environments of a speech recognition system. We described in [2, 6] two algorithms that perform such joint compensation, based on additive corrections to the cepstral coefficients of the speech waveform. The more effective and adaptive of these algorithms, Codeword-Dependent Cepstral Normalization (CDCN) [2], uses EM techniques to compute ML estimates of the additive noise and linear filtering that corrupt "clean" speech signals. The CDCN algorithm adapts automatically to new testing environments,
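    The codeword-dependent compensation described above can be reduced to a minimal sketch: assign each cepstral frame to its nearest codeword and add that codeword's correction vector. This is an illustrative simplification with toy codebook and correction values; the real CDCN estimates the corrections with EM rather than taking them as given.

```python
# Hedged sketch: codeword-dependent additive cepstral correction.
# Codebook and corrections are toy values; CDCN estimates them via EM.

def compensate(frame: list[float],
               codebook: list[list[float]],
               corrections: list[list[float]]) -> list[float]:
    """Apply the correction of the nearest codeword to `frame`."""
    dists = [sum((a - b) ** 2 for a, b in zip(frame, c)) for c in codebook]
    k = dists.index(min(dists))               # nearest codeword index
    return [a + b for a, b in zip(frame, corrections[k])]

codebook = [[0.0, 0.0], [5.0, 5.0]]
corrections = [[-0.2, 0.1], [0.3, -0.4]]
print(compensate([0.5, -0.3], codebook, corrections))
```

    Making the correction depend on the codeword is what lets the compensation vary with the local spectral shape instead of applying one global offset to all frames.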