497 research outputs found

    Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech

    Segment-based speech recognition has been shown to be a competitive alternative to state-of-the-art HMM-based techniques. Its accuracy relies heavily on the quality of the segment graph in which the recognizer searches for the most likely recognition hypotheses. To increase the inclusion rate of actual segments in the graph, it is important to recover segments that the segment-based segmentation algorithm may have missed. One aspect of this research focuses on determining the segments that are missing because segment boundaries went undetected. Acoustic discontinuities, together with manner-distinctive features, are used to recover the missing segments. Another improvement to our segment-based framework addresses the limited amount of training speech data, which prevents the use of more complex covariance matrices in the acoustic models. Feature dimensionality reduction in the form of Principal Component Analysis (PCA) is applied to enable the training of full covariance matrices, and it results in improved segment-based phoneme recognition. Furthermore, because the segment-based approach allows the integration of phonetic knowledge, we incorporate into the scoring of the segment graphs the probability that each segment is a sound unit of a particular manner of articulation. Our experiments show that, with the proposed improvements, our segment-based framework increases phoneme recognition accuracy by approximately 25% relative to the baseline segment-based speech recognizer.

    Contributions of cochlea-scaled entropy and consonant-vowel boundaries to prediction of speech intelligibility in noise


    Perception of allophonic cues to English word boundaries by Polish learners: Approximant devoicing in English

    The study investigates the perception of devoicing of English /w, r, j, l/ after /p, t, k/ as a word-boundary cue by Polish listeners. Polish does not devoice sonorants following voiceless stops in word-initial positions. As a result, Polish learners are not sensitive to sonorant devoicing as a segmentation cue. Higher-proficiency and lower-proficiency Polish learners of English participated in a task in which they recognised phrases such as buy train vs. bite rain or pie plot vs. pipe lot. The analysis of accuracy scores revealed that successful segmentation was only above chance level, indicating that the sonorant voicing/devoicing cue was largely unattended to in identifying the boundary location. Moreover, higher proficiency did not lead to more successful segmentation. The analysis of reaction times showed an unclear pattern in which higher-proficiency listeners segmented the test phrases faster but not more accurately than lower-proficiency listeners. Finally, #CS sequences were recognised more accurately than C#S sequences, which was taken to suggest that the listeners may have had some limited knowledge that devoiced sonorants appear only in word-initial positions, but that they treated voiced sonorants as equal candidates for word-final and word-initial positions.

    An Improved GA Based Modified Dynamic Neural Network for Cantonese-Digit Speech Recognition

    Author name used in this publication: F. H. F. Leung. 2007-2008 > Academic research: refereed > Chapter in an edited book (author)

    Automatic prosodic analysis for computer aided pronunciation teaching

    Correct pronunciation of spoken language requires the appropriate modulation of acoustic characteristics of speech to convey linguistic information at a suprasegmental level. Such prosodic modulation is a key aspect of spoken language and is an important component of foreign language learning, for purposes of both comprehension and intelligibility. Computer aided pronunciation teaching involves automatic analysis of the speech of a non-native talker in order to provide a diagnosis of the learner's performance in comparison with the speech of a native talker. This thesis describes research undertaken to automatically analyse the prosodic aspects of speech for computer aided pronunciation teaching. It is necessary to describe the suprasegmental composition of a learner's speech in order to characterise significant deviations from a native-like prosody, and to offer some kind of corrective diagnosis. Phonological theories of prosody aim to describe the suprasegmental composition of speech.

    Acoustic-phonetic constraints in continuous speech recognition: a case study using the digit vocabulary.

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1985. Includes bibliographical references (leaves 155-159). This electronic version was scanned from a copy of the thesis on file at the Speech Communication Group. The certified thesis is available in the Institute Archives and Special Collections. Vinton-Hayes Fellowship. DARPA, monitored through the Office of Naval Research. System Development Foundation.

    Acoustic-Phonetic Features for the Automatic Classification of Stop Consonants

    In this paper, the acoustic–phonetic characteristics of American English stop consonants are investigated. Features studied in the literature are evaluated for their information content and new features are proposed. A statistically guided, knowledge-based, acoustic–phonetic system for the automatic classification of stops in speaker-independent continuous speech is proposed. The system uses new auditory-based front-end processing and incorporates new algorithms for the extraction and manipulation of the acoustic–phonetic features that proved to be rich in information content. Recognition experiments are performed using hard decision algorithms on stops extracted from the TIMIT database continuous speech of 60 speakers (not used in the design process) from seven different dialects of American English. An accuracy of 96% is obtained for voicing detection, 90% for place of articulation detection and 86% for the overall classification of stops.

    Investigating potential acoustic correlates of sonority: Intensity vs. periodic energy

    This empirical study examines possible acoustic correlates of sonority. The results indicate that periodic energy (in particular its sum) is a more reliable cue to sonority than intensity.