1,484 research outputs found

    Speech Synthesis Based on Hidden Markov Models

    Get PDF

    Spectral refinement with adaptive window-size selection for voicing detection and fundamental frequency estimation

    Get PDF
    Spectral refinement (SR) offers a computationally in-expensive means of generating a refined (higher resolution) signal spectrum by linearly combining the spectra of shorter, contiguous signal segments. The benefit of this method has previously been demonstrated on the problem of fundamental frequency (F0) estimation in speech processing – specifically for the improved estimation of very low F0. One drawback of SR is, however, the poorer detection of voicing onsets due to the Heisenberg-Gabor limit on time and frequency resolution. This may also lead to degraded performance in noisy conditions. Transitioning between long- and short-time windows for the spectral analysis may offer a good trade-off in these situations. This contribution presents a method to adaptively switch between short- and long-time windows (and, correspondingly, between the short-term and the refined spectrum) for voicing detection and F0 estimation. The improvements in voicing detection and F0 estimation due to this adaptive switching is conclusively demonstrated on audio signals in clean and corrupted conditions

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Structure Learning in Audio

    Get PDF

    Automatic prosodic analysis for computer aided pronunciation teaching

    Get PDF
    Correct pronunciation of spoken language requires the appropriate modulation of acoustic characteristics of speech to convey linguistic information at a suprasegmental level. Such prosodic modulation is a key aspect of spoken language and is an important component of foreign language learning, for purposes of both comprehension and intelligibility. Computer aided pronunciation teaching involves automatic analysis of the speech of a non-native talker in order to provide a diagnosis of the learner's performance in comparison with the speech of a native talker. This thesis describes research undertaken to automatically analyse the prosodic aspects of speech for computer aided pronunciation teaching. It is necessary to describe the suprasegmental composition of a learner's speech in order to characterise significant deviations from a native-like prosody, and to offer some kind of corrective diagnosis. Phonological theories of prosody aim to describe the suprasegmental composition of speech..

    Advanced Biometrics with Deep Learning

    Get PDF
    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, etc., as a means of identity management have become commonplace nowadays for various applications. Biometric systems follow a typical pipeline, that is composed of separate preprocessing, feature extraction and classification. Deep learning as a data-driven representation learning approach has been shown to be a promising alternative to conventional data-agnostic and handcrafted pre-processing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm to unify preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality; namely, face biometrics, medical electronic signals (EEG and ECG), voice print, and others
    corecore