287 research outputs found

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, have enjoyed a remarkable career in recent years and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neuroscience, computational biology, bioinformatics, seismology, environmental protection, and engineering. I hope that readers will find this book useful and helpful for their own research.
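The quantity an HMM most often computes in the applications listed above is the likelihood of an observation sequence, obtained with the forward algorithm. A minimal sketch with an invented two-state, two-symbol model (the parameters are illustrative, not taken from the book):

```python
import numpy as np

A = np.array([[0.7, 0.3],      # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # emission probabilities per state
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution

def forward(obs):
    """Return the likelihood of an observation sequence under the HMM."""
    alpha = pi * B[:, obs[0]]          # initialise with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # predict next state, weight by emission
    return alpha.sum()

print(forward([0, 1, 0]))
```

Summing `forward` over all possible sequences of a fixed length yields 1, which is a quick sanity check on any implementation.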

    Implementation and Evaluation of Acoustic Distance Measures for Syllables

    Munier C. Implementation and Evaluation of Acoustic Distance Measures for Syllables. Bielefeld (Germany): Bielefeld University; 2011.
    In this work, several acoustic similarity measures for syllables are motivated and subsequently evaluated. The Mahalanobis distance, used as the local distance measure in a dynamic time warping approach to measuring acoustic distances, is able to discriminate syllables and thus allows syllable classification with an accuracy that is typical for the classification of small acoustic units (60 percent for a nearest-neighbour classification of a set of ten syllables using samples of a single speaker). This measure can be improved by several techniques that, however, impair the execution speed of the distance measure (using more mixture components for estimating covariances from a Gaussian mixture model, and using full covariance matrices instead of diagonal ones). Experimental evaluation makes it evident that a well-functioning syllable segmentation algorithm, allowing accurate estimation of syllable boundaries, is essential for the correct computation of acoustic distances by the similarity measures developed in this work. Further similarity measures, motivated by their use in timbre classification of music pieces, do not show adequate syllable discrimination abilities.
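The core comparison evaluated in the thesis, dynamic time warping with a Mahalanobis local distance, can be sketched as follows. The frame sequences and the covariance below are synthetic stand-ins; a real system would use MFCC frames and a covariance estimated from a Gaussian mixture model:

```python
import numpy as np

def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance between two feature vectors."""
    d = x - y
    return np.sqrt(d @ inv_cov @ d)

def dtw_distance(seq_a, seq_b, inv_cov):
    """Accumulated DTW cost between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = mahalanobis(seq_a[i - 1], seq_b[j - 1], inv_cov)
            cost[i, j] = local + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
    return cost[n, m]

# Toy 2-dimensional "syllable" frame sequences.
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 2))
inv_cov = np.linalg.inv(np.cov(rng.normal(size=(100, 2)), rowvar=False))
print(dtw_distance(a, a, inv_cov))        # identical sequences -> 0.0
print(dtw_distance(a, a + 1.0, inv_cov))  # shifted copy -> positive cost
```

A nearest-neighbour classifier, as used in the thesis' evaluation, would simply assign each test syllable the label of the training syllable with the smallest `dtw_distance`.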

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.
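One component named above, efficiently searching the hypothesis space, is commonly realised as a beam search. The following toy sketch uses invented per-step word log-probabilities rather than real acoustic or language model scores:

```python
import heapq

def beam_search(step_scores, beam=2):
    """step_scores: list of {word: log_prob} dicts, one per time step.
    Extends every kept hypothesis with every word, then prunes to the
    `beam` best candidates before the next step."""
    hyps = [(0.0, [])]  # (accumulated log-prob, word sequence)
    for scores in step_scores:
        expanded = [(lp + s, words + [w])
                    for lp, words in hyps
                    for w, s in scores.items()]
        hyps = heapq.nlargest(beam, expanded, key=lambda h: h[0])
    return hyps[0]      # best-scoring surviving hypothesis

steps = [{"the": -0.1, "a": -0.5},
         {"cat": -0.2, "cap": -0.9},
         {"sat": -0.3, "sad": -0.4}]
print(beam_search(steps))
```

Pruning makes the search approximate: a hypothesis that scores poorly early on is discarded even if later words would have rescued it, which is the usual accuracy/speed trade-off in decoding.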

    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models -- for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.
    Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 200
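The probabilistic combination of prosodic and lexical information mentioned above can be illustrated with a log-linear interpolation of two boundary posteriors. The numbers below are invented; the paper's actual components (a decision tree over prosodic features and a hidden-event word language model) are only named here, not implemented:

```python
import math

def combine(p_prosody, p_lexical, weight=0.5):
    """Log-linear interpolation of two sentence-boundary posteriors,
    renormalised over the two outcomes {boundary, no boundary}."""
    log_p = weight * math.log(p_prosody) + (1 - weight) * math.log(p_lexical)
    log_q = weight * math.log(1 - p_prosody) + (1 - weight) * math.log(1 - p_lexical)
    p = math.exp(log_p)
    return p / (p + math.exp(log_q))

# A long pause (strong prosodic cue) paired with weak lexical evidence.
print(combine(0.9, 0.4))
```

With `weight=1.0` the combination reduces to the prosodic posterior alone; intermediate weights let the stronger knowledge source dominate without discarding the weaker one.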

    On the automatic segmentation of transcribed words


    Speech-driven animation using multi-modal hidden Markov models

    The main objective of this thesis was the synthesis of speech-synchronised motion, in particular head motion. The hypothesis that head motion can be estimated from the speech signal was confirmed. In order to achieve satisfactory results, a motion capture database was recorded, a definition of head motion in terms of articulation was discovered, a continuous stream mapping procedure was developed, and finally the synthesis was evaluated. Based on previous research into non-verbal behaviour, basic types of head motion were devised that could function as modelling units. The stream mapping method investigated in this thesis is based on Hidden Markov Models (HMMs), which employ modelling units to map between continuous signals. The objective evaluation of the modelling parameters confirmed that head motion types could be predicted from the speech signal with an accuracy above chance, close to 70%. Furthermore, a special type of HMM called trajectory HMM was used because it enables synthesis of continuous output. However, head motion is a stochastic process, so the trajectory HMM was further extended to allow for non-deterministic output. Finally, the resulting head motion synthesis was perceptually evaluated. The effects of the “uncanny valley” were also considered in the evaluation, confirming that rendering quality has an influence on our judgement of the movement of virtual characters. In conclusion, a general method for synthesising speech-synchronised behaviour was devised that can be applied to a whole range of behaviours.
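The non-deterministic output described above can be illustrated with a toy generative sketch: a two-state HMM whose states stand for hypothetical head-motion units is sampled to produce a stochastic head-pitch stream. All parameters are invented; the thesis trains trajectory HMMs on motion-capture data rather than hand-set numbers:

```python
import numpy as np

rng = np.random.default_rng(42)
A = np.array([[0.9, 0.1],          # transitions between motion units
              [0.2, 0.8]])
means = np.array([0.0, 10.0])      # mean head pitch (degrees) per unit
stds = np.array([1.0, 3.0])        # per-unit output variability

def sample_trajectory(n_frames, state=0):
    """Sample a head-pitch angle per frame from the current unit's
    Gaussian, then move to the next unit according to A."""
    angles = []
    for _ in range(n_frames):
        angles.append(rng.normal(means[state], stds[state]))
        state = rng.choice(2, p=A[state])
    return np.array(angles)

traj = sample_trajectory(100)
print(traj[:5])
```

Because both the state sequence and the per-frame emissions are sampled, repeated runs produce different but statistically similar trajectories, which is the point of the stochastic extension.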