A phonetic concatenative approach of labial coarticulation
Predicting the effects of labial coarticulation is important for developing an artificial talking head. This paper describes a concatenation approach that uses sigmoids to represent the evolution of labial parameters: lip aperture, protrusion, stretching and jaw aperture. A first algorithm determines the relevant transitions, i.e. those corresponding to phonemes that impose constraints on one of the labial parameters. Relevant transitions are then either retrieved or interpolated from a set of reference sigmoids trained on a speaker-specific corpus made up of isolated vowels, CV, VCV, VCCV sequences and 100 sentences. A final stage improves the overall syntagmatic consistency of the concatenation.
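The sigmoid evolution of a labial parameter can be sketched in a few lines. The function name, its parameters, and the logistic form below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def sigmoid_transition(v_start, v_end, t, t_mid, slope):
    """Value of a labial parameter (e.g. protrusion) at time t, modeled
    as a sigmoid transition from v_start to v_end centered at t_mid."""
    s = 1.0 / (1.0 + math.exp(-slope * (t - t_mid)))
    return v_start + (v_end - v_start) * s
```

A retrieved reference sigmoid would fix `t_mid` and `slope` for a given phoneme pair; interpolating between reference sigmoids then amounts to interpolating those two parameters.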
Comparison between two predicting methods of labial coarticulation
The construction of a highly intelligible talking head with relevant lip gestures is especially important for hearing-impaired people. This requires realistic rendering of lip and jaw movements and thus relevant modeling of lip coarticulation. This paper compares the Cohen & Massaro prediction algorithm with our concatenation-plus-completion strategy guided by phonetic knowledge. Although the Cohen & Massaro model performs slightly better overall, the concatenation and completion strategy approximates consonant clusters markedly better, particularly for the protrusion parameter. The results also show that the concatenation and completion strategy could easily be improved by recording better reference models for isolated vowels.
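The Cohen & Massaro approach blends per-segment articulatory targets using time-decaying dominance functions. A minimal sketch of that idea, assuming a simple negative-exponential dominance shape (the exact function and parameterization in the compared implementation may differ):

```python
import math

def dominance(t, center, magnitude, rate):
    # Negative-exponential dominance of a segment's target at time t
    return magnitude * math.exp(-rate * abs(t - center))

def blended_parameter(t, segments):
    """Dominance-weighted average of targets.
    segments: list of (target_value, center_time, magnitude, rate)."""
    num = sum(v * dominance(t, c, m, r) for v, c, m, r in segments)
    den = sum(dominance(t, c, m, r) for v, c, m, r in segments)
    return num / den
```

A segment with a strong labial constraint (large magnitude, slow decay) dominates its neighborhood, which is how the model produces coarticulatory spread.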
Phoneme Recognition Using Acoustic Events
This paper presents a new approach to phoneme recognition using nonsequential sub-phoneme units. These units are called acoustic events and are phonologically meaningful as well as recognizable from speech signals. Acoustic events form a phonologically incomplete representation as compared to distinctive features. This problem may partly be overcome by incorporating phonological constraints. Currently, 24 binary events describing manner and place of articulation, vowel quality and voicing are used to recognize all German phonemes. Phoneme recognition in this paradigm consists of two steps: after the acoustic events have been determined from the speech signal, a phonological parser is used to generate syllable and phoneme hypotheses from the event lattice. Results obtained on a speaker-dependent corpus are presented. Comment: 4 pages, to appear at ICSLP'94.
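The second step, generating phoneme hypotheses from detected events, can be sketched as scoring each phoneme's binary-event definition against the detected event set. The feature inventory and scoring rule below are hypothetical toy examples, not the paper's 24-event system or its phonological parser:

```python
# Hypothetical subset of binary acoustic events per phoneme
PHONEME_FEATURES = {
    "m": {"nasal", "labial", "voiced"},
    "b": {"labial", "voiced", "plosive"},
    "p": {"labial", "plosive"},
}

def hypothesize(detected_events):
    """Rank phonemes: reward events shared with the detection,
    penalize defining events that were not detected."""
    scored = [(len(feats & detected_events) - len(feats - detected_events), p)
              for p, feats in PHONEME_FEATURES.items()]
    return [p for _, p in sorted(scored, reverse=True)]
```

A real system would work over an event lattice with times and confidences, and the parser would additionally enforce syllable-level phonological constraints.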
HMM-based Automatic Visual Speech Segmentation Using Facial Data
We describe automatic visual speech segmentation using facial data captured by a stereo-vision technique. The segmentation is performed with an HMM-based forced-alignment mechanism widely used in automatic speech recognition. The idea rests on the assumption that training on visual speech data alone might capture the uniqueness of the facial component of speech articulation, the asynchrony (time lags) between visual and acoustic speech segments, and significant coarticulation effects. This should show the extent to which a phoneme visually affects surrounding phonemes, providing information valuable for labeling visual speech segments by their dominant coarticulatory contexts.
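At its core, forced alignment is a left-to-right dynamic program that assigns frames to a known phone sequence. A toy sketch under simplifying assumptions (per-frame log-scores given directly, one state per phone, no self-loop/transition penalties; the authors' HMM setup is more elaborate):

```python
import math

def force_align(frame_scores, n_phones):
    """frame_scores[t][p]: log-score of frame t under phone p.
    Returns a monotonic frame-to-phone assignment (assumes
    len(frame_scores) >= n_phones)."""
    T = len(frame_scores)
    NEG = -math.inf
    best = [[NEG] * n_phones for _ in range(T)]
    back = [[0] * n_phones for _ in range(T)]
    best[0][0] = frame_scores[0][0]
    for t in range(1, T):
        for p in range(n_phones):
            stay = best[t - 1][p]
            move = best[t - 1][p - 1] if p > 0 else NEG
            if move > stay:
                best[t][p] = move + frame_scores[t][p]
                back[t][p] = p - 1
            else:
                best[t][p] = stay + frame_scores[t][p]
                back[t][p] = p
    path = [n_phones - 1]          # must end in the last phone
    for t in range(T - 1, 0, -1):  # backtrace
        path.append(back[t][path[-1]])
    return path[::-1]
```

The phone boundaries read off from this assignment are the visual speech segment boundaries the abstract refers to.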
Challenges in analysis and processing of spontaneous speech
Selected and peer-reviewed papers of the workshop Challenges in Analysis and Processing of Spontaneous Speech (Budapest, 2017).
Fast Speech in Unit Selection Speech Synthesis
Moers-Prinz D. Fast Speech in Unit Selection Speech Synthesis. Bielefeld: Universität Bielefeld; 2020. Speech synthesis is part of everyday life for many people with severe visual disabilities. For those who rely on assistive speech technology, the possibility to choose a fast speaking rate is reported to be essential. Expressive speech synthesis and other spoken language interfaces may also require integration of fast speech. Architectures like formant or diphone synthesis can produce synthetic speech at fast speaking rates, but the generated speech does not sound very natural. Unit selection synthesis systems, however, are capable of delivering more natural output; nevertheless, fast speech has not been adequately implemented in such systems to date. The goal of the work presented here was therefore to determine an optimal strategy for modeling fast speech in unit selection speech synthesis, providing potential users with a more natural-sounding alternative for fast speech output.
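Unit selection itself is a Viterbi search that balances how well each candidate unit matches its target specification against how smoothly adjacent units join. A generic sketch of that search (the cost functions are supplied by the caller; this is the standard textbook formulation, not this thesis's specific fast-speech strategy):

```python
import math

def select_units(targets, candidates, target_cost, join_cost):
    """targets: per-position specifications; candidates[i]: units for
    position i. Minimizes sum of target costs plus join costs."""
    cost = [[target_cost(targets[0], u) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, brow = [], []
        for u in candidates[i]:
            best, arg = math.inf, -1
            for j, prev in enumerate(candidates[i - 1]):
                c = cost[i - 1][j] + join_cost(prev, u)
                if c < best:
                    best, arg = c, j
            row.append(best + target_cost(targets[i], u))
            brow.append(arg)
        cost.append(row)
        back.append(brow)
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [j]
    for i in range(len(targets) - 1, 0, -1):  # backtrace
        j = back[i][j]
        path.append(j)
    return [candidates[i][p] for i, p in enumerate(reversed(path))]
```

Modeling fast speech in this framework comes down to how the target specifications (e.g. durations) and the inventory are prepared before this search runs.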
Domain-optimized Chinese speech generation.
Fung Tien Ying. Thesis (M.Phil.), Chinese University of Hong Kong, 2001. Includes bibliographical references. Abstracts in English and Chinese. Contents: 1 Introduction (general trends on speech generation; domain-optimized speech generation in Chinese); 2 Background (linguistic and phonological properties of Chinese: articulation and tones; previous development in speech generation: articulatory, formant and concatenative synthesis, existing systems; our speech generation approach); 3 Corpus-based syllable concatenation: a feasibility test (capturing syllable coarticulation with distinctive features; creating a domain-optimized wavebank via generate-and-filter and waveform segmentation; the use of multi-syllable units; unit selection for concatenative speech output; a listening test); 4 Scalability and portability to the stocks domain (complexity of the ISIS responses; XML for input semantic and grammar representation; tree-based filtering algorithm; energy normalization); 5 Investigation in tonal contexts (the nature of tones and human perception of tones; relative importance of left and right tonal context in the date-time and numeric subgrammars; selection scheme for tonal variants, with a listening test for the tone backoff scheme and error analysis); 6 Summary and future work (contributions; future directions). Appendices cover listening-test questionnaires, major response types for ISIS, the recording corpus for the tone investigation, and statistical tests for the tonal-context and backoff unit selection experiments.
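The tone backoff scheme in chapter 5 can be pictured as falling back from the full tonal context to progressively weaker contexts until a recorded unit is found. The lookup keys, the preference order, and the wavebank structure below are illustrative assumptions, not the thesis's actual scheme:

```python
def pick_unit(syllable, left_tone, right_tone, wavebank):
    """Back off from full tonal context to partial context to any
    context. The order shown (prefer keeping the right tonal context)
    is a hypothetical choice for illustration."""
    for key in [(syllable, left_tone, right_tone),
                (syllable, None, right_tone),
                (syllable, left_tone, None),
                (syllable, None, None)]:
        if key in wavebank:
            return wavebank[key]
    return None  # no recorded variant of this syllable at all
```

The thesis's statistical comparison of left versus right tonal context is precisely what would justify one backoff order over the other.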
Cortical encoding and decoding models of speech production
To speak is to dynamically orchestrate the movements of the articulators (jaw, tongue, lips, and larynx), which in turn generate speech sounds. It is a remarkable mental and motor feat, controlled by the brain, that is fundamental for communication. Technology that could translate brain signals into speech would be transformative for people who are unable to communicate as a result of neurological impairments. This work first investigates how the articulator movements underlying natural speech production are represented in the brain. Building on this, it presents a neural decoder that can synthesize audible speech from brain signals. The supporting data were direct cortical recordings of the human sensorimotor cortex while participants spoke natural sentences. Neural activity at individual electrodes encoded a diversity of articulatory kinematic trajectories (AKTs), each revealing coordinated articulator movements toward specific vocal tract shapes. The neural decoder was designed to leverage the kinematic trajectories encoded in the sensorimotor cortex, which enhanced performance even with limited data. In closed-vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. These findings advance the clinical viability of speech neuroprosthetic technology for restoring spoken communication.
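The decoder's key design choice, routing brain activity through an articulatory intermediate rather than mapping directly to sound, can be sketched as a two-stage transform. Everything here is an illustrative stand-in: the dimensions are invented, and the actual decoder uses learned neural networks, not random linear maps:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 64 electrodes, 12 kinematic features, 32 spectral bins
W_kin = rng.standard_normal((12, 64)) * 0.1  # stage 1: neural -> kinematics
W_ac = rng.standard_normal((32, 12)) * 0.1   # stage 2: kinematics -> acoustics

def decode(neural_frames):
    """Two-stage decode: cortical activity -> articulatory kinematic
    trajectories -> acoustic spectra, one row per time frame."""
    kin = neural_frames @ W_kin.T
    return kin @ W_ac.T

spectra = decode(rng.standard_normal((100, 64)))
```

Factoring the mapping this way mirrors the finding that sensorimotor cortex encodes kinematic trajectories: the low-dimensional articulatory bottleneck is what helps the decoder generalize from limited data.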