215 research outputs found

Intonation in a text-to-speech conversion system

Author: Monaghan Alexander Ian Campbell
Publication venue: The University of Edinburgh
Publication date: 01/01/1991

A Timing Model for Fast French

Author: Keller Eric
Zellner Brigitte
Publication venue: Department of Language and Linguistic Science, University of York (UK)
Publication date: 01/01/1996

Models of speech timing are of both fundamental and applied interest. At the fundamental level, the prediction of time periods occupied by syllables and segments is required for general models of speech prosody and segmental structure. At the applied level, complete models of timing are an essential component of any speech synthesis system. Previous research has established that a large number of factors influence various levels of speech timing. Statistical analysis and modelling can identify the order of importance of such factors and the mutual influences between them. In the present study, a three-tiered model was created by a modified step-wise statistical procedure. It predicts the temporal structure of French, as produced by a single, highly fluent speaker at a fast speech rate (100 phonologically balanced sentences, hand-scored in the acoustic signal). The first tier models segmental influences due to phoneme type and contextual interactions between phoneme types. The second tier models syllable-level influences of lexical vs. grammatical status of the containing word, presence of schwa and the position within the word. The third tier models utterance-final lengthening. The complete segmental-syllabic model correlated with the original corpus of 1204 syllables at an overall r = 0.846. Residuals were normally distributed. An examination of subsets of the data set revealed some variation in the closeness of fit of the model. The results are considered to be useful for an initial timing model, particularly in a speech synthesis context. However, further research is required to extend the model to other speech rates and to examine inter-speaker variability in greater detail.

CiteSeerX

Serveur académique lausannois

CogPrints Cognitive Sciences Eprint Archive

Voice Building from Insufficient Data - Classroom Experience with web-based Development Tools

Author: Black Alan W.
Kominek John
Schultz Tanja
Publication date: 30/06/2008

KITopen

Quality evaluation of synthesized speech

Author: Bezooijen R.L. van
Heuven V.J. van
Publication date: 01/01/1995

Fonetische correlaten en communicatieve functies van linguïstische structuu

Leiden University Scholarly Publications

Radboud Repository

On the Usability of Spoken Dialogue Systems

Author: Larsen Lars Bo
Publication venue: Aalborg University, Department of Communication Technology
Publication date: 01/01/2003

VBN

MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning

Author: Tits Noé
Publication date: 17/10/2023

In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, with a design to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA). In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification (in text and phonetic domains). The system was built with open-source components and resources. Through an ablation study, we demonstrate the efficacy of our approach in automatically syllabifying words from several languages (English, French and Spanish). Additionally, we apply the technique to the transcriptions of the CMU ARCTIC dataset, generating valuable annotations available online (https://github.com/noetits/MUST_P-SRL) that are ideal for speech representation learning, speech unit discovery, and disentanglement of speech factors in several speech-related fields. Comment: Accepted for publication at EMNLP 202

arXiv.org e-Print Archive

Fast Speech in Unit Selection Speech Synthesis

Author: Moers-Prinz Donata
Publication venue: Universität Bielefeld
Publication date: 01/01/2020

Moers-Prinz D. Fast Speech in Unit Selection Speech Synthesis. Bielefeld: Universität Bielefeld; 2020. Speech synthesis is part of the everyday life of many people with severe visual disabilities. For those who rely on assistive speech technology, the option of choosing a fast speaking rate is reported to be essential. Expressive speech synthesis and other spoken language interfaces may also require an integration of fast speech. Architectures like formant or diphone synthesis are able to produce synthetic speech at fast speech rates, but the generated speech does not sound very natural. Unit selection synthesis systems, however, are capable of delivering more natural output. Nevertheless, fast speech has not been adequately implemented in such systems to date. Thus, the goal of the work presented here was to determine an optimal strategy for modeling fast speech in unit selection speech synthesis to provide potential users with a more natural sounding alternative for fast speech output.

Publications at Bielefeld University

Observations on the dynamic control of an articulatory synthesizer using speech production data

Author: Steiner Ingmar Michael Augustus
Publication venue: Fakultät 4 - Philosophische Fakultät II. Fachrichtung 4.7 - Allgemeine Linguistik
Publication date: 01/01/2010

This dissertation explores the automatic generation of gestural score based control structures for a three-dimensional articulatory speech synthesizer. The gestural scores are optimized in an articulatory resynthesis paradigm using a dynamic programming algorithm and a cost function which measures the deviation from a gold standard in the form of natural speech production data. This data had been recorded using electromagnetic articulography, from the same speaker to which the synthesizer's vocal tract model had previously been adapted. Future work to create an English voice for the synthesizer and integrate it into a text-to-speech platform is outlined.

Further Investigation of MDS as a Tool for Evaluation of Speech Quality of Synthesized Speech

Author: Janska Anna C.
Publication date: 26/11/2009

The dissertation investigates multi-dimensional scaling (MDS) as a tool for the evaluation of the quality of synthesized speech. More specifically, it investigates the relations between Weighted Euclidean Distance Scaling and Simple Euclidean Distance Scaling, and how aggregating data affects the MDS configuration. It examines to what extent a subset of experimental participants and/or experimental stimuli is representative of a larger test set. For that purpose an experiment was conducted on the basis of a subset of stimuli used in the Blizzard Challenge 2008. Issues in the evaluation of speech synthesis are discussed, and an overview of the basics of multi-dimensional scaling is given to an extent that allows comprehension of the methods used in applying multi-dimensional scaling to speech synthesis evaluation. Based on the experimental findings, further experiments are suggested with the goal of optimizing testing procedures to such an extent that the number of experimental participants can be drastically reduced.

Edinburgh Research Archive