CORE
🇺🇦
make metadata, not war
Services
Research
Services overview
Explore all CORE services
Access to raw data
API
Dataset
FastSync
Content discovery
Recommender
Discovery
OAI identifiers
OAI Resolver
Managing content
Dashboard
Bespoke contracts
Consultancy services
Support us
Support us
Membership
Sponsorship
Community governance
Advisory Board
Board of supporters
Research network
About
About us
Our mission
Team
Blog
FAQs
Contact us
Tone-Group F0 selection for modeling focus prominence in small-footprint speech synthesis
Authors
G. Xydas Kouroupetroglou, G.
Publication date
1 January 2006
Publisher
Abstract
This work targets to improve the naturalness of synthetic intonational contours in Text-to-Speech synthesis through the provision of prominence, which is a major expression of human speech. Focusing on the tonal dimension of emphasis, we present a robust unit-selection methodology for generating realistic F0 curves in cases where focus prominence is required. The proposed approach is based on selecting Tone-Group units from commonly used prosodic corpora that are automatically transcribed as patterns of syllables. In contrast to related approaches, patterns represent only the most perceivable sections of the sampled curves and are encoded to serve morphologically different sequence of syllables. This results in a minimization of the required amount of units so as to achieve sufficient coverage within the database. Nevertheless, this optimization enables the application of high-quality F0 generation to small-footprint text-to-speech synthesis. For generic F0 selection we query the database based on sequences of ToBI labels, though other intonational frameworks can be used as well. To realize focus prominence on specific Tone-Groups the selection also incorporates a level indicator of emphasis. We set up a series of listening tests by exploiting a database built from a 482-utterance corpus, which featured partially purpose-uttered emphasis. The results showed a clear subjective preference of the proposed model against a linear regression one in 75% of the cases when used in generic synthesis. Furthermore, this model provided ambiguous percept of emphasis in an experiment featuring major and minor degrees of prominence. © 2006 Elsevier B.V. All rights reserved
Similar works
Full text
Available Versions
Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens
See this paper in CORE
Go to the repository landing page
Download from data provider
oai:lib.uoa.gr:uoadl:2995169
Last time updated on 10/02/2023