229 research outputs found

Exploiting Contextual Information for Prosodic Event Detection Using Auto-Context

Author: Johnson Michael T
Liu Jia
Xia Shanhong
Yang Hua
Zhang Wei-Qiang
Zhao Junhong
Publication venue: e-Publications@Marquette
Publication date: 01/12/2013

Prosody and prosodic boundaries carry significant linguistic and paralinguistic information and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The main difficulty lies in learning long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained on a set of local acoustic features, after which the generated probabilities are used along with the local features as contextual information to train new classifiers. By iteratively using updated probabilities as the contextual information, the algorithm can accurately model contextual dependencies and improve classification ability. The advantages of this method include its flexible structure and its ability to capture contextual relationships. When using the auto-context algorithm based on a support vector machine, in combination with the acoustic context, we can improve the detection accuracy by about 3% and the F-score by more than 7% on both two-way and four-way pitch accent detection. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially on boundary detection in terms of F-score. It also outperforms an n-gram language model on the task of pitch accent detection.
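The iterative scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It makes several assumptions: a small hand-written logistic regression stands in for the support vector machine used in the paper, the data are a single synthetic sequence, and the names `train_logreg`, `add_context`, `auto_context_train`, `auto_context_predict` and the `window` parameter are all hypothetical.

```python
import math
import random


def predict_one(w, b, x):
    """Logistic probability for one feature vector (z is clamped for safety)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Stand-in classifier (assumption): logistic regression fitted by batch
    gradient descent. The paper itself uses a support vector machine."""
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = predict_one(w, b, xi) - yi
            for j in range(n_feat):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def add_context(X, probs, window):
    """Append the probabilities at offsets -window..+window to each local
    feature vector. Positions outside the sequence are padded with 0.5."""
    n = len(X)
    out = []
    for i in range(n):
        ctx = [probs[i + k] if 0 <= i + k < n else 0.5
               for k in range(-window, window + 1)]
        out.append(list(X[i]) + ctx)
    return out


def auto_context_train(X, y, iterations=3, window=2):
    """Train a chain of classifiers, each fed the previous one's probabilities."""
    models = []
    # Uniform initial context: the first classifier effectively sees only
    # the local features, because its context columns are constant.
    probs = [0.5] * len(X)
    for _ in range(iterations):
        Xc = add_context(X, probs, window)
        w, b = train_logreg(Xc, y)
        models.append((w, b))
        probs = [predict_one(w, b, x) for x in Xc]  # updated context
    return models


def auto_context_predict(models, X, window=2):
    """Replay the chain at test time, starting from the same uniform context."""
    probs = [0.5] * len(X)
    for w, b in models:
        Xc = add_context(X, probs, window)
        probs = [predict_one(w, b, x) for x in Xc]
    return probs


# Toy sequence (synthetic, for illustration only): labels come in runs of
# five, and the single local feature is the label plus Gaussian noise.
random.seed(0)
labels = ([0] * 5 + [1] * 5) * 6
features = [[yi + random.gauss(0, 0.5)] for yi in labels]
chain = auto_context_train(features, labels, iterations=3, window=2)
posteriors = auto_context_predict(chain, features, window=2)
```

Each classifier after the first sees the neighbouring positions' probabilities from the previous round, which is how information can propagate further along the sequence with every iteration. The window size and the number of iterations are free choices in this sketch.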

epublications@Marquette

Springer - Publisher Connector

A Factored Language Model for Prosody Dependent Speech Recognition

Author: Jennifer S. Cole
Ken Chen
Mark A. Hasegawa-Johnson
Publication venue: IntechOpen
Publication date: 01/06/2007

IntechOpen

A Factored Language Model for Prosody Dependent Speech Recognition

Author: Jennifer S Cole
Ken Chen
Mark A Hasegawa-Johnson
Publication date: 23/04/2020

CiteSeerX

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

Author: Chen Jie
Kang Shiyin
Meng Helen
Song Changhe
Tuo Deyi
Wu Xixin
Wu Zhiyong
Publication date: 31/08/2023

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence how the target utterance is interpreted, previous work on PSP mainly uses only the intra-utterance linguistic information of the current utterance. This work proposes to use inter-utterance linguistic information to improve the performance of PSP. Multi-level contextual information, which includes both inter-utterance and intra-utterance linguistic information, is extracted by a hierarchical encoder from the character level, utterance level and discourse level of the input text. A multi-task learning (MTL) decoder then predicts prosodic boundaries from the multi-level contextual information. Objective evaluation results on two datasets show that our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH) boundaries, demonstrating the effectiveness of multi-level contextual information for PSP. Subjective preference tests also indicate that the naturalness of the synthesized speech is improved. Comment: Accepted by Interspeech202

arXiv.org e-Print Archive

Identifying prosodic prominence patterns for English text-to-speech synthesis

Author: Badino Leonardo
Publication venue: The University of Edinburgh
Publication date: 01/01/2010

This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems, the prediction from text of prosodic prominence relations between words in an utterance relies on features that only loosely account for the combined effects of syntax, semantics, word informativeness and salience on prosodic prominence. To improve prosodic prominence prediction, we first follow the classic approach in which prosodic prominence patterns are flattened into binary sequences of pitch-accented and unaccented words. We propose and motivate statistical and syntactic-dependency-based features that are complementary to the most predictive features proposed in previous work on automatic pitch accent prediction, and show their utility on both read and spontaneous speech. Different accentuation patterns can be associated with the same sentence. Such variability raises the question of how to evaluate pitch accent predictors when more than one pattern is allowed. We carry out a study of prosodic symbol variability on a speech corpus in which different speakers read the same text, and propose an information-theoretic definition of the optionality of symbolic prosodic events. This leads to a novel evaluation metric in which prosodic variability is incorporated as a factor affecting prediction accuracy. We additionally propose a method to take advantage of the optionality of prosodic events in unit-selection speech synthesis. To better account for the tight links between the prosodic prominence of a word and the discourse/sentence context, part of this thesis goes beyond the accent/no-accent dichotomy and is devoted to a novel task, the automatic detection of contrast, where contrast is understood as an Information Structure relation that ties together two words that explicitly contrast with each other.
This task is mainly motivated by the fact that contrastive words tend to be prosodically marked with particularly prominent pitch accents. Contrastive word pairs are identified by combining lexical information, syntactic information (which mainly aims to identify the syntactic parallelism that often activates contrast) and semantic information (mainly drawn from the WordNet semantic lexicon) within a Support Vector Machine classifier. Once patterns of prosodic prominence have been identified, we propose methods to incorporate this information in TTS synthesis and test its impact on the naturalness of synthetic speech through large-scale perceptual experiments. The results of these experiments cast some doubt on the utility of a simple accent/no-accent distinction in Hidden Markov Model based speech synthesis, while highlighting the importance of contrastive accents.

Edinburgh Research Archive

Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Author: A Borthwick
A Ratnaparkhi
AL Berger
AW Black
B Picart
CJ Leggetter
Fahimeh Bahmaninezhad
H Kawahara
H Liang
H Zen
Hossein Sameti
J Ghomeshi
J Nocedal
J Yamagishi
JJ Odell
K Hashimoto
K Oura
K Shinoda
K Tokuda
K Yu
L Qin
M Bijankhan
M Gibson
MJ Gales
R Kubichek
S Sakai
S Takaki
Simon King
SJ Young
Soheil Khorram
T Drugman
T Koriyama
T Toda
T Yoshimura
Thomas Drugman
V Rangarajan
VV Digalakis
Y Qian
YJ Wu
Publication venue: Springer Nature
Publication date: 01/01/2014

Crossref

Springer - Publisher Connector

Edinburgh Research Explorer

Sequential dialogue act recognition for Arabic argumentative debates

Author: Belguith Lamia Hadrich
Dbabis Samira Ben
Ghorbel Hatem
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2018

Dialogue act (DA) recognition remains an essential task that helps users automatically identify participants' intentions. In this paper, we propose a sequential approach, consisting of segmentation followed by annotation, to identify dialogue acts in Arabic political debates. To perform DA recognition, we used the CARD corpus, labeled with the SADA annotation schema. Both the segmentation and the annotation tasks were carried out using Conditional Random Fields probabilistic models, as they perform well in segmenting and labeling sequential data. Results are notably strong for the segmentation task (F-score = 97.9%) and reasonably reliable for the annotation task (F-score = 63.4%), given the complexity of identifying argumentative tags and the presence of disfluencies in spoken conversations.
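For readers unfamiliar with the metric reported above, the following minimal sketch shows how an F-score can be computed for one class from gold and predicted labels. The abstract does not say which variant or averaging scheme the authors used, so the balanced F1 default here is an assumption. The function name `f_score` and the toy label sequences are hypothetical.

```python
def f_score(gold, pred, positive=1, beta=1.0):
    """F-beta score for one class; beta=1 gives the balanced F1 (assumed default)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy segmentation labels (made up): 1 marks a segment boundary.
gold_labels = [1, 0, 0, 1, 1, 0]
pred_labels = [1, 0, 1, 1, 0, 0]
score = f_score(gold_labels, pred_labels)  # precision = recall = 2/3
```

Because the F-score ignores true negatives, it can be more informative than accuracy when the positive class (for example, a boundary) is rare.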

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)