17 research outputs found
The Intonational Phonology of Daco-Romance
Within the field of Romance phonetics and phonology, the intonation of the Daco-Romance languages (Romanian, Aromanian, Megleno-Romanian and Istro-Romanian) has been a much-neglected topic. In fact, until relatively recently, little was known about the general importance of intonation in speech and about its forms and functions. Intonation in Daco-Romance was investigated only marginally, usually in mainstream Romanian grammar compendia, which left it a virtually unstudied area. Although there are several short descriptions of Romanian intonation (Dascălu-Jinga 1971, 1998, 2001; Vasiliu 1965; Chițoran, Pârlog and Augerot 1984; Chițoran 2002), they were not conducted in any particular framework and were mainly impressionistic in character. It is apparent that a fresh, comprehensive approach to intonation in Romanian and in Eastern Romance in general is needed as a basis for future pedagogical, typological, and comparative research. After a critical account of major intonation theories (the IPO theory, the "traditional British" system and the Autosegmental-Metrical (AM) theory), it is argued that the most suitable framework in which this project should be conducted is the AM theory. The main aim of the present thesis is to propose a comprehensive model for intonation in Romanian and the other Daco-Romance varieties based on the Autosegmental-Metrical theory (Pierrehumbert 1980, Ladd 2008 [1996], Gussenhoven 2004). This will involve the first Romanian ToBI (Ro-ToBI) transcription of intonation and show how focus is realised in the language. After providing an inventory of pitch accents and boundary tones, special attention is given to broad focus and narrow/contrastive focus in yes-no questions and wh-questions, which were reported to be peculiar in Romanian intonation compared with other (Western) Romance languages (Ladd 2008).
For this purpose, 12 native speakers of all four Daco-Romance varieties were interviewed, which resulted in a spontaneous corpus (short conversations or short stories) and a semi-spontaneous corpus (questionnaires specially designed to elicit broad, narrow and contrastive focus, as well as other specific types of intonation). Acoustic analyses were performed in PRAAT, followed by a comparative study of Daco-Romanian, Aromanian, Megleno-Romanian, and Istro-Romanian. In order to facilitate research and comparative studies across Romance languages, the data presented in this thesis were obtained using two intonation questionnaires based on the Discourse Completion Test (initially developed by Blum-Kulka et al. 1989), which included some 31 situations designed to elicit a large number of specific sentence types and pragmatic meanings and eight different focus contexts. An analysis of the intonational phonology of Daco-Romance varieties suggests that they tend to align more with each other than with the non-Romance languages with which they are in contact. With respect to focus, the findings presented here suggest that the Nuclear Stress Rule (NSR) (Zubizarreta 1998; 2010) applies in Eastern Romance only to a certain extent in broad focus contexts, but not in narrow focus, which allows contextual de-accenting. The results show that Daco-Romance has a very rich and diverse intonational phonology that acts as a bridge prosodic system between Slavic and Romance. The outcome of the project will not only have applications for automatic speech recognition and speech synthesis (TTS systems) but will also help us to better understand intonational phonology in Romance in general.
Intonation Modelling for Speech Synthesis and Emphasis Preservation
Speech-to-speech translation is a framework which recognises speech in an input language, translates it to a target language and synthesises speech in this target language. In such a system, variations in the speech signal which are inherent to natural human speech are lost as the information passes through the different building blocks of the translation process. The work presented in this thesis addresses aspects of speech synthesis which are lost in traditional speech-to-speech translation approaches. The main research axis of this thesis is the study of prosody for speech synthesis and emphasis preservation. A first investigation of regional accents of spoken French is carried out to understand the sensitivity of native listeners with respect to accented speech synthesis. Listening tests show that standard adaptation methods for speech synthesis are not sufficient for listeners to perceive accentedness. On the other hand, combining adaptation with original prosody allows perception of accents. Addressing the need for a more suitable prosody model, a physiologically plausible intonation model is proposed. Inspired by the command-response model, it has basic components which can be related to muscle responses to nerve impulses. These components are assumed to be a representation of muscle control of the vocal folds. A motivation for such a model is its theoretical language independence, based on the fact that humans share the same vocal apparatus. An automatic parameter extraction method which integrates a perceptually relevant measure is proposed with the model. This approach is evaluated and compared with the standard command-response model. Two corpora including sentences with emphasised words are presented, in the context of the SIWIS project. The first is a multilingual corpus with speech from multiple speakers; the second is a high-quality, speech-synthesis-oriented corpus from a professional speaker. Two broad uses of the model are evaluated.
The first shows that it is difficult to predict model parameters; the second, however, shows that parameters can be transferred in the context of emphasis synthesis. A relation between model parameters and linguistic features such as stress and accent is demonstrated. Similar observations are made between the parameters and emphasis. Following this, we investigate the extraction of atoms in emphasised speech and their transfer to neutral speech, which turns out to elicit emphasis perception. Using clustering methods, this is extended to the emphasis of other words, using linguistic context. This approach is validated by listening tests, in the case of English.
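As background to the abstract above, the following is a minimal sketch of the standard command-response (Fujisaki) formulation that the thesis takes as its baseline, not the model the thesis itself proposes. The function names, the example command values, and the default constants `alpha`, `beta` and `gamma` are illustrative assumptions; the abstract gives no parameter values.

```python
import math


def phrase_response(t, alpha=2.0):
    """Phrase component impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent component step response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_hz, phrase_cmds, accent_cmds):
    """Sum the components in the log-F0 domain and return F0 in Hz.

    phrase_cmds: list of (onset_s, amplitude) impulses.
    accent_cmds: list of (onset_s, offset_s, amplitude) pedestals.
    All command values are hypothetical inputs chosen by the caller.
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_hz)
        for onset, amp in phrase_cmds:
            log_f0 += amp * phrase_response(t - onset)
        for onset, offset, amp in accent_cmds:
            log_f0 += amp * (accent_response(t - onset) - accent_response(t - offset))
        contour.append(math.exp(log_f0))
    return contour


# Example: one phrase command and one accent command over two seconds.
times = [i / 100.0 for i in range(200)]
contour = f0_contour(times, 100.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
```

Summing in the log-F0 domain is what lets a small set of command amplitudes and timings describe the whole contour, which is why parameter prediction and transfer are natural evaluation tasks for models of this family.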
Proyecto Docente e Investigador
Teaching and Research Project
University Professorship (Catedráticos de Universidad)
Area of Computer Science and Artificial Intelligence
Universidad de Valladolid
19 May 2023
David Escudero Manceb
The production and perception of domain-initial strengthening in Seoul, Busan, and Ulsan Korean
Korean exhibits one of the most consistent examples of the cross-linguistic phenomenon of domain-initial strengthening (hereafter DIS; T. Cho & Keating, 2001; Keating, Cho, Fougeron, & Hsu, 2004). DIS is defined as temporal and/or spatial enhancement of segmental articulation in the initial position of prosodic domains. Broadly, this dissertation serves as a detailed case study of the production patterns and the perceptual benefits of this phenomenon.
The recent findings of denasalisation and devoicing of the initial nasals in Korean (Young Shin Kim, 2011; Yoo, 2015a) suggest that there is a striking parallelism between the lenis stops /p, t, k/ and the nasal consonants /m, n/ in their patterns of DIS. Nevertheless, we currently lack an account that captures this parallelism. In addition, there is disagreement over the categorical nature of lenis stop voicing (S.-A. Jun, 1993; Docherty, 1995) and denasalisation (Yoshida, 2008; Young Shin Kim, 2011). Despite the obvious similarities between the arguably discrete processes of lenis stop voicing and denasalisation, and the kind of gradient effects widely reported for DIS, there has been no explicit investigation of the links among them. Thus, I examined the hypothesis that DIS, operating in the phonetic component, has given rise to the categorical rules of lenis stop voicing and denasalisation in the phrase-level phonology through rule scattering, as predicted by the theory of the life cycle of phonological processes (Bermúdez-Otero & Trousdale, 2012; Turton, 2014).
Recordings were collected in Seoul, Busan, and Ulsan, and various auditory and acoustic analyses were conducted to examine the phonetic variation of the relevant stops. The study adopted the three-city design as these varieties were expected to be at different stages in the life cycle, particularly with regard to the stabilisation of denasalisation. In the second part of this dissertation, I conducted a perception experiment to investigate if listeners are able to use DIS patterns as a cue to a prosodic boundary.
According to the results, Seoul showed the most advanced patterns in the stabilisation of DIS. As predicted by rule scattering, speakers who showed evidence of categorical lenis stop voicing and/or denasalisation also showed an overlaid effect of a gradient phonetic process. The perception study strongly supported the hypothesis that listeners exploit DIS cues to detect the beginning of a prosodic domain. Based on these findings, this dissertation offers a unified account of lenis stop voicing, denasalisation, and DIS within a single framework, and provides insights into the nature of DIS as well as its functional role in prosodic parsing.
Cambridge Trust International Scholarshi
European Approaches to Japanese Language and Linguistics
In this volume European specialists of Japanese language present new and original research into Japanese over a wide spectrum of topics which include descriptive, sociolinguistic, pragmatic and didactic accounts. The articles share a focus on contemporary issues and adopt new approaches to the study of Japanese that often are specific to European traditions of language study. The articles address an audience that includes both Japanese Studies and Linguistics. They are representative of the wide range of topics that are currently studied in European universities, and they address scholars and students alike.
A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn
The contributions in this Festschrift were written by Ocke's current and former PhD students, colleagues and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language, through gradually expanding objects of linguistic inquiry, to the highest levels of description, all of which have formed a part of Ocke's career, in connection with his teaching and/or his academic productions: "Segments", "Perception of Accent", "Between Sounds and Graphemes", "Prosody", "Morphology and Syntax" and "Second Language Acquisition". Each one of these illustrates a sound approach to language matters.
Fast Speech in Unit Selection Speech Synthesis
Moers-Prinz D. Fast Speech in Unit Selection Speech Synthesis. Bielefeld: Universität Bielefeld; 2020. Speech synthesis is part of the everyday life of many people with severe visual disabilities. For those who are reliant on assistive speech technology, the possibility to choose a fast speaking rate is reported to be essential. Expressive speech synthesis and other spoken language interfaces may also require an integration of fast speech. Architectures like formant or diphone synthesis are able to produce synthetic speech at fast speech rates, but the generated speech does not sound very natural. Unit selection synthesis systems, however, are capable of delivering more natural output. Nevertheless, fast speech has not been adequately implemented into such systems to date. Thus, the goal of the work presented here was to determine an optimal strategy for modeling fast speech in unit selection speech synthesis to provide potential users with a more natural sounding alternative for fast speech output.
The Perception of Emotion from Acoustic Cues in Natural Speech
Knowledge of human perception of emotional speech is imperative for the development of emotion in speech recognition systems and emotional speech synthesis. Given the growing trend towards research on spontaneous, real-life data, the aim of the present thesis is to examine human perception of emotion in naturalistic speech. Although there are many available emotional speech corpora, most contain simulated expressions. Therefore, there remains a compelling need to obtain naturalistic speech corpora that are appropriate and freely available for research. In that regard, our initial aim was to acquire suitable naturalistic material and examine its emotional content based on listener perceptions. A web-based listening tool was developed to accumulate ratings based on large-scale listening groups. The emotional content present in the speech material was demonstrated by performing perception tests on conveyed levels of Activation and Evaluation. As a result, labels were determined that signified the emotional content, and thus contribute to the construction of a naturalistic emotional speech corpus. In line with the literature, the ratings obtained from the perception tests suggested that Evaluation (or hedonic valence) is not identified as reliably as Activation is. Emotional valence can be conveyed through both semantic and prosodic information, for which the meaning of one may serve to facilitate, modify, or conflict with the meaning of the other, particularly with naturalistic speech. The subsequent experiments aimed to investigate this concept by comparing ratings from perception tests of non-verbal speech with verbal speech. The method used to render non-verbal speech was low-pass filtering, and for this, suitable filtering conditions were determined by carrying out preliminary perception tests. The results suggested that non-verbal naturalistic speech provides sufficiently discernible levels of Activation and Evaluation.
It appears that the perception of Activation and Evaluation is affected by low-pass filtering, but that the effect is relatively small. Moreover, the results suggest that there is a similar trend in agreement levels between verbal and non-verbal speech. To date, it remains difficult to determine unique acoustical patterns for hedonic valence of emotion, which may be due to inadequate labels or the incorrect selection of acoustic parameters. This study has implications for the labelling of emotional speech data and the determination of salient acoustic correlates of emotion.
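To illustrate the low-pass filtering step mentioned in the abstract above, here is a minimal sketch using a first-order (one-pole) filter. The filter type, the 400 Hz cutoff and the 16 kHz sample rate are all assumptions made for illustration; the abstract does not state the filtering conditions that were actually chosen, and a real study would likely use a much steeper filter.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate_hz):
    """Apply a first-order low-pass filter (discretised RC smoothing)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)  # smoothing factor between 0 and 1
    out = []
    prev = 0.0
    for x in samples:
        prev = prev + a * (x - prev)
        out.append(prev)
    return out


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def sine(freq_hz, sample_rate_hz, duration_s):
    """Generate a unit-amplitude test tone."""
    n = int(sample_rate_hz * duration_s)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


# Hypothetical conditions: 16 kHz audio, 400 Hz cutoff.
SAMPLE_RATE = 16000
CUTOFF = 400.0
low_tone = sine(100.0, SAMPLE_RATE, 0.5)    # roughly the pitch region
high_tone = sine(3000.0, SAMPLE_RATE, 0.5)  # region carrying consonant and formant detail
low_gain = rms(low_pass(low_tone, CUTOFF, SAMPLE_RATE)) / rms(low_tone)
high_gain = rms(low_pass(high_tone, CUTOFF, SAMPLE_RATE)) / rms(high_tone)
```

Comparing `low_gain` with `high_gain` shows the intended effect: energy near the fundamental frequency, which carries intonation, passes largely unchanged, while the higher-frequency energy needed to identify words is attenuated.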