17 research outputs found

    The Intonational Phonology of Daco-Romance

    Within the field of Romance phonetics and phonology, the intonation of the Daco-Romance languages (Romanian, Aromanian, Megleno-Romanian and Istro-Romanian) has been a much-neglected topic. In fact, until relatively recently, little was known about the general importance of intonation in speech and about its forms and functions. Intonation in Daco-Romance was investigated only marginally, usually in mainstream Romanian grammar compendia, which left it a virtually unstudied area. Although there are several short descriptions of Romanian intonation (Dascălu-Jinga 1971, 1998, 2001; Vasiliu 1965; Chițoran, Pârlog and Augerot 1984; Chițoran 2002), they were not conducted in any particular framework and were mainly impressionistic in character. A fresh, comprehensive approach to intonation in Romanian, and in Eastern Romance in general, is needed as a basis for future pedagogical, typological, and comparative research. After a critical account of major intonation theories – the IPO theory, the ‘traditional British’ system and the Autosegmental-Metrical (AM) theory – it is argued that the most suitable framework in which this project should be conducted is the AM theory. The main aim of the present thesis is to propose a comprehensive model for intonation in Romanian and the other Daco-Romance varieties based on the Autosegmental-Metrical theory (Pierrehumbert 1980, Ladd 2008 [1996], Gussenhoven 2004). This will involve the first Romanian ToBI (Ro-ToBI) transcription of intonation and show how focus is realised in the language. After providing an inventory of pitch accents and boundary tones, special attention is given to broad focus and narrow/contrastive focus in yes-no questions and wh-questions, which were reported to be peculiar in Romanian intonation compared with other (Western) Romance languages (Ladd 2008).
For this purpose, 12 native speakers of all four Daco-Romance varieties were interviewed, which resulted in a spontaneous corpus (short conversations or short stories) and a semi-spontaneous corpus (questionnaires specially designed to elicit broad, narrow and contrastive focus, as well as other specific types of intonation). Acoustic analyses were performed in PRAAT, followed by a comparative study of Daco-Romanian, Aromanian, Megleno-Romanian, and Istro-Romanian. In order to facilitate research and comparative studies across Romance languages, the data presented in this thesis was obtained using two intonation questionnaires based on the Discourse Completion Test (initially developed by Blum-Kulka et al. 1989), which included some 31 situations designed to elicit a large number of specific sentence types and pragmatic meanings and eight different focus contexts. An analysis of the intonational phonology of Daco-Romance varieties suggests that they tend to align more with each other than with the non-Romance languages with which they are in contact. With respect to focus, the findings presented here suggest that the Nuclear Stress Rule (NSR) (Zubizarreta 1998; 2010) applies in Eastern Romance only to a certain extent in broad focus contexts, but not in narrow focus, which allows contextual de-accenting. The results presented show that Daco-Romance has a very rich and diverse intonational phonology as a bridge prosodic system between Slavic and Romance. The outcome of the project will not only have applications for automatic speech recognition (TTS systems) but will also help us to better understand intonational phonology in Romance in general.

    Intonation Modelling for Speech Synthesis and Emphasis Preservation

    Speech-to-speech translation is a framework which recognises speech in an input language, translates it to a target language and synthesises speech in this target language. In such a system, variations in the speech signal which are inherent to natural human speech are lost as the information goes through the different building blocks of the translation process. The work presented in this thesis addresses aspects of speech synthesis which are lost in traditional speech-to-speech translation approaches. The main research axis of this thesis is the study of prosody for speech synthesis and emphasis preservation. A first investigation of regional accents of spoken French is carried out to understand the sensitivity of native listeners with respect to accented speech synthesis. Listening tests show that standard adaptation methods for speech synthesis are not sufficient for listeners to perceive accentedness. On the other hand, combining adaptation with original prosody allows perception of accents. Addressing the need for a more suitable prosody model, a physiologically plausible intonation model is proposed. Inspired by the command-response model, it has basic components which can be related to muscle responses to nerve impulses. These components are assumed to be a representation of muscle control of the vocal folds. A motivation for such a model is its theoretical language independence, based on the fact that humans share the same vocal apparatus. An automatic parameter extraction method which integrates a perceptually relevant measure is proposed with the model. This approach is evaluated and compared with the standard command-response model. Two corpora including sentences with emphasised words are presented, in the context of the SIWIS project. The first is a multilingual corpus with speech from multiple speakers; the second is a high-quality, speech-synthesis-oriented corpus from a professional speaker. Two broad uses of the model are evaluated.
The first shows that it is difficult to predict model parameters; however, the second shows that parameters can be transferred in the context of emphasis synthesis. A relation between model parameters and linguistic features such as stress and accent is demonstrated. Similar observations are made between the parameters and emphasis. Next, we investigate the extraction of atoms in emphasised speech and their transfer to neutral speech, which turns out to elicit emphasis perception. Using clustering methods, this is extended to the emphasis of other words, using linguistic context. This approach is validated by listening tests, in the case of English.

    Proyecto Docente e Investigador

    Teaching and Research Project. University Professorships (Catedráticos de Universidad). Area of Computer Science and Artificial Intelligence, Universidad de Valladolid. 19 May 2023. David Escudero Manceb

    European Approaches to Japanese Language and Linguistics

    In this volume, European specialists in the Japanese language present new and original research into Japanese over a wide spectrum of topics, which include descriptive, sociolinguistic, pragmatic and didactic accounts. The articles share a focus on contemporary issues and adopt new approaches to the study of Japanese that are often specific to European traditions of language study. The articles address an audience that includes both Japanese Studies and Linguistics. They are representative of the wide range of topics that are currently studied in European universities, and they address scholars and students alike.

    Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum

    A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn

    The contributions in this Festschrift were written by Ocke’s current and former PhD students, colleagues and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language, through gradually expanding objects of linguistic inquiry, to the highest levels of description, all of which have formed a part of Ocke’s career in connection with his teaching and/or his academic output: “Segments”, “Perception of Accent”, “Between Sounds and Graphemes”, “Prosody”, “Morphology and Syntax” and “Second Language Acquisition”. Each one of these illustrates a sound approach to language matters.

    Fast Speech in Unit Selection Speech Synthesis

    Moers-Prinz D. Fast Speech in Unit Selection Speech Synthesis. Bielefeld: Universität Bielefeld; 2020. Speech synthesis is part of the everyday life of many people with severe visual disabilities. For those who are reliant on assistive speech technology, the possibility to choose a fast speaking rate is reported to be essential. Expressive speech synthesis and other spoken language interfaces may also require an integration of fast speech. Architectures like formant or diphone synthesis are able to produce synthetic speech at fast speech rates, but the generated speech does not sound very natural. Unit selection synthesis systems, however, are capable of delivering more natural output. Nevertheless, fast speech has not been adequately implemented in such systems to date. Thus, the goal of the work presented here was to determine an optimal strategy for modeling fast speech in unit selection speech synthesis to provide potential users with a more natural-sounding alternative for fast speech output.

    The Perception of Emotion from Acoustic Cues in Natural Speech

    Knowledge of human perception of emotional speech is imperative for the development of emotion in speech recognition systems and emotional speech synthesis. Because there is a growing trend towards research on spontaneous, real-life data, the aim of the present thesis is to examine human perception of emotion in naturalistic speech. Although there are many available emotional speech corpora, most contain simulated expressions. Therefore, there remains a compelling need to obtain naturalistic speech corpora that are appropriate and freely available for research. In that regard, our initial aim was to acquire suitable naturalistic material and examine its emotional content based on listener perceptions. A web-based listening tool was developed to accumulate ratings based on large-scale listening groups. The emotional content present in the speech material was demonstrated by performing perception tests on conveyed levels of Activation and Evaluation. As a result, labels were determined that signified the emotional content, and thus contribute to the construction of a naturalistic emotional speech corpus. In line with the literature, the ratings obtained from the perception tests suggested that Evaluation (or hedonic valence) is not identified as reliably as Activation is. Emotional valence can be conveyed through both semantic and prosodic information, for which the meaning of one may serve to facilitate, modify, or conflict with the meaning of the other, particularly with naturalistic speech. The subsequent experiments aimed to investigate this concept by comparing ratings from perception tests of non-verbal speech with verbal speech. The method used to render non-verbal speech was low-pass filtering, and for this, suitable filtering conditions were determined by carrying out preliminary perception tests. The results suggested that non-verbal naturalistic speech provides sufficiently discernible levels of Activation and Evaluation.
It appears that the perception of Activation and Evaluation is affected by low-pass filtering, but that the effect is relatively small. Moreover, the results suggest that there is a similar trend in agreement levels between verbal and non-verbal speech. To date, it remains difficult to determine unique acoustical patterns for hedonic valence of emotion, which may be due to inadequate labels or the incorrect selection of acoustic parameters. This study has implications for the labelling of emotional speech data and the determination of salient acoustic correlates of emotion.