714 research outputs found

An introduction to statistical parametric speech synthesis

Author: King Simon
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2011

Categories, words and rules in language acquisition

Author: Hochmann Jean Remy
Publication venue: Trieste
Publication date: 06/12/2010

Acquiring language requires learning a set of words (i.e. the lexicon) and abstract rules that combine them to form sentences (i.e. syntax). In this thesis, we show that infants acquiring their mother tongue rely on different speech categories to extract words and to abstract regularities. We address this issue with a study that investigates how young infants use consonants and vowels, showing that certain computations are tuned to one or the other of these speech categories.

Sissa Digital Library

Do 11-month-old French infants process articles?

Author: de Boysson-Bardies Bénédicte
Durand Catherine
Hallé Pierre
Publication venue: HAL CCSD
Publication date: 08/02/2007

Pre-print, to appear in Language and Speech. The first part of this study examined (Parisian) French-learning 11-month-old infants' recognition of the six definite and indefinite French articles: le, la, les, un, une, des. The six articles were compared with pseudo articles in the context of disyllabic or monosyllabic nouns, using the Head-turn Preference Procedure. The pseudo articles were similar to real articles in terms of phonetic composition and phonotactic probability, and real and pseudo noun phrases were alike in terms of overall prosodic contour. In three experiments, 11-month-old infants showed a preference for real over pseudo articles, suggesting they have the articles' word-forms stored in long-term memory. The second part of the study evaluates several hypotheses about the role of articles in 11-month-old infants' word recognition. Evidence from three experiments supports the view that articles help infants to recognize the following words. We propose that 11-month-olds have the capacity to parse noun phrases into their constituents, which is consistent with the more general view that function words define a syntactic skeleton that serves as a basis for parsing spoken utterances. This proposition is compared to a competing account, which argues that 11-month-olds recognize noun phrases as whole words.

HAL Descartes

Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?

Author: Karakanta Alina
Negri Matteo
Turchi Marco
Publication date: 01/01/2020

Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily. Although Neural Machine Translation (NMT) can speed up the process of translating audiovisual content, large manual effort is still required for transcribing the source language, and for spotting and segmenting the text into proper subtitles. Creating proper subtitles in terms of timing and segmentation highly depends on information present in the audio (utterance duration, natural pauses). In this work, we explore two methods for applying Speech Translation (ST) to subtitling: a) a direct end-to-end approach and b) a classical cascade approach. We discuss the benefit of having access to the source language speech for improving the conformity of the generated subtitles to the spatial and temporal subtitling constraints and show that length is not the answer to everything in the case of subtitling-oriented ST. Comment: Accepted at IWSLT 202

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

P-model Alternative to the T-model

Author: Roberts Mark D.
Publication date: 01/01/2004

Standard linguistic analysis of syntax uses the T-model. This model requires the ordering: D-structure > S-structure > LF, where D-structure is the deep structure, S-structure is the surface structure, and LF is logical form. Between each of these representations there is movement which alters the order of the constituent words; movement is achieved using the principles and parameters of syntactic theory. Psychological analysis of sentence production is usually either serial or connectionist. Psychological serial models do not accommodate the T-model immediately so that here a new model called the P-model is introduced. The P-model is different from previous linguistic and psychological models. Here it is argued that the LF representation should be replaced by a variant of Frege's three qualities (sense, reference, and force), called the Frege representation or F-representation. In the F-representation the order of elements is not necessarily the same as that in LF and it is suggested that the correct ordering is: F-representation > D-structure > S-structure. This ordering appears to lead to a more natural view of sentence production and processing. Within this framework movement originates as the outcome of emphasis applied to the sentence. The requirement that the F-representation precedes the D-structure needs a picture of the particular principles and parameters which pertain to movement of words between representations. In general this would imply that there is a preferred or optimal ordering of the symbolic string in the F-representation. The standard ordering is retained because the general way of producing such an optimal ordering is unclear. In this case it is possible to produce an analysis of movement between LF and D-structure similar to the usual analysis of movement between S-structure and LF. It is suggested that a maximal amount of information about a language's grammar and lexicon is stored, because of the necessity of analyzing corrupted data.

PhilPapers

CogPrints Cognitive Sciences Eprint Archive

Punctuation Prediction for Norwegian: Using Established Approaches for Under-Resourced Languages

Author: Prestegard Guro Sivertsen
Publication venue: The University of Bergen
Publication date: 01/01/2021

Master's thesis in information science (INFO390, MASV-INF)

University of Bergen

NORA - Norwegian Open Research Archives

Universal and language-specific processing : the case of prosody

Author: Ip Martin Ho Kwan
Publication venue: American Psychological Association (APA)
Publication date: 01/01/2019

A key question in the science of language is how speech processing can be influenced by both language-universal and language-specific mechanisms (Cutler, Klein, & Levinson, 2005). My graduate research aimed to address this question by adopting a cross-language approach to compare languages with different phonological systems. Of all components of linguistic structure, prosody is often considered to be one of the most language-specific dimensions of speech. This can have significant implications for our understanding of language use, because much of speech processing is specifically tailored to the structure and requirements of the native language. However, it is still unclear whether prosody may also play a universal role across languages, and very few comparative attempts have been made to explore this possibility. In this thesis, I examined both the production and perception of prosodic cues to prominence and phrasing in native speakers of English and Mandarin Chinese. In focus production, our research revealed that English and Mandarin speakers were alike in how they used prosody to encode prominence, but there were also systematic language-specific differences in the exact degree to which they enhanced the different prosodic cues (Chapter 2). This, however, was not the case in focus perception, where English and Mandarin listeners were alike in the degree to which they used prosody to predict upcoming prominence, even though the precise cues in the preceding prosody could differ (Chapter 3). Further experiments examining prosodic focus prediction in the speech of different talkers have demonstrated functional cue equivalence in prosodic focus detection (Chapter 4). Likewise, our experiments have also revealed both cross-language similarities and differences in the production and perception of juncture cues (Chapter 5). Overall, prosodic processing is the result of a complex but subtle interplay of universal and language-specific structure.

Western Sydney ResearchDirect

Robust Estimation of Tone Break Indices from Speech Signal using Multi-Scale Analysis and their Applications

Author: Kolli Chandra Sekhar Rao
Publication venue: University of Memphis Digital Commons
Publication date: 19/07/2012

The aim of this study is to develop a robust algorithm to automatically detect the Tone and Break Indices (ToBI) from the speech signal and to explore their applications. iLAST was introduced to analyze the acoustic and prosodic features to detect the ToBI indices. Both expert and data-driven rules were used to improve the robustness. The integration of multi-scale signal analysis with rule-based classification has helped in robustly identifying tones that can be used in applications such as identifying the vowel triangle or emotions from speech. Empirical analyses using a labeled dataset were performed to illustrate the utility of the proposed approach. Further analyses were conducted to identify the inefficiencies of the proposed approach and to address those issues through co-analyses of prosodic features, identifying the major contributors to robust detection of ToBI. It was demonstrated that the proposed approach performs robustly and can be used for developing a wide variety of applications.

University of Memphis Digital Commons