714 research outputs found

    An introduction to statistical parametric speech synthesis

    Categories, words and rules in language acquisition

    Acquiring language requires learning a set of words (i.e. the lexicon) and abstract rules that combine them to form sentences (i.e. syntax). In this thesis, we show that infants acquiring their mother tongue rely on different speech categories to extract words and to abstract regularities. We address this issue with a study that investigates how young infants use consonants and vowels, showing that certain computations are tuned to one or the other of these speech categories.

    Do 11-month-old French infants process articles?

    Pre-print: to appear in Language and Speech. The first part of this study examined (Parisian) French-learning 11-month-old infants' recognition of the six definite and indefinite French articles: le, la, les, un, une, des. The six articles were compared with pseudo articles in the context of disyllabic or monosyllabic nouns, using the Head-turn Preference Procedure. The pseudo articles were similar to real articles in terms of phonetic composition and phonotactic probability, and real and pseudo noun phrases were alike in terms of overall prosodic contour. In three experiments, 11-month-old infants showed a preference for real over pseudo articles, suggesting they have the articles' word-forms stored in long-term memory. The second part of the study evaluates several hypotheses about the role of articles in 11-month-old infants' word recognition. Evidence from three experiments supports the view that articles help infants to recognize the following words. We propose that 11-month-olds have the capacity to parse noun phrases into their constituents, which is consistent with the more general view that function words define a syntactic skeleton that serves as a basis for parsing spoken utterances. This proposition is compared to a competing account, which argues that 11-month-olds recognize noun phrases as whole words.

    Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?

    Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily. Although Neural Machine Translation (NMT) can speed up the process of translating audiovisual content, large manual effort is still required for transcribing the source language, and for spotting and segmenting the text into proper subtitles. Creating proper subtitles in terms of timing and segmentation depends heavily on information present in the audio (utterance duration, natural pauses). In this work, we explore two methods for applying Speech Translation (ST) to subtitling: a) a direct end-to-end approach and b) a classical cascade approach. We discuss the benefit of having access to the source language speech for improving the conformity of the generated subtitles to the spatial and temporal subtitling constraints and show that length is not the answer to everything in the case of subtitling-oriented ST. Comment: Accepted at IWSLT 202

    P-model Alternative to the T-model

    Standard linguistic analysis of syntax uses the T-model. This model requires the ordering: D-structure >> S-structure >> LF, where D-structure is the deep structure, S-structure is the surface structure, and LF is logical form. Between each of these representations there is movement, which alters the order of the constituent words; movement is achieved using the principles and parameters of syntactic theory. Psychological analysis of sentence production is usually either serial or connectionist. Psychological serial models do not immediately accommodate the T-model, so a new model, called the P-model, is introduced here. The P-model is different from previous linguistic and psychological models. Here it is argued that the LF representation should be replaced by a variant of Frege's three qualities (sense, reference, and force), called the Frege representation or F-representation. In the F-representation the order of elements is not necessarily the same as that in LF, and it is suggested that the correct ordering is: F-representation >> D-structure >> S-structure. This ordering appears to lead to a more natural view of sentence production and processing. Within this framework, movement originates as the outcome of emphasis applied to the sentence. The requirement that the F-representation precedes the D-structure needs a picture of the particular principles and parameters which pertain to movement of words between representations. In general this would imply that there is a preferred or optimal ordering of the symbolic string in the F-representation. The standard ordering is retained because the general way of producing such an optimal ordering is unclear. In this case it is possible to produce an analysis of movement between LF and D-structure similar to the usual analysis of movement between S-structure and LF.
    It is suggested that a maximal amount of information about a language's grammar and lexicon is stored, because of the necessity of analyzing corrupted data.

    Punctuation Prediction for Norwegian: Using Established Approaches for Under-Resourced Languages

    Master's thesis in information science. INFO390, MASV-INF

    Universal and language-specific processing : the case of prosody

    A key question in the science of language is how speech processing can be influenced by both language-universal and language-specific mechanisms (Cutler, Klein, & Levinson, 2005). My graduate research aimed to address this question by adopting a cross-language approach to compare languages with different phonological systems. Of all components of linguistic structure, prosody is often considered to be one of the most language-specific dimensions of speech. This can have significant implications for our understanding of language use, because much of speech processing is specifically tailored to the structure and requirements of the native language. However, it is still unclear whether prosody may also play a universal role across languages, and very few comparative attempts have been made to explore this possibility. In this thesis, I examined both the production and perception of prosodic cues to prominence and phrasing in native speakers of English and Mandarin Chinese. In focus production, our research revealed that English and Mandarin speakers were alike in how they used prosody to encode prominence, but there were also systematic language-specific differences in the exact degree to which they enhanced the different prosodic cues (Chapter 2). This, however, was not the case in focus perception, where English and Mandarin listeners were alike in the degree to which they used prosody to predict upcoming prominence, even though the precise cues in the preceding prosody could differ (Chapter 3). Further experiments examining prosodic focus prediction in the speech of different talkers have demonstrated functional cue equivalence in prosodic focus detection (Chapter 4). Likewise, our experiments have also revealed both cross-language similarities and differences in the production and perception of juncture cues (Chapter 5). Overall, prosodic processing is the result of a complex but subtle interplay of universal and language-specific structure.

    Robust Estimation of Tone Break Indices from Speech Signal using Multi-Scale Analysis and their Applications

    The aim of this study is to develop a robust algorithm to automatically detect the Tone and Break Indices (ToBI) from the speech signal and to explore their applications. iLAST was introduced to analyze the acoustic and prosodic features to detect the ToBI indices. Both expert and data-driven rules were used to improve the robustness. The integration of multi-scale signal analysis with rule-based classification has helped in robustly identifying tones that can be used in applications such as identifying the vowel triangle and emotions from speech. Empirical analyses using a labeled dataset were performed to illustrate the utility of the proposed approach. Further analyses were conducted to identify the inefficiencies of the proposed approach and to address those issues through co-analyses of prosodic features, identifying the major contributors to robust detection of ToBI. It was demonstrated that the proposed approach performs robustly and can be used for developing a wide variety of applications.