
    Neural Dynamics of Phonetic Trading Relations for Variable-Rate CV Syllables

    The perception of CV syllables exhibits a trading relationship between the voice onset time (VOT) of a consonant and the duration of a vowel. Percepts of [ba] and [wa] can, for example, depend on the durations of the consonant and vowel segments, with an increase in the duration of the subsequent vowel switching the percept of the preceding consonant from [w] to [b]. A neural model, called PHONET, is proposed to account for these findings. In the model, C and V inputs are filtered by parallel auditory streams that respond preferentially to transient and sustained properties of the acoustic signal, as in vision. These streams are represented by working memories that adjust their processing rates to cope with variable acoustic input rates. More rapid transient inputs can cause greater activation of the transient stream which, in turn, can automatically gain-control the processing rate in the sustained stream. An invariant percept obtains when the relative activations of the C and V representations in the two streams remain unchanged. The trading relation may be simulated as a result of how different experimental manipulations affect this ratio. It is suggested that the brain can use the duration of a subsequent vowel to make the [b]/[w] distinction because the speech code is a resonant event that emerges between working memory activation patterns and the nodes that categorize them. Advanced Research Projects Agency (90-0083); Air Force Office of Scientific Research (F19620-92-J-0225); Pacific Sierra Research Corporation (91-6075-2)
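    The trading mechanism lends itself to a toy illustration. The sketch below is not the published PHONET dynamics; the consonant-to-vowel duration ratio merely stands in for the model's transient/sustained activation ratio, and the threshold value is invented.

```python
def percept(c_dur_ms, vowel_ms, threshold=0.35):
    """Toy [b]/[w] decision from relative consonant and vowel durations.

    Caricature of PHONET's invariant: a consonant that is short relative
    to its vowel reads as the abrupt [b]; one that is long relative to
    its vowel reads as the gradual [w].
    """
    ratio = c_dur_ms / vowel_ms
    return "b" if ratio < threshold else "w"

# Trading relation: the same consonant duration flips from [w] to [b]
# when the following vowel is lengthened.
print(percept(40, 80))    # 'w' (ratio 0.50)
print(percept(40, 200))   # 'b' (ratio 0.20)

# Rate invariance: uniformly speeding up both segments preserves the
# ratio, and hence the percept.
print(percept(20, 100))   # 'b' (same ratio as 40/200)
```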

    Correlates of linguistic rhythm in the speech signal

    Spoken languages have been classified by linguists according to their rhythmic properties, and psycholinguists have relied on this classification to account for infants' capacity to discriminate languages. Although researchers have measured many speech signal properties, they have failed to identify reliable acoustic characteristics for language classes. This paper presents instrumental measurements based on a consonant/vowel segmentation for eight languages. The measurements suggest that intuitive rhythm types reflect specific phonological properties, which in turn are signaled by the acoustic/phonetic properties of speech. The data support the notion of rhythm classes and also allow the simulation of infant language discrimination, consistent with the hypothesis that newborns rely on a coarse segmentation of speech. A hypothesis is proposed regarding the role of rhythm perception in language acquisition.
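    As a concrete sketch of such measurements (assuming duration statistics of the kind this line of work made standard: the proportion of vocalic intervals, %V, and the standard deviations of vocalic and consonantal intervals, ΔV and ΔC):

```python
import numpy as np

def rhythm_metrics(intervals):
    """Duration statistics over a consonant/vowel segmentation.

    `intervals` is a list of (label, duration_s) pairs, with label 'V'
    for vocalic intervals and 'C' for consonantal ones. Returns %V
    (the vocalic proportion of the utterance) and the standard
    deviations of each interval type.
    """
    v = np.array([d for lab, d in intervals if lab == "V"])
    c = np.array([d for lab, d in intervals if lab == "C"])
    percent_v = 100.0 * v.sum() / (v.sum() + c.sum())
    return percent_v, v.std(), c.std()

# Toy utterance: alternating consonantal and vocalic intervals (seconds).
utt = [("C", 0.08), ("V", 0.12), ("C", 0.15), ("V", 0.10), ("C", 0.06), ("V", 0.20)]
pV, dV, dC = rhythm_metrics(utt)
print(f"%V = {pV:.1f}, deltaV = {dV:.3f}s, deltaC = {dC:.3f}s")
```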

    Exploring the prosodic and syntactic aspects of Mandarin-English code-switching

    Code-switching (CS) is one of the most common natural behaviors among bilinguals, and linguists have explored the constraints behind CS to explain it. While syntactic constraints were the focus for decades, prosodic constraints have only recently begun to attract attention. As a less commonly studied language pair in CS research, Mandarin-English CS is under-investigated with respect to both kinds of constraint. This study therefore explores both the prosodic constraints and the syntactic patterns of this language pair using a natural CS database. Prosodically, the study applies the information-based approach and its fundamental unit, the Intonation Unit (IU), to conduct the analysis. The result of 10.6% bilingual IUs (BIUs) proves reliable and offers solid evidence that bilinguals tend to code-switch at IU boundaries, supporting the pioneering work of Shenk (2006) from a previously unexplored language pair (Mandarin-English). In addition, the study develops solutions to the subjectivity problem and the database-appropriateness problem in this approach to strengthen the validity of the results. Syntactically, the study investigates the syntactic patterns at switching points in the Mandarin-English pair using data collected from a rarely investigated bilingual community. A syntactic pattern specific to this language pair was observed, and the study suggests that it disrupted the final results. The study then analyzes the prosodic and syntactic results together: when the interfering results are eliminated, a more solid outcome emerges that lends greater support to the prosodic constraint argument.
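    The IU-based measure reduces to a simple count over an annotated transcript. A minimal sketch, with an invented token/annotation structure (the actual IU segmentation criteria are the hard part and are not shown):

```python
from collections import Counter

def iu_language_profile(ius):
    """Classify each intonation unit (IU) as monolingual or bilingual.

    `ius` is a list of IUs, each a list of (word, language) tokens. An
    IU containing tokens from more than one language is a bilingual IU
    (BIU): a switch inside it crosses no IU boundary, whereas switches
    between monolingual IUs respect the boundary constraint.
    """
    counts = Counter()
    for iu in ius:
        langs = {lang for _, lang in iu}
        counts["BIU" if len(langs) > 1 else "mono-" + langs.pop()] += 1
    return counts

ius = [
    [("wo", "zh"), ("juede", "zh")],        # monolingual Mandarin IU
    [("it's", "en"), ("fine", "en")],       # monolingual English IU
    [("na-ge", "zh"), ("deadline", "en")],  # bilingual IU: intra-IU switch
]
counts = iu_language_profile(ius)
print(counts, "BIU rate:", counts["BIU"] / sum(counts.values()))
```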

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
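    A caricature of the strip-map idea (not the published circuit): on a logarithmic frequency axis a change of speaker scale is approximately a shift, so sliding the pattern to a reference position separates a pitch-independent pattern from a speaker-identity cue. The names and the shift-based normalization below are illustrative assumptions.

```python
import numpy as np

def normalize_on_strip(spectral_strip, pitch_bin, ref_bin=10):
    """Toy speaker normalization on a log-frequency 'strip'.

    Sliding the spectral pattern so the speaker's pitch bin lands on a
    fixed reference bin yields a pitch-independent pattern for
    categorization, while the shift itself retains speaker identity.
    """
    shift = ref_bin - pitch_bin
    return np.roll(spectral_strip, shift), shift

# Two 'speakers' producing the same vowel at different scales map to
# the same normalized pattern but different shifts.
strip_a = np.eye(1, 40, 12).ravel()   # formant peak at bin 12
strip_b = np.roll(strip_a, 5)         # same pattern, higher-pitched speaker
na, sa = normalize_on_strip(strip_a, pitch_bin=12)
nb, sb = normalize_on_strip(strip_b, pitch_bin=17)
print(np.allclose(na, nb), sa, sb)    # True -2 -7
```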

    Language identification with suprasegmental cues: A study based on speech resynthesis

    This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm, or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm, and intonation (Condition 1), rhythm and intonation (Condition 2), intonation only (Condition 3), or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. Thus, the new methodology proposed appears to be well suited to studying language discrimination. Applications to other domains of psycholinguistic research and to automatic language identification are considered.
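    The core of the paradigm is delexicalization under selective cue preservation. A minimal sketch, with an assumed segment structure and carrier phones (the 's'/'a' carriers echo flat 'sasasa'-style resynthesis; the real pipeline operates on the acoustic signal itself):

```python
def delexicalize(segments, keep_rhythm=True, keep_intonation=True):
    """Toy delexicalization for speech resynthesis.

    `segments` is a list of (phone_class, duration_s, f0_hz) triples,
    phone_class being 'C' or 'V'. Mapping every consonant to one
    carrier consonant and every vowel to one carrier vowel destroys
    segmental identity; durations are kept or averaged (rhythm), and
    F0 is kept or flattened (intonation).
    """
    mean_dur = sum(d for _, d, _ in segments) / len(segments)
    mean_f0 = sum(f for _, _, f in segments) / len(segments)
    return [("s" if cls == "C" else "a",
             dur if keep_rhythm else mean_dur,
             f0 if keep_intonation else mean_f0)
            for cls, dur, f0 in segments]

# Condition 4 ('rhythm only'): durations preserved, intonation flattened.
print(delexicalize([("C", 0.08, 120.0), ("V", 0.15, 140.0)],
                   keep_rhythm=True, keep_intonation=False))
```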

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance at different levels, such as feature extraction, acoustic modeling, and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.

    Implicit Self-supervised Language Representation for Spoken Language Diarization

    In a code-switched (CS) scenario, spoken language diarization (LD) is essential as a preprocessing system. Implicit frameworks are preferable to explicit ones, as they can be easily adapted to low/zero-resource languages. Inspired by the speaker diarization (SD) literature, three frameworks are proposed to perform LD, based on (1) fixed segmentation, (2) change-point-based segmentation, and (3) an end-to-end (E2E) model. Initial exploration with the synthetic TTSF-LD dataset shows that using x-vectors as the implicit language representation, with an appropriate analysis window length (N), achieves performance on par with explicit LD. The best implicit LD performance, a Jaccard error rate (JER) of 6.38, is achieved by the E2E framework. However, with the practical Microsoft CS (MSCS) dataset, the performance of implicit E2E LD degrades to a JER of 60.4. The difference in performance is mostly due to the distributional difference in the monolingual segment durations of the secondary language between the MSCS and TTSF-LD datasets. To avoid smoothing over segments, the short monolingual segments suggest using a small value of N; yet with a small N, the x-vector representation is unable to capture the required language discrimination because of acoustic similarity, as the same speaker speaks both languages. To resolve this issue, a self-supervised implicit language representation is proposed in this study. Compared with the x-vector representation, the proposed representation provides a relative improvement of 63.9% and achieves a JER of 21.8 using the E2E framework.
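    As a sketch of the simplest of the three frameworks, fixed-segmentation implicit LD can be read as: embed each analysis window, then cluster the windows into languages without any language labels. The `embed` extractor stands in for an x-vector-style network and is assumed, as is the clustering choice.

```python
import numpy as np
from sklearn.cluster import KMeans

def fixed_segmentation_ld(frames, embed, win=150, hop=75, n_langs=2):
    """Toy fixed-segmentation language diarization (LD).

    `frames` is a (T, D) array of acoustic frames; `embed` maps a
    (win, D) chunk to a fixed-size embedding. Each window of `win`
    frames gets one embedding, and unsupervised clustering assigns
    each window to one of `n_langs` language clusters, keeping the
    framework implicit (no explicit language identities are used).
    """
    starts = range(0, len(frames) - win + 1, hop)
    embs = np.stack([embed(frames[s:s + win]) for s in starts])
    labels = KMeans(n_clusters=n_langs, n_init=10).fit_predict(embs)
    return list(zip(starts, labels))  # (window start frame, language cluster)
```

    The window length plays the role of N above: short windows resolve brief monolingual segments but give the extractor little context, which is exactly the tension the self-supervised representation is meant to ease.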

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Identification

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)