6,332 research outputs found

    Children at risk : their phonemic awareness development in holistic instruction

    Includes bibliographical references (p. 17-19).

    Time and information in perceptual adaptation to speech

    Presubmission manuscript and supplementary files (stimuli, stimulus presentation code, data, data analysis code).
    Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than in single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased, but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between the preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing.
    NIH NIDCD (R03DC014045).
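
    As an illustration of the key measure in this abstract, here is a minimal, hypothetical sketch (ours, not the authors' analysis code) of how the talker-variability interference effect could be computed from trial-level reaction times; all column names and numbers below are invented.

```python
# Hypothetical sketch: computing the talker-variability interference
# effect (mixed-talker RT minus single-talker RT) at each carrier-phrase
# duration, using made-up trial-level data.
import pandas as pd

# Simulated reaction times (ms); column names are our own invention.
trials = pd.DataFrame({
    "talker_condition": ["single", "mixed"] * 4,
    "carrier_duration_ms": [0, 0, 500, 500, 1000, 1000, 2000, 2000],
    "rt_ms": [520, 585, 515, 560, 512, 540, 510, 525],
})

# Mean RT per condition at each carrier duration.
means = trials.pivot_table(index="carrier_duration_ms",
                           columns="talker_condition", values="rt_ms")

# Interference = mixed - single; the abstract reports this shrinking as
# the duration of preceding same-talker speech grows.
means["interference_ms"] = means["mixed"] - means["single"]
print(means)
```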

    A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition

    We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
    5 page(s).
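
    Since the workshop summary centers on evaluating unsupervised word segmentation, a hedged sketch of one standard metric may help: the boundary F-score, comparing a hypothesized segmentation against a gold one. This is an illustrative implementation of a common metric, not code from the workshop; the function names are our own.

```python
# Illustrative boundary F-score for unsupervised word segmentation.

def boundaries(words):
    """Return the set of internal word-boundary positions (in characters)
    for a list of words; utterance edges are not scored."""
    pos, cuts = 0, set()
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return cuts

def boundary_f1(gold_words, hyp_words):
    """Harmonic mean of boundary precision and recall."""
    gold, hyp = boundaries(gold_words), boundaries(hyp_words)
    if not gold or not hyp:
        return 0.0
    precision = len(gold & hyp) / len(hyp)
    recall = len(gold & hyp) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: gold segmentation "the dog" vs. two hypotheses.
print(boundary_f1(["the", "dog"], ["thedog"]))       # 0.0 (no cut found)
print(boundary_f1(["the", "dog"], ["the", "dog"]))   # 1.0 (exact match)
```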

    Production and perception of speaker-specific phonetic detail at word boundaries

    Experiments show that learning about familiar voices affects speech processing in many tasks. However, most studies focus on isolated phonemes or words and do not explore which phonetic properties are learned about or retained in memory. This work investigated inter-speaker phonetic variation involving word boundaries, and its perceptual consequences. A production experiment found significant variation in the extent to which speakers used a number of acoustic properties to distinguish junctural minimal pairs, e.g. 'So he diced them' vs. 'So he'd iced them'. A perception experiment then tested intelligibility in noise of the junctural minimal pairs before and after familiarisation with a particular voice. Subjects who heard the same voice during testing as during the familiarisation period showed significantly more improvement in identification of words and syllable constituents around word boundaries than those who heard different voices. These data support the view that perceptual learning about the particular pronunciations associated with individual speakers helps listeners to identify syllabic structure and the location of word boundaries.

    Why not model spoken word recognition instead of phoneme monitoring?

    Norris, McQueen & Cutler present a detailed account of the decision stage of the phoneme monitoring task. However, we question whether this contributes to our understanding of the speech recognition process itself, and we fail to see why phonotactic knowledge should play a role in phoneme recognition.

    Aptitude, experience and second language pronunciation proficiency development in classroom settings: a longitudinal study

    The current study longitudinally examined the influence of aptitude on second language (L2) pronunciation development as 40 first-year Japanese university students engaged in practice activities inside and outside English-as-a-Foreign-Language classrooms over one academic year. Spontaneous speech samples were elicited at the beginning, middle and end points of the project, analyzed for global, segmental, syllabic, prosodic and temporal aspects of L2 pronunciation, and linked to the students' aptitude and experience profiles. Results indicated that the participants generally enhanced the global comprehensibility of their speech (by reducing vowel insertion errors in complex syllables) as a function of increased classroom experience during their first semester, and explicit learning aptitude (associative memory, phonemic coding) appeared to help certain learners further enhance their pronunciation proficiency through the development of fluency and prosody. In the second semester, incidental learning ability (sound sequence recognition) was a significant predictor of the extent to which certain learners continued to improve and ultimately attained advanced-level L2 comprehensibility, largely thanks to improved segmental accuracy.

    Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

    We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics into a global discriminability score reveals that the greater separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implications of these findings for the view that the functional role of IDS is to improve language learnability.
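
    The "discriminability" this abstract quantifies is often operationalized with ABX-style scores in this literature. Below is a toy sketch of that idea under invented data, not the authors' pipeline: tokens of the same word should sit closer to each other than to tokens of other words, and extra acoustic variability (IDS-like) lowers the score.

```python
# Toy ABX-style discriminability score on invented "acoustic" embeddings.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def sample_tokens(center, spread, n=5):
    """Draw n token vectors around a word's center; larger spread mimics
    the higher within-word acoustic variability reported for IDS."""
    return center + rng.normal(0.0, spread, size=(n, 8))

def abx_score(tokens_by_word):
    """Fraction of (A, B, X) triples where X (same word as A) is closer
    to A than to B, by Euclidean distance. 1.0 = fully discriminable,
    0.5 = chance."""
    correct = total = 0
    words = list(tokens_by_word)
    for wa, wb in itertools.permutations(words, 2):
        for a, x in itertools.permutations(tokens_by_word[wa], 2):
            for b in tokens_by_word[wb]:
                correct += np.linalg.norm(a - x) < np.linalg.norm(b - x)
                total += 1
    return correct / total

for label, spread in [("ADS-like", 0.5), ("IDS-like", 1.5)]:
    toks = {w: sample_tokens(c, spread)
            for w, c in [("mama", np.zeros(8)), ("wanwan", np.ones(8))]}
    print(label, round(abx_score(toks), 3))  # IDS-like score is lower
```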

    Listeners normalize speech for contextual speech rate even without an explicit recognition task

    Speech can be produced at different rates. Listeners take this rate variation into account by normalizing vowel duration for contextual speech rate: an ambiguous Dutch word /m?t/ is perceived as short /mAt/ when embedded in a slow context, but as long /ma:t/ in a fast context. Whilst some have argued that this rate normalization involves low-level automatic perceptual processing, there is also evidence that it arises at higher-level cognitive processing stages, such as decision making. Prior research on rate-dependent speech perception has only used explicit recognition tasks to investigate the phenomenon, involving both perceptual processing and decision making. This study tested whether speech rate normalization can be observed without explicit decision making, using a cross-modal repetition priming paradigm. Results show that a fast precursor sentence makes an embedded ambiguous prime (/m?t/) sound (implicitly) more /a:/-like, facilitating lexical access to the long target word "maat" in an (explicit) lexical decision task. This result suggests that rate normalization is automatic, taking place even in the absence of an explicit recognition task. Thus, rate normalization is placed within the realm of everyday spoken conversation, where explicit categorization of ambiguous sounds is rare.

    Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition

    We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse representation of the test posteriors using this dictionary enables projection onto the space of the training data. Relying on the fact that the intrinsic dimensions of the posterior subspaces are very small and that the matrix of all posteriors belonging to a class has very low rank, we demonstrate how these low-dimensional structures enable further enhancement of the posteriors and rectify spurious errors due to mismatched conditions. The enhanced acoustic modeling method leads to improvements in a continuous speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in both clean and noisy conditions, where up to 15.4% relative reduction in word error rate (WER) is achieved.
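
    As a rough sketch of the recipe this abstract outlines (dictionary learning on training posteriors, sparse coding of test posteriors, reconstruction as enhancement), here is a minimal version using scikit-learn; the data, dimensions and parameters are invented, and the authors' actual implementation may differ.

```python
# Minimal sketch, assuming scikit-learn: enhance DNN posteriors by
# projecting them onto a union of low-dimensional subspaces learned
# via dictionary learning and sparse coding.
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Pretend DNN posteriors: rows are frames, columns are classes.
rng = np.random.default_rng(1)
train_post = rng.dirichlet(np.ones(40) * 0.1, size=2000)  # (frames, classes)
test_post = rng.dirichlet(np.ones(40) * 0.1, size=100)

# Learn a dictionary of posterior "atoms" on training data, then
# sparse-code the test posteriors (orthogonal matching pursuit).
dl = DictionaryLearning(n_components=60, transform_algorithm="omp",
                        transform_n_nonzero_coefs=5, random_state=0)
dl.fit(train_post)
codes_test = dl.transform(test_post)  # sparse codes, shape (100, 60)

# Reconstruction projects each test posterior onto the low-dimensional
# subspaces spanned by the learned atoms, suppressing spurious errors.
enhanced = codes_test @ dl.components_

# Clip and renormalize so each frame is again a probability distribution.
enhanced = np.clip(enhanced, 1e-10, None)
enhanced /= enhanced.sum(axis=1, keepdims=True)
print(enhanced.shape)  # (100, 40)
```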

    Strategies for Representing Tone in African Writing Systems

    Tone languages provide some interesting challenges for the designers of new orthographies. One approach is to omit tone marks, just as stress is not marked in English (zero marking). Another approach is to do phonemic tone analysis and then make heavy use of diacritic symbols to distinguish the 'tonemes' (exhaustive marking). While orthographies based on either system have been successful, this may be thanks to our ability to manage inadequate orthographies rather than to any intrinsic advantage afforded by one or the other approach. In many cases, practical experience with both kinds of orthography in sub-Saharan Africa has shown that people have not been able to attain the level of reading and writing fluency that we know to be possible for the orthographies of non-tonal languages. In some cases this can be attributed to a sociolinguistic setting which does not favour vernacular literacy. In other cases, the orthography itself might be to blame. If the orthography of a tone language is difficult to use or to learn, then a good part of the reason, I believe, is that the designer either has not paid enough attention to the function of tone in the language, or has not ensured that the information encoded in the orthography is accessible to the ordinary (non-linguist) user of the language. If the writing of tone is not going to continue to be a stumbling block to literacy efforts, then a fresh approach to tone orthography is required, one which assigns high priority to these two factors. This article describes the problems with orthographies that use too few or too many tone marks, and critically evaluates a wide range of creative intermediate solutions. I review the contributions made by phonology and reading theory, and provide some broad methodological principles to guide someone who is seeking to represent tone in a writing system. The tone orthographies of several languages from sub-Saharan Africa are presented throughout the article, with particular emphasis on some tone languages of Cameroon.