329 research outputs found

    A simple statistical speech recognition of mandarin monosyllables

    Each Mandarin syllable is represented by a sequence of vectors of linear predictive coding cepstra (LPCC). Since all syllables have a simple phonetic structure, in our speech recognition we partition the sequence of LPCC vectors of every syllable into equal segments and average the LPCC vectors in each segment. The mean LPCC vectors are used as the feature of a syllable. Our simple feature does not need the time-consuming and complicated nonlinear contraction and expansion adopted by dynamic time warping. We propose several probability distributions for the feature values. A simplified Bayes decision rule is used for classification of Mandarin syllables. For speaker-independent Mandarin digits, the recognition rate is 98.6% if a normal distribution is used for the feature values and 98.1% if an exponential distribution is used for the absolute values of the features. The feature proposed in this paper to represent a syllable is the simplest one, much easier to extract than any other known feature. The computation for feature extraction and classification is much faster and more accurate than using the HMM method or any other known techniques.
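The segment-averaging feature and the Bayes classification step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the diagonal-covariance normal model, the equal class priors, and the variance floor are all assumptions made for illustration, and the LPCC frames are taken as already computed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, Vector]  # (per-dimension means, per-dimension variances)


def segment_mean_feature(frames: Sequence[Sequence[float]], num_segments: int) -> Vector:
    """Split a frame sequence into nearly equal contiguous segments,
    average the frames in each segment, and concatenate the averages."""
    n = len(frames)
    if n < num_segments:
        raise ValueError("need at least one frame per segment")
    dim = len(frames[0])
    feature: Vector = []
    for s in range(num_segments):
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        segment = frames[start:end]
        for d in range(dim):
            feature.append(sum(frame[d] for frame in segment) / len(segment))
    return feature


def fit_gaussian(features: Sequence[Vector]) -> Model:
    """Estimate an independent normal distribution per feature dimension."""
    count = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / count for d in range(dim)]
    # Assumed variance floor (1e-6) keeps the log-likelihood finite.
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in features) / count, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(x: Vector, model: Model) -> float:
    """Log density of x under the per-dimension normal model."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x: Vector, models: Dict[str, Model]) -> str:
    """Bayes decision with equal priors (an assumption): pick the class
    whose model gives x the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))
```

Because every syllable is reduced to a vector of the same length regardless of its duration, no time alignment between utterances is needed before classification, which is the contrast with dynamic time warping that the abstract draws.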

    Effects of variance and input distribution on the training of L2 learners' tone categorization

    Recent psycholinguistic findings showed that (a) a multi-modal phonetic training paradigm that encodes visual, interactive information is more effective in training L2 learners' perception of novel categories, (b) decreasing the acoustic variance of a phonetic dimension allows the learners to more effectively shift the perceptual weight towards this dimension, and (c) using an implicit word learning task in which the words are contrasted with different lexical tones improves naïve listeners' categorization of Mandarin Chinese tones. This dissertation investigates the effectiveness of video game training, variance manipulation and high variability training in the context of implicit word learning, in which American English speakers without any tone language experience learn four Mandarin Chinese tones by playing a video game. A video game was created in which each of four different animals is associated with a Chinese tone. The task for the participants is to select each animal's favorite food to feed it. At the beginning of the game, each animal is clearly visible. As the game progresses, the images of the animals become more and more vague and eventually visually indistinguishable. However, the four Chinese tones associated with the animals are played all through the game. Thus, the participants need to depend on the auditory information in order to clear the difficult levels. In terms of the training stimuli, the tone tokens were manipulated to have a greater variance on the pitch height dimension, but a smaller variance on the pitch direction dimension, in order to shift the English listeners' perception to pitch direction, a dimension that native Chinese listeners crucially rely on. A variety of pretests and posttests were used to investigate both the English speakers' perception of the tones and their weighting of the acoustic dimensions. 
These training stimuli were compared to other types of training stimuli used in the literature, such as the high variability natural stimuli and tones embedded in non-minimal pairs. A group of native English speakers was used as the control group without any tone input. A native control group was also included. The video game training for each speaker consisted of four 30-minute sessions on four different days, and 60 participants (including both the non-native control and native control group) participated in the experiments. The crucial findings in the study include (1) all naïve listeners in the training condition successfully associated lexical tones with different animals without any explicit feedback after only 2 hours of training; (2) both the resynthesized stimuli with smaller variance on pitch direction and the multi-talker stimuli allowed native English speakers to shift their cue-weighting toward pitch direction and the multi-talker stimuli were more robust in terms of shifting the cue-weighting despite their more heterogeneous distribution in the acoustic space; (3) the multi-talker training allowed for better generalization as the trainees in multi-talker training identified the tones produced by new talkers better than trainees in other conditions; (4) there was a main effect of tone on tone identification and the falling tone was the most challenging one; (5) there is a correlation between cue-weighting and the tone discrimination performance before and after the training; (6) due to individual variability, individuals differed in terms of the amount of tone input they received during the video game training and the number of tone tokens was a significant predictor for the sensitivity to tones calculated as d'. Overall, the study showed an effect of talker variability and variances of multidimensional acoustic space on English speakers' cue-weighting for tone perception and their tone categorization
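For readers unfamiliar with the sensitivity measure d' mentioned in finding (6), the following is a minimal sketch of the standard signal detection definition, d' = z(hit rate) − z(false-alarm rate). The abstract does not say how d' was computed in the study; the function name and the log-linear correction (adding 0.5 to each cell so that rates of exactly 0 or 1 stay finite) are assumptions for illustration.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Assumed log-linear correction: avoids infinite z-scores at rates of 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A listener who responds at chance (equal hit and false-alarm rates) gets a d' of zero, and larger positive values indicate better discrimination between the target tone and the alternatives.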

    Characteristics of Speech (Part 1) and Language (Part 2) for Hearing Devices (Aids)


    Lexical Effects in Phonemic Neutralization in Taiwan Mandarin

    BLS 38: General Session and Thematic Session on Language Contact

    Neural correlates of segmental and tonal information in speech perception

    The Chinese language provides an optimal window for investigating both segmental and suprasegmental units. The aim of this cross‐linguistic fMRI study is to elucidate neural mechanisms involved in extraction of Chinese consonants, rhymes, and tones from syllable pairs that are distinguished by only one phonetic feature (minimal) vs. those that are distinguished by two or more phonetic features (non‐minimal). Triplets of Chinese monosyllables were constructed for three tasks comparing consonants, rhymes, and tones. Each triplet consisted of two target syllables with an intervening distracter. Ten Chinese and English subjects were asked to selectively attend to targeted sub‐syllabic components and make same‐different judgments. Direct between‐group comparisons in both minimal and non‐minimal pairs reveal increased activation for the Chinese group in predominantly left‐sided frontal, parietal, and temporal regions. Within‐group comparisons of non‐minimal and minimal pairs show that frontal and parietal activity varies for each sub‐syllabic component. In the frontal lobe, the Chinese group shows bilateral activation of the anterior middle frontal gyrus (MFG) for rhymes and tones only. Within‐group comparisons of consonants, rhymes, and tones show that rhymes induce greater activation in the left posterior MFG for the Chinese group when compared to consonants and tones in non‐minimal pairs. These findings collectively support the notion of a widely distributed cortical network underlying different aspects of phonological processing. This neural network is sensitive to the phonological structure of a listener's native language

    SECOND LANGUAGE LEXICAL REPRESENTATION AND PROCESSING OF MANDARIN CHINESE TONES

    This dissertation investigates second language (L2) speech learning challenges by testing advanced L2 Mandarin Chinese learners’ tone and word knowledge. We consider L2 speech learning under the scope of three general hypotheses. (1) The Tone Perception Hypothesis: Tones may be difficult for L2 listeners to perceive auditorily. (2) The Tone Representation Hypothesis: Tones may be difficult for L2 listeners to represent effectively. (3) The Tone Processing Hypothesis: Tones may be difficult for L2 listeners to process efficiently. Experiments 1 and 2 test tone perception and representation using tone identification tasks with monosyllabic and disyllabic stimuli with L1 and advanced L2 Mandarin listeners. Results suggest that both groups are highly accurate in identification of tones on isolated monosyllables; however, L2 learners have some difficulty in disyllabic contexts. This suggests that low-level auditory perception of tones presents L2 learners with persistent long-term challenges. Results also shed light on tone representations, showing that both L1 and L2 listeners are able to form abstract representations of third tone allotones. Experiments 3 and 4 test tone representation and processing through the use of online (behavioral and ERP) and offline measures of tone word recognition. Offline results suggest weaknesses in L2 learners’ long-term memory of tones for specific vocabulary. However, even when we consider only trials for which learners had correct and confident explicit knowledge of tones and words, we still see significant differences in accuracy for rejection of tone compared to vowel nonwords in lexical recognition tasks. Using a lexical decision task, ERP measures in Experiment 3 reveal consistent L1 sensitivity to tones and vowels in isolated word recognition, and individual differences among L2 listeners. While some are sensitive to both tone and vowel mismatches, others are only sensitive to vowels or not at all. 
Experiment 4 utilized picture cues to test neural responses tied directly to tone and vowel mismatches. Results suggest strong L1 sensitivity to vowel mismatches. No other significant results were found. The final chapter considers how the three hypotheses shed light on the results as a whole, and how they relate to the broader context of L2 speech learning

    Observing the contribution of both underlying and surface representations: Evidence from priming and event-related potentials

    This dissertation aims to uncover the role of the acoustic input (the surface representation) and the abstract linguistic representation (the underlying representation) as listeners map the signal during spoken word recognition. To examine these issues, tone sandhi, a tonal alternation phenomenon in which a tone changes to a different tone in certain phonological environments, is investigated. This dissertation first examined how productive Mandarin tone 3 sandhi words (T3 → T2/___T3) are processed and represented. An auditory priming lexical decision experiment was conducted in which each disyllabic tone 3 sandhi target was preceded by a tone 2 monosyllable (surface-tone overlap), a tone 3 monosyllable (underlying-tone overlap), or an unrelated monosyllable (unrelated control). Lexical decision RTs showed a tone 3 (underlying-tone overlap) facilitation effect for both high and low frequency words. A second priming study investigated the processing and representation of the more complex and less productive Taiwanese tone sandhi. Lexical decision RTs, examining sandhi 24 → 33 and 51 → 55, showed that while both sandhi types exhibited facilitatory priming effects, underlying tone primes showed significantly more facilitation than surface primes for sandhi 24 → 33, while surface tone primes showed significantly more facilitation than underlying primes for sandhi 51 → 55, with both effects modulated by frequency. A third study used event-related potentials (ERPs) to examine Mandarin tone 3 sandhi. Using an oddball paradigm, participants passively listened to either Tone 2 standards ([tʂu2 je4] /tʂu2 je4/), Tone 3 standards ([tʂu3 je4] /tʂu3 je4/), Tone Sandhi standards ([tʂu2 jen3] /tʂu3 jen3/), or Mix standards (i.e., both tone 3 sandhi and tone 3 words), occasionally interspersed with a tone 2 word [tʂu2] (i.e., the deviant). 
Results showed a mismatch negativity (MMN) in the Tone 2 condition but not in the Sandhi condition, suggesting different neural processing mechanisms for Tone 2 and Sandhi words. Together, the current data suggest that the underlying tone contributes more to the processing of productive tone sandhi and the surface tone contributes more to the processing of less productive tone sandhi. In general, this dissertation provides evidence for the representation and processing of words that involve phonological alternation, both within the same language and across different languages

    Speling Successful Sucesfuly: Statistical Learning in Spelling

    Many spelling errors in English are doubling errors, as when people are stumped by the double ‹l› in ‹trellis›. In Study 1, we tabulated statistical patterns with regards to doubling in English. In Study 2, we collected behavioral data to see if people were sensitive to these statistical patterns in doubling and to explore other factors that might influence doubling such as context, individual differences (language background and spelling ability), and task. We gave two nonword spelling tasks to US college students (N=68) and bilingual Singaporean college students from an English-based education system but with diverse language backgrounds: Mandarin (N=54), Malay (N=44), or Tamil (N=42). In the choice task, participants heard a nonword and chose between two spelling options, e.g. dremmib/dremib. In the free task, they wrote down its best spelling. We found a vowel length effect (more doubling after short vowels than long vowels) that was moderated by spelling ability (better spellers were more influenced by vowel length) and language background. Americans had the largest vowel length effect and Tamil Singaporeans had none, as they possibly associated consonant doubling with the lengthening of doubled consonants in Tamil instead of the preceding vowel. The Mandarin group spelled nonwords least accurately, and greater knowledge of pinyin, a phoneme-based writing system, was associated with higher nonword spelling accuracy. These and other findings reflect how linguistic factors and language background moderate the role of statistical learning and context in spelling