3,652 research outputs found

    An acoustic-phonetic approach in automatic Arabic speech recognition

    In a large vocabulary speech recognition system the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large vocabulary isolated word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can uniquely be represented at this level by using eight broad phonetic classes. When introducing detailed vowel identification the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the above mentioned three procedures. Simple phonological constraints have been used to improve the accuracy of the segmentation process. 
The resulting sequence of labels is used for lexical access to retrieve the word, or a small set of words sharing the same broad phonetic labelling. When more than one word candidate is retrieved, a verification procedure is used to choose the most likely one
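As an illustration of this lexical-access scheme, the sketch below reduces each word's phoneme string to a broad-phonetic signature (vowels kept phonemic, consonants collapsed to coarse classes) and buckets a toy lexicon by signature; the class inventory, phoneme mapping, and example words are invented for illustration, not the system's actual classes:

```python
# Hypothetical sketch of broad-phonetic-class lexical access: words sharing a
# signature form one candidate set, to be resolved by a verification step.
from collections import defaultdict

# Assumed phoneme-to-class mapping; consonants collapse to broad classes,
# vowels (anything not listed) keep their phonemic identity.
BROAD_CLASS = {
    "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP", "q": "STOP",
    "s": "FRIC", "z": "FRIC", "f": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "l": "LIQ", "r": "LIQ",
}

def signature(phonemes):
    """Vowels stay phonemic; consonants map to their broad class."""
    return tuple(BROAD_CLASS.get(p, p) for p in phonemes)

def build_lexicon(words):
    buckets = defaultdict(list)
    for word, phonemes in words.items():
        buckets[signature(phonemes)].append(word)
    return buckets

words = {"kalb": ["k", "a", "l", "b"], "qalb": ["q", "a", "l", "b"],
         "kutub": ["k", "u", "t", "u", "b"]}
lex = build_lexicon(words)
# "kalb" and "qalb" collide at this level of description, "kutub" is unique.
unique = sum(1 for ws in lex.values() if len(ws) == 1)
print(f"{unique}/{len(words)} words uniquely specified")
```

Collisions like kalb/qalb are exactly the cases the abstract's verification procedure is meant to resolve.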

    Infants segment words from songs - an EEG study

    Children’s songs are omnipresent and highly attractive stimuli in infants’ input. Previous work suggests that infants process linguistic–phonetic information from simplified sung melodies. The present study investigated whether infants learn words from ecologically valid children’s songs. Testing 40 Dutch-learning 10-month-olds in a familiarization-then-test electroencephalography (EEG) paradigm, this study asked whether infants can segment repeated target words embedded in songs during familiarization and subsequently recognize those words in continuous speech in the test phase. To replicate previous speech work and compare segmentation across modalities, infants participated in both song and speech sessions. Results showed a positive event-related potential (ERP) familiarity effect to the final compared to the first target occurrences during both song and speech familiarization. No evidence was found for word recognition in the test phase following either song or speech. Comparisons across the stimuli of the present and a comparable previous study suggested that acoustic prominence and speech rate may have contributed to the polarity of the ERP familiarity effect and its absence in the test phase. Overall, the present study provides evidence that 10-month-old infants can segment words embedded in songs, and it raises questions about the acoustic and other factors that enable or hinder infant word segmentation from songs and speech

    Aptitude, experience and second language pronunciation proficiency development in classroom settings: a longitudinal study

    The current study longitudinally examined the influence of aptitude on second language (L2) pronunciation development when 40 first-year Japanese university students engaged in practice activities inside and outside English-as-a-Foreign-Language classrooms over one academic year. Spontaneous speech samples were elicited at the beginning, middle and end points of the project, analyzed for global, segmental, syllabic, prosodic and temporal aspects of L2 pronunciation, and linked to their aptitude and experience profiles. Results indicated that the participants generally enhanced the global comprehensibility of their speech (via reducing vowel insertion errors in complex syllables) as a function of increased classroom experience during their first semester, and explicit learning aptitude (associative memory, phonemic coding) appeared to help certain learners further enhance their pronunciation proficiency through the development of fluency and prosody. In the second semester, incidental learning ability (sound sequence recognition) was shown to be a significant predictor of the extent to which certain learners continued to improve and ultimately attain advanced-level L2 comprehensibility, largely thanks to improved segmental accuracy

    Natural infant-directed speech facilitates neural tracking of prosody

    Infants prefer to be addressed with infant-directed speech (IDS). IDS benefits language acquisition through amplified low-frequency amplitude modulations. It has been reported that this amplification increases electrophysiological tracking of IDS compared to adult-directed speech (ADS). It is still unknown which particular frequency band triggers this effect. Here, we compare tracking at the rates of syllables and prosodic stress, which are both critical to word segmentation and recognition. In mother-infant dyads (n=30), mothers described novel objects to their 9-month-olds while infants’ EEG was recorded. For IDS, mothers were instructed to speak to their children as they typically do, while for ADS, mothers described the objects as if speaking with an adult. Phonetic analyses confirmed that pitch features were more prototypically infant-directed in the IDS condition than in the ADS condition. Neural tracking of speech was assessed by speech-brain coherence, which measures the synchronization between the speech envelope and the EEG. Results revealed significant speech-brain coherence at both the syllabic and prosodic stress rates, indicating that infants track speech in IDS and ADS at both rates. We found significantly higher speech-brain coherence for IDS compared to ADS at the prosodic stress rate but not at the syllabic rate. This indicates that the IDS benefit arises primarily from enhanced prosodic stress. Thus, neural tracking is sensitive to parents’ speech adaptations during natural interactions, possibly facilitating higher-level inferential processes such as word segmentation from continuous speech
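Speech-brain coherence of the kind described can be illustrated with a toy computation: coherence between a synthetic speech envelope and a simulated EEG channel, inspected at an assumed prosodic-stress rate (~2 Hz) and syllable rate (~5 Hz). The sampling rate, modulation rates, and signals below are assumptions for the sketch, not the study's parameters (requires NumPy and SciPy):

```python
# Minimal sketch: coherence peaks where the EEG tracks the envelope.
import numpy as np
from scipy.signal import coherence

fs = 250                      # assumed EEG sampling rate in Hz
t = np.arange(0, 60, 1 / fs)  # 60 s of signal
rng = np.random.default_rng(0)

# Envelope with stress-rate (2 Hz) and syllable-rate (5 Hz) modulations.
envelope = 1 + 0.5 * np.sin(2 * np.pi * 2 * t) + 0.3 * np.sin(2 * np.pi * 5 * t)
# Simulated EEG: an envelope-tracking component buried in noise.
eeg = 0.4 * envelope + rng.normal(size=t.size)

# 4-s windows give 0.25 Hz frequency resolution.
f, cxy = coherence(envelope, eeg, fs=fs, nperseg=fs * 4)
for rate in (2.0, 5.0):
    idx = int(np.argmin(np.abs(f - rate)))
    print(f"coherence at {rate:.0f} Hz: {cxy[idx]:.2f}")
```

Coherence is high at the two modulation rates and near the noise floor elsewhere, which is the logic behind comparing IDS and ADS at specific rates.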

    The effect of literacy in the speech temporal modulation structure

    The temporal modulation structure of adult-directed speech is conceptualised as a modulation hierarchy comprising four temporal bands: delta (1–3 Hz), theta (4–8 Hz), beta (15–30 Hz) and low gamma (30–50 Hz). Neuronal oscillatory entrainment to amplitude modulations (AMs) in these four bands may provide a basis for encoding speech and parsing the continuous signal into linguistic units (delta: syllable stress patterns; theta: syllables; beta: onset-rime units; low gamma: phonetic information). While adult-directed speech is theta-dominant and shows tighter theta-beta/low gamma phase alignment, infant-directed speech is delta-dominant and shows tighter delta-theta phase alignment. Although this change in speech representations could be maturational, it was hypothesized that literacy may also influence the temporal structure of speech. In fact, literacy and schooling are known to change auditory speech entrainment, enhancing phonemic specification and augmenting the phonological detail of lexical representations. Thus, we hypothesized that a corresponding difference in speech production could also emerge. In this work, spontaneous speech samples were recorded from literate (lower- and higher-literacy) and illiterate subjects, and their energy modulation spectra across delta, theta and beta/low gamma AMs, as well as the phase synchronization between nested AMs, were analysed. Measures of the participants’ phonology skills and vocabulary were also collected, and a task was conducted to confirm the sensitivity of the analysis method used (S-AMPH) to speech rhythm. Results showed no differences in the energy of delta, theta and beta/low gamma AMs in spontaneous speech. However, phase alignment between slower and faster speech AMs was significantly enhanced by literacy, showing moderately strong correlations with the phonology and literacy measures. 
Our data suggest that literacy affects not only cortical entrainment and speech perception but also the physical/rhythmic properties of speech production.
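The kind of analysis reported (band-limited AM energy and phase alignment between nested AMs) can be sketched as follows; the band edges, the toy envelope, and the 1:2 phase ratio are illustrative assumptions rather than the S-AMPH implementation (requires NumPy and SciPy):

```python
# Sketch: bandpass an amplitude envelope into delta and theta AM bands,
# compare band energies, and estimate delta-theta phase alignment with an
# n:m (here 1:2) phase-synchronization index.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 100
t = np.arange(0, 30, 1 / fs)
# Toy envelope: a 2 Hz (delta) modulation with a phase-locked 4 Hz (theta) one.
env = 1 + 0.6 * np.sin(2 * np.pi * 2 * t) + 0.3 * np.sin(2 * np.pi * 4 * t)

def band_am(x, lo, hi, fs):
    """Zero-phase bandpass to isolate one AM band of the envelope."""
    b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

delta = band_am(env, 1, 3, fs)
theta = band_am(env, 4, 8, fs)
print("delta/theta energy ratio:", np.sum(delta**2) / np.sum(theta**2))

# 1:2 phase-synchronization index between the bands (0 = none, 1 = perfect).
phi_d = np.angle(hilbert(delta))
phi_t = np.angle(hilbert(theta))
psi = np.abs(np.mean(np.exp(1j * (2 * phi_d - phi_t))))
print("delta-theta PSI:", round(psi, 2))
```

Because the toy theta modulation is phase-locked to the delta one, the index comes out near 1; jittering the relative phase over time would drive it toward 0.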

    The effects of perception- vs. production-based pronunciation instruction

    While research has shown that provision of explicit pronunciation instruction (PI) is facilitative of various aspects of second language (L2) speech learning (Thomson & Derwing, 2015), a growing number of scholars have begun to examine which type of instruction best facilitates acquisition. In the current study, we explored the effects of perception- vs. production-based methods of PI among tertiary-level Japanese students of English. Participants (N = 115) received two weeks of instruction on either segmental or suprasegmental features of English, using either a perception- or a production-based method, with progress assessed in a pre/post/delayed posttest study design. Although all four treatment groups demonstrated major gains in pronunciation accuracy, performance varied considerably across groups and over time. A close examination of our findings suggested that perception-based training may be the more effective training method across both segmental and suprasegmental features

    Onset-to-onset probability and gradient acceptability in Korean


    Technology-based reading intervention programs for elementary grades: An analytical review

    In modern societies, the role of reading is becoming increasingly crucial. Hence, any impairment of the reading ability can seriously limit a person's aspirations. The enormous importance of reading as an essential skill in modern life has encouraged many researchers to try to find more effective intervention approaches. Technology has been used extensively to assist and enhance literacy learning. This analytical review aims to present a comprehensive overview of the existing research on technology-based or technology-assisted reading interventions for elementary grades published between 2000 and 2017, along with an analysis of various aspects of these studies. An extensive search yielded 42 articles that met the inclusion criteria, evaluating a total of 32 reading programs. The studies are classified into six categories: phonological awareness, phonics, vocabulary, comprehension, fluency, and multi-component. Each reading category begins with a brief introduction. Then, the content and instructional mechanisms of each program in the category are explained, alongside the outcomes of its interventions. We find that vocabulary interventions, as well as the use of mobile, tablet, and other non-computer technologies, are largely overlooked. Furthermore, only a very limited number of programs focused on fluency, and none of them addressed all of its components. In addition, despite the long-term practice required to foster fluency, the reviewed fluency studies have a shorter average intervention time than the other intervention categories. This paper provides researchers and solution developers with an extensive and informative review of the current state of the art in reading interventions. Additionally, it identifies the current knowledge gaps and defines future research directions to develop effective reading programs

    Two uses for syllables in a speech recognition system


    Word learning in the first year of life

    In the first part of this thesis, we ask whether 4-month-old infants can represent objects and movements after a short exposure in such a way that they recognize either a repeated object or a repeated movement when it is presented simultaneously with a new object or a new movement. If they do, we ask whether the way they observe the visual input is modified when auditory input is presented. We investigate whether infants react to the familiarization labels and to novel labels in the same manner. If the labels as well as the referents are matched for saliency, any difference should be due to processes that are not limited to sensorial perception. We hypothesize that infants will, if they map words to the objects or movements, change their looking behavior whenever they hear a familiar label, a novel label, or no label at all. In the second part of this thesis, we approach the problem of word learning from a different perspective. If infants reason about possible label-referent pairs and are able to make inferences about novel pairs, are the same processes involved in all intermodal learning? We compared the task of learning to associate auditory regularities to visual stimuli (reinforcers) with the word-learning task. We hypothesized that even if infants succeed in learning more than one label during one single event, learning the intermodal connection between auditory and visual regularities might present a more demanding task for them. The third part of this thesis addresses the role of associative learning in word learning. In recent decades, it has repeatedly been suggested that co-occurrence probabilities can play an important role in word segmentation. 
However, the vast majority of studies test infants with artificial streams that do not resemble natural input: most studies use words of equal length and with unambiguous syllable sequences within words, where the only point of variability is at the word boundaries (Aslin et al., 1998; Saffran, Johnson, Aslin, & Newport, 1999; Saffran et al., 1996; Thiessen et al., 2005; Thiessen & Saffran, 2003). Even when the input is modified to resemble natural input more faithfully, the words with which infants are tested are always unambiguous: within words, each syllable predicts its adjacent syllable with a probability of 1.0 (Pelucchi, Hay, & Saffran, 2009; Thiessen et al., 2005). We therefore tested 6-month-old infants with statistically ambiguous words. Before doing so, we also verified, on a large sample of languages, whether statistical information in natural input, where the majority of words are statistically ambiguous, is indeed useful for segmenting words. Our motivation was partly due to the fact that studies modelling the segmentation process with natural language input have often yielded ambivalent results about the usefulness of such computation (Batchelder, 2002; Gambell & Yang, 2006; Swingley, 2005). We conclude this introduction with a small remark about the term word. It will be used throughout this thesis without questioning its descriptive value: the common-sense meaning of the term word is unambiguous enough, since all people know what we are referring to when we say or think of the term word. However, the term word is not unambiguous at all (Di Sciullo & Williams, 1987). To mention only some of the classical examples: (1) Do jump and jumped, or go and went, count as one word or as two? This example might seem all too trivial, especially in languages with weak overt morphology such as English, but in some languages each basic form of a word has tens of inflected variants. 
(2) A similar question arises with all the words that are morphological derivations of other words, such as evict and eviction, examine and reexamine, unhappy and happily, and so on. (3) And finally, each language contains many phrases and idioms: do air conditioner and give up count as one word or two? Statistical word segmentation studies generally neglect the issue of the definition of words, assuming that phrases and idioms have strong internal statistics and will therefore be selected as single words (Cutler, 2012). But because compounds and phrases are usually composed of smaller meaningful chunks, it is unclear how infants would extract these smaller units of speech if they were relying predominantly on statistical information. We will address the problem of over-segmentation briefly in the third part of the thesis
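The transitional-probability computation at issue can be sketched on a toy stream in the Saffran et al. style, with TP(x → y) = count(xy) / count(x) and boundaries posited at TP dips. The syllable inventory and mini-lexicon are invented, and, as argued above, real input is far more ambiguous than this stream, where within-word TPs are all 1.0:

```python
# Toy transitional-probability word segmentation over a syllable stream.
from collections import Counter

lexicon = ["bida", "kupo", "golatu"]  # hypothetical mini-lexicon
stream = ["bida", "kupo", "bida", "golatu", "kupo", "golatu", "bida", "kupo"]
# Split each word into CV syllables (all syllables here are two characters).
syllables = [w[i:i + 2] for w in stream for i in range(0, len(w), 2)]

# TP(a -> b) = count(ab) / count(a), estimated from the stream itself.
pairs = Counter(zip(syllables, syllables[1:]))
firsts = Counter(syllables[:-1])
tp = {p: c / firsts[p[0]] for p, c in pairs.items()}

# Posit a boundary wherever the TP to the next syllable dips below 1.0;
# in this idealized stream only between-word transitions do so.
segments, current = [], [syllables[0]]
for a, b in zip(syllables, syllables[1:]):
    if tp[(a, b)] < 1.0:
        segments.append("".join(current))
        current = []
    current.append(b)
segments.append("".join(current))
print(segments)
# → ['bida', 'kupo', 'bida', 'golatu', 'kupo', 'golatu', 'bida', 'kupo']
```

Replacing the hard 1.0 threshold with a relative-dip criterion, or feeding in words whose internal TPs fall below 1.0, immediately reintroduces the ambiguity and over-segmentation problems the thesis discusses.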