80 research outputs found

    Interaction in spoken word recognition models: Feedback helps

    Human perception, cognition, and action require fast integration of bottom-up signals with top-down knowledge and context. A key theoretical perspective in cognitive science is the interactive activation hypothesis: forward and backward flow in bidirectionally connected neural networks allows humans and other biological systems to approximate optimal integration of bottom-up and top-down information under real-world constraints. An alternative view is that online feedback is neither necessary nor helpful; purely feedforward alternatives can be constructed for any feedback system, and online feedback could not improve processing and would preclude veridical perception. In the domain of spoken word recognition, the latter view was apparently supported by simulations using the interactive activation model, TRACE, with and without feedback: as many words were recognized more quickly without feedback as were recognized faster with feedback. However, these simulations used only a small set of words and did not address a primary motivation for interaction: making a model robust in noise. We conducted simulations using hundreds of words, and found that the majority were recognized more quickly with feedback than without. More importantly, as we added noise to inputs, accuracy and recognition times were better with feedback than without. We follow these simulations with a critical review of recent arguments that online feedback in interactive activation models like TRACE is distinct from other potentially helpful forms of feedback. We conclude that in addition to providing the benefits demonstrated in our simulations, online feedback provides a plausible means of implementing putatively distinct forms of feedback, supporting the interactive activation hypothesis.

    Deep Neural Network Architectures for Large-scale, Robust and Small-Footprint Speaker and Language Recognition

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: 27-04-2017. Artificial neural networks are powerful learners of the information embedded in speech signals. They can provide compact, multi-level, nonlinear representations of temporal sequences and holistic optimization algorithms capable of surpassing former leading paradigms. Artificial neural networks are, therefore, a promising technology that can be used to enhance our ability to recognize speakers and languages, an ability increasingly in demand in the context of new, voice-enabled interfaces used today by millions of users. The aim of this thesis is to advance the state of the art in language and speaker recognition through the formulation, implementation and empirical analysis of novel approaches for large-scale and portable speech interfaces. Its major contributions are: (1) novel, compact network architectures for language and speaker recognition, including a variety of network topologies based on fully-connected, recurrent, convolutional, and locally connected layers; (2) a bottleneck combination strategy for classical and neural network approaches for long speech sequences; (3) the architectural design of the first public, multilingual, large-vocabulary continuous speech recognition system; and (4) a novel, end-to-end optimization algorithm for text-dependent speaker recognition that is applicable to a range of verification tasks.
Experimental results have demonstrated that artificial neural networks can substantially reduce the number of model parameters and surpass the performance of previous approaches to language and speaker recognition, particularly in the cases of long short-term memory recurrent networks (used to model the input speech signal), end-to-end optimization algorithms (used to predict languages or speakers), short testing utterances, and large training data collections.

    Development of second-language inflectional morphology and cross-linguistic interference effects (Vývoj flektivní morfologie druhého jazyka a mezijazykové interferenční efekty)

    The present dissertation investigates the development in the production and perception of inflectional morphology in second-language learners of English and the role of their mother tongue during this development. The data analysed in this thesis stem from three psycholinguistic experiments that examine the production and comprehension of English novel words (thus investigating the sublexicon without the activation of word meaning). The first experiment focuses on the perception of inflectional morphemes in English novel words in L2 students at the A0 to C1 proficiency levels.
Reaction-time analysis has shown that L2 learners, like native speakers (e.g., Post et al., 2008), seem to be sensitive to the presence of morphosyntactic information at the sublexical level, and they appear to decompose inflected forms into stems and affixes during perception and conduct an implicit phonetic analysis of the stem. The presence of these patterns across all levels suggests that L2 performance might be influenced by L1: Czech is morphologically much richer than English, and Czech speakers might thus be in general more sensitive to morphological analysis of words. The second experiment investigates the production of inflected forms, more specifically those of past tense, in L2 learners of English at the A1 to C1 levels (and in a control group of native speakers of English) by means of elicitation. Analysis of the produced word forms showed that the learners...
Department of the English Language and ELT Methodology (Ústav anglického jazyka a didaktiky), Faculty of Arts (Filozofická fakulta)

    Oscillatory activity and EEG phase synchrony of concurrent word segmentation and meaning-mapping in 9-year-old children

    When learning a new language, one must segment words from continuous speech and associate them with meanings. These complex processes can be boosted by attentional mechanisms triggered by multi-sensory information. Previous electrophysiological studies suggest that brain oscillations are sensitive to different hierarchical complexity levels of the input, making them a plausible neural substrate for speech parsing. Here, we investigated the functional role of brain oscillations during concurrent speech segmentation and meaning acquisition in sixty 9-year-old children. We collected EEG data during an audio-visual statistical learning task in which children were exposed to a learning condition with consistent word-picture associations and a random condition with inconsistent word-picture associations before being tested on their ability to recall words and word-picture associations. We capitalized on the brain dynamics to align neural activity to the same rate as an external rhythmic stimulus to explore modulations of neural synchronization and phase synchronization between electrodes during multi-sensory word learning. Results showed enhanced power at both the word and syllable rates and increased EEG phase synchronization between frontal and occipital regions in the learning condition compared to the random condition. These findings suggest that multi-sensory cueing and attentional mechanisms play an essential role in children's successful word learning.

    Max-Planck-Institute for Psycholinguistics: Annual Report 2001

    No full text

    Why not model spoken word recognition instead of phoneme monitoring?

    Norris, McQueen & Cutler present a detailed account of the decision stage of the phoneme monitoring task. However, we question whether this contributes to our understanding of the speech recognition process itself, and we fail to see why phonotactic knowledge is playing a role in phoneme recognition.

    Birth to Three Language Acquisition: Influences of Ambient Language in the Montessori Setting

    Claudine Campanelli, Long Island University, New York, 2021. Dissertation Chair: Lynn Cohen, PhD. There is an expanse of literature looking at various topics supporting Montessori education, especially in preschool; however, there is a lack of research in infant and toddler Montessori classrooms. Most of the empirical data regarding language acquisition has focused on the child’s acquisition of vocabulary through direct instruction, rather than the learning capability from overhearing a third party in a naturalistic setting. The purpose of this intervention study was to add to the limited empirical research on language acquisition in infant and toddler Montessori environments. More specifically, the intervention assessed whether infants and toddlers could indirectly acquire new vocabulary through the Absorbent Mind from teachers’ and peers’ ambient dialogue during the Montessori three-period lesson. The research utilized a descriptive, correlational pre-and-post quasi-experimental design to assess and analyze vocabulary and ambient language. Data collection occurred in three Association Montessori Internationale (AMI) and American Montessori Society (AMS) infant and toddler mixed-aged environments throughout New York State and Maryland. The Language Environment Analysis (LENA) system was used to analyze audio recordings. Transcriptions of audio recordings quantified vocabulary acquisition and ambient language. Paired t-tests and ANCOVA were used to analyze children’s acquired vocabulary. A fidelity scale analyzed the extent to which Montessori-trained teachers adhered to the three-period lesson intervention. The findings provide opportunities to improve infant and toddler teachers’ classroom practice related to language acquisition. Suggestions were offered for early childhood teacher preparation programs.

    Spelling English Words: Contributions of Phonological, Morphological and Orthographic Processing Skills of Turkish EFL Students in Grades 6-8

    Few studies have examined the simultaneous impact of multi-level metalinguistic skills on spelling in English. Spelling requires the integrated and simultaneous working of various linguistic and metalinguistic skills and socio-cultural (SES) factors. The present study investigates the concurrent influence of multi-level metalinguistic skills, including phonological, morphological, and orthographic knowledge in English, as well as the impact of socio-cultural factors on the EFL spelling of Turkish 6th, 7th, and 8th grade pupils (N = 367). Measures tapping phonological, morphological, and orthographic skills in English (L2) and a background questionnaire were administered to Turkish 6th to 8th grade EFL children recruited at multiple school sites in a city in Turkey. A robust configural baseline confirmatory factor analysis (CFA) model for all grades confirmed that the observed variables formed a three-factor model (phono, morpho, ortho), as hypothesized. The second-order structural equation model (SEM) confirmed that the three metalinguistic skills work simultaneously and tap into the linguistic repertoire construct, which predicted the EFL word spelling of Turkish 6th, 7th, and 8th grade pupils. This converges with linguistic repertoire theory, which suggests utilizing multiple metalinguistic skills when spelling words and teaching spelling. The final SEM model with the integrated SES factors (i.e., SES, home literacy, and additional English exposure) also showed good model fit statistics, with the English exposure factor having the highest regression coefficient on EFL word spelling outcomes. The spelling error analyses paralleled the quantitative analyses: phonology and orthography, but not morphology, were the two significant predictors of word spelling errors by Turkish 6th to 8th graders.
The key findings can inform foreign language teachers about the roles of phonological, morphological, and orthographic processing skills in English spelling. The pedagogical implications of the present study include the importance of directly teaching the three metalinguistic skills when EFL teachers are engaged in spelling instruction. The findings can also inform Turkey’s foreign language education policy decision making by recommending the tailoring of policy and curriculum according to students’ needs.