
    Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

    Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing it to disambiguate sentences in a unified fashion across the different ambiguity types. Comment: EMNLP 201
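
    As a rough illustration of the disambiguation setup described above, the sketch below scores each candidate interpretation of an ambiguous sentence against a video and returns the best-scoring reading. The scoring function `video_sentence_score` stands in for the paper's adapted sentence-video model; its name, its interface, and the dummy scorer in the usage example are assumptions for illustration only (Python).

        # Illustrative sketch: choose the interpretation of an ambiguous sentence
        # that a sentence-video alignment model scores highest. The scoring
        # function is a hypothetical stand-in for the adapted vision model.
        def disambiguate(video, interpretations, video_sentence_score):
            """Return the reading whose paraphrase best matches the video."""
            scored = [(video_sentence_score(video, text), text) for text in interpretations]
            best_score, best_reading = max(scored)
            return best_reading, scored

        # Example with a dummy scorer (illustration only):
        readings = ["The man saw [the boy with the telescope]",
                    "The man [saw the boy] [with the telescope]"]
        dummy_score = lambda video, text: float(len(text))  # placeholder model
        print(disambiguate("clip.mp4", readings, dummy_score)[0])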

    Windows into Sensory Integration and Rates in Language Processing: Insights from Signed and Spoken Languages

    This dissertation explores the hypothesis that language processing proceeds in "windows" that correspond to representational units, where sensory signals are integrated according to time-scales that correspond to the rate of the input. To investigate universal mechanisms, a comparison of signed and spoken languages is necessary. Underlying the seemingly effortless process of language comprehension is the perceiver's knowledge about the rate at which linguistic form and meaning unfold in time and the ability to adapt to variations in the input. The vast body of work in this area has focused on speech perception, where the goal is to determine how linguistic information is recovered from acoustic signals. Testing some of these theories in the visual processing of American Sign Language (ASL) provides a unique opportunity to better understand how sign languages are processed and which aspects of speech perception models are in fact about language perception across modalities. The first part of the dissertation presents three psychophysical experiments investigating temporal integration windows in sign language perception by testing the intelligibility of locally time-reversed sentences. The findings demonstrate the contribution of modality to the time-scales of these windows, where signing is successively integrated over longer durations (~ 250-300 ms) than in speech (~ 50-60 ms), while also pointing to modality-independent mechanisms, where integration occurs in durations that correspond to the size of linguistic units. The second part of the dissertation focuses on production rates in sentences taken from natural conversations of English, Korean, and ASL. Data from word, sign, morpheme, and syllable rates suggest that while the rate of words and signs can vary from language to language, the relationship between the rate of syllables and morphemes is relatively consistent among these typologically diverse languages. The results from rates in ASL also complement the findings in the perception experiments by confirming that the time-scales at which phonological units fluctuate in production match the temporal integration windows in perception. These results are consistent with the hypothesis that there are modality-independent time pressures for language processing, and the discussion provides a synthesis of converging findings from other domains of research and proposes ideas for future investigations.
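
    The perception experiments rely on locally time-reversed sentences: the signal is divided into fixed windows and each window is played backwards, with intelligibility measured as a function of window size. A minimal sketch of that manipulation for an audio signal is given below; the sample rate and window sizes are assumptions, since the abstract does not give the stimulus parameters (Python/NumPy).

        # Illustrative sketch of local time reversal: reverse the samples inside
        # consecutive fixed-size windows of the signal. Window sizes of roughly
        # 50-60 ms (speech) versus 250-300 ms (sign) correspond to the
        # integration windows discussed above; the values here are assumptions.
        import numpy as np

        def locally_reverse(signal, sample_rate, window_ms):
            """Reverse the contents of each consecutive window of window_ms."""
            win = int(sample_rate * window_ms / 1000.0)
            out = signal.copy()
            for start in range(0, len(signal), win):
                out[start:start + win] = signal[start:start + win][::-1]
            return out

        # Example: 1 s of dummy audio at 16 kHz, locally reversed in 60 ms windows.
        audio = np.random.randn(16000)
        reversed_audio = locally_reverse(audio, 16000, window_ms=60)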

    Comparing the E-Z Reader Model to Other Models of Eye Movement Control in Reading

    The E-Z Reader model provides a theoretical framework for understanding how word identification, visual processing, attention, and oculomotor control jointly determine when and where the eyes move during reading. Thus, in contrast to other reading models reviewed in this article, E-Z Reader can simultaneously account for many of the known effects of linguistic, visual, and oculomotor factors on eye movement control during reading. Furthermore, the core principles of the model have been generalized to other task domains (e.g., equation solving, visual search), and are broadly consistent with what is known about the architecture of the neural systems that support reading.

    Individual differences in lexical access among cochlear implant users

    Purpose: The current study investigates how individual differences in cochlear implant (CI) users’ sensitivity to word–nonword differences, reflecting lexical uncertainty, relate to their reliance on sentential context for lexical access in processing continuous speech. Method: Fifteen CI users and 14 normal-hearing (NH) controls participated in an auditory lexical decision task (Experiment 1) and a visual-world paradigm task (Experiment 2). Experiment 1 tested participants’ reliance on lexical statistics, and Experiment 2 studied how sentential context affects the time course and patterns of lexical competition leading to lexical access. Results: In Experiment 1, CI users had lower accuracy scores and longer reaction times than NH listeners, particularly for nonwords. In Experiment 2, CI users’ lexical competition patterns were, on average, similar to those of NH listeners, but the patterns of individual CI users varied greatly. Individual CI users’ word–nonword sensitivity (Experiment 1) explained differences in the reliance on sentential context to resolve lexical competition, whereas clinical speech perception scores explained competition with phonologically related words. Conclusions: The general analysis of CI users’ lexical competition patterns showed merely quantitative differences from NH listeners in the time course of lexical competition, but our additional analysis revealed more qualitative differences in CI users’ strategies to process speech. Individuals’ word–nonword sensitivity explained different parts of the individual variability than clinical speech perception scores did. These results stress, particularly for heterogeneous clinical populations such as CI users, the importance of investigating individual differences in addition to group averages, as they can be informative for clinical rehabilitation.
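
    One common way to quantify the word–nonword sensitivity measured in Experiment 1 is a d-prime score computed from lexical decision hits and false alarms. Whether this study used d-prime specifically is not stated in the abstract, so the sketch below is only an illustrative way to derive such an individual-difference index (Python).

        # Illustrative d-prime index of word-nonword sensitivity from lexical
        # decision data: hits are "word" responses to words, false alarms are
        # "word" responses to nonwords. The choice of measure is an assumption.
        from statistics import NormalDist

        def d_prime(hit_rate, false_alarm_rate):
            z = NormalDist().inv_cdf
            return z(hit_rate) - z(false_alarm_rate)

        # Example: a listener who accepts 92% of words and 20% of nonwords.
        print(round(d_prime(0.92, 0.20), 2))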

    Speech perception under adverse conditions: Insights from behavioral, computational, and neuroscience research

    Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. Specifically, we highlight the potential role of learning algorithms that rely on prediction error signals and discuss specific neural structures that are likely to contribute to such learning. To this end, we review behavioral studies, computational accounts, and neuroimaging findings related to adaptive plasticity in speech perception. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. Furthermore, we consider research topics in neuroscience that offer insight into how perception can be adaptively tuned to short-term deviations while balancing the need to maintain stability in the perception of learned long-term regularities. Consideration of the application and limitations of these algorithms in characterizing flexible speech perception under adverse conditions promises to inform theoretical models of speech. © 2014 Guediche, Blumstein, Fiez and Holt
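
    The prediction-error-driven learning the article highlights can be illustrated with a simple delta rule, in which the mapping from acoustic cues to a percept is nudged in proportion to the mismatch between the prediction and feedback (e.g., from lexical or visual context). The toy model below is a generic sketch under that assumption, not an algorithm taken from the article (Python/NumPy).

        # Toy delta-rule learner: the cue-to-percept mapping is adjusted in
        # proportion to the prediction error on each exposure. This is a generic
        # illustration of error-driven learning, not the article's model.
        import numpy as np

        def delta_rule_update(weights, cues, target, learning_rate=0.1):
            prediction = float(np.dot(weights, cues))
            error = target - prediction              # prediction error signal
            return weights + learning_rate * error * cues, error

        weights = np.zeros(3)
        shifted_cue = np.array([1.0, 0.4, 0.0])      # atypical, "accented" input (toy)
        for _ in range(50):                          # repeated exposure adapts the mapping
            weights, error = delta_rule_update(weights, shifted_cue, target=1.0)
        print(weights.round(2), round(error, 3))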

    Comprehension in-situ: how multimodal information shapes language processing

    The human brain supports communication in dynamic face-to-face environments where spoken words are embedded in linguistic discourse and accompanied by multimodal cues, such as prosody, gestures and mouth movements. However, we only have limited knowledge of how these multimodal cues jointly modulate language comprehension. In a series of behavioural and EEG studies, we investigated the joint impact of these cues when processing naturalistic-style materials. First, we built a mouth informativeness corpus of English words, to quantify mouth informativeness of a large number of words used in the following experiments. Then, across two EEG studies, we found and replicated that native English speakers use multimodal cues and that their interactions dynamically modulate the N400 amplitude elicited by words that are less predictable in the discourse context (indexed by surprisal values per word). We then extended the findings to second language comprehenders, finding that multimodal cues modulate L2 comprehension, just like in L1, but to a lesser extent, although L2 comprehenders benefit more from meaningful gestures and mouth movements. Finally, in two behavioural experiments investigating whether multimodal cues jointly modulate the learning of new concepts, we found some evidence that the presence of iconic gestures improves memory, and that the effect may be larger when the information is also presented with prosodic accentuation. Overall, these findings suggest that real-world comprehension uses all cues present and weights them differently in a dynamic manner. Therefore, multimodal cues should not be neglected in language studies. Investigating communication in naturalistic contexts containing more than one cue can provide new insight into our understanding of language comprehension in the real world.
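
    Surprisal, the predictability index mentioned above, is defined as the negative log probability of a word given its context, surprisal(w) = -log2 P(w | context). In the EEG studies this probability would come from a language model over the discourse; the toy probability table below merely stands in for such a model (Python).

        # Surprisal of a word given its context: -log2 P(word | context).
        # The toy probability table below stands in for a language model.
        import math

        def surprisal(probability):
            return -math.log2(probability)

        # Made-up conditional probabilities for the next word in a discourse:
        p_next = {"coffee": 0.30, "tea": 0.10, "gravel": 0.001}
        for word, p in p_next.items():
            print(f"{word}: {surprisal(p):.2f} bits")  # less predictable -> higher surprisal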

    Individual Differences in the Perceptual Learning of Degraded Speech: Implications for Cochlear Implant Aural Rehabilitation

    In the noise and commotion of daily life, people achieve effective communication partly because spoken messages are replete with redundant information. Listeners exploit available contextual, linguistic, phonemic, and prosodic cues to decipher degraded speech. When other cues are absent or ambiguous, phonemic and prosodic cues are particularly important because they help identify word boundaries, a process known as lexical segmentation. Individuals vary in the degree to which they rely on phonemic or prosodic cues for lexical segmentation in degraded conditions. Deafened individuals who use a cochlear implant have diminished access to fine frequency information in the speech signal, and show resulting difficulty perceiving phonemic and prosodic cues. Auditory training on phonemic elements improves word recognition for some listeners. Little is known, however, about the potential benefits of prosodic training, or the degree to which individual differences in cue use affect outcomes. The present study used simulated cochlear implant stimulation to examine the effects of phonemic and prosodic training on lexical segmentation. Participants completed targeted training with either phonemic or prosodic cues, and received passive exposure to the non-targeted cue. Results show that acuity to the targeted cue improved after training. In addition, both targeted attention and passive exposure to prosodic features led to increased use of these cues for lexical segmentation. Individual differences in degree and source of benefit point to the importance of personalizing clinical intervention to increase flexible use of a range of perceptual strategies for understanding speech. Doctoral Dissertation, Speech and Hearing Science, 201
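
    Simulated cochlear implant stimulation of the kind used here is typically produced with a noise-excited channel vocoder: the speech is split into frequency bands, and each band's envelope is extracted and used to modulate band-limited noise. The sketch below shows that general recipe; the channel count, filter order, and band edges are assumptions, since the abstract does not give the dissertation's parameters (Python/SciPy).

        # Sketch of a noise-excited channel vocoder (a common cochlear implant
        # simulation): band-pass the speech, extract each band's envelope, and
        # use it to modulate band-limited noise. Channel count and band edges
        # below are assumptions, not the dissertation's parameters.
        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def vocode(signal, fs, n_channels=8, lo=100.0, hi=7000.0):
            edges = np.geomspace(lo, hi, n_channels + 1)      # log-spaced band edges
            out = np.zeros_like(signal, dtype=float)
            for low, high in zip(edges[:-1], edges[1:]):
                sos = butter(4, [low, high], btype="band", fs=fs, output="sos")
                band = sosfiltfilt(sos, signal)
                envelope = np.abs(hilbert(band))              # band envelope
                carrier = np.random.randn(len(signal))        # noise carrier
                out += envelope * sosfiltfilt(sos, carrier)   # envelope-modulated noise
            return out

        fs = 16000
        speech = np.random.randn(fs)       # stand-in for a recorded sentence
        simulated_ci = vocode(speech, fs)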

    Lexical Influences on Spoken Spondaic Word Recognition in Hearing-Impaired Patients.

    Top-down contextual influences play a major part in speech understanding, especially in hearing-impaired patients with deteriorated auditory input. Those influences are most obvious in difficult listening situations, such as listening to sentences in noise, but can also be observed at the word level under more favorable conditions, as in one of the most commonly used tasks in audiology, i.e., repeating isolated words in silence. This study aimed to explore the role of top-down contextual influences and their dependence on lexical factors and patient-specific factors using standard clinical linguistic material. Spondaic word perception was tested in 160 hearing-impaired patients aged 23-88 years with a four-frequency average pure-tone threshold ranging from 21 to 88 dB HL. Sixty spondaic words were randomly presented at a level adjusted to correspond to a speech perception score ranging between 40 and 70% of the performance intensity function obtained using monosyllabic words. Phoneme and whole-word recognition scores were used to calculate two context-influence indices (the j factor and the ratio of word scores to phonemic scores) and were correlated with linguistic factors, such as the phonological neighborhood density and several indices of word occurrence frequencies. Contextual influence was greater for spondaic words than reported in similar studies using monosyllabic words, with an overall j factor of 2.07 (SD = 0.5). For both indices, context use decreased with increasing hearing loss once the average hearing loss exceeded 55 dB HL. In right-handed patients, significantly greater context influence was observed for words presented to the right ear than for words presented to the left, especially in patients with many years of education. The correlations between raw word scores (and context influence indices) and word occurrence frequencies showed a significant age-dependent effect, with a stronger correlation between perception scores and word occurrence frequencies when the occurrence frequencies were based on the years corresponding to the patients' youth, showing a "historic" word frequency effect. This effect was still observed for patients with few years of formal education, but recent occurrence frequencies based on current word exposure had a stronger influence for those patients, especially for younger ones.
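
    The two context-influence indices described above can be computed directly from the recognition scores: Boothroyd's j factor is defined by p_word = p_phoneme^j, so j = log(p_word) / log(p_phoneme), alongside the simple ratio of whole-word to phoneme scores. The sketch below illustrates the arithmetic; the example scores are made up, though they happen to yield a j close to the reported 2.07 (Python).

        # Context-influence indices from phoneme and whole-word scores:
        # j factor (p_word = p_phoneme ** j) and the word-to-phoneme score ratio.
        import math

        def j_factor(word_score, phoneme_score):
            return math.log(word_score) / math.log(phoneme_score)

        def word_phoneme_ratio(word_score, phoneme_score):
            return word_score / phoneme_score

        # Made-up example: 55% whole-word recognition with 75% of phonemes correct.
        print(round(j_factor(0.55, 0.75), 2))            # ~2.08, close to the reported 2.07
        print(round(word_phoneme_ratio(0.55, 0.75), 2))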

    An integrated theory of language production and comprehension

    Currently, production and comprehension are regarded as quite distinct in accounts of language processing. In rejecting this dichotomy, we instead assert that producing and understanding are interwoven, and that this interweaving is what enables people to predict themselves and each other. We start by noting that production and comprehension are forms of action and action perception. We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. Specifically, we assume that actors construct forward models of their actions before they execute those actions, and that perceivers of others' actions covertly imitate those actions, then construct forward models of those actions. We use these accounts of action, action perception, and joint action to develop accounts of production, comprehension, and interactive language. Importantly, they incorporate well-defined levels of linguistic representation (such as semantics, syntax, and phonology). We show (a) how speakers and comprehenders use covert imitation and forward modeling to make predictions at these levels of representation, (b) how they interweave production and comprehension processes, and (c) how they use these predictions to monitor the upcoming utterances. We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal.
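
    The forward-model idea at the core of this account can be caricatured in a few lines: an efference copy of the production command is passed through a predictive (forward) model, and the mismatch between the predicted and perceived outcome provides the monitoring signal. The linear toy model below is purely illustrative, not the authors' implementation (Python/NumPy).

        # Toy forward model: predict the sensory consequences of a planned
        # command from an efference copy, then compare to the perceived outcome;
        # the mismatch is the prediction error used for monitoring.
        import numpy as np

        def forward_model(command, mapping):
            """Predict the sensory consequences of a planned action/utterance."""
            return mapping @ command

        command = np.array([1.0, 0.5])                  # planned articulation (toy)
        mapping = np.array([[0.9, 0.1], [0.2, 0.8]])    # learned forward mapping (toy)
        predicted = forward_model(command, mapping)
        perceived = np.array([0.95, 0.55])              # actual feedback (toy)
        prediction_error = perceived - predicted        # drives monitoring/adaptation
        print(predicted.round(2), prediction_error.round(2))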