5,041 research outputs found

    Functional imaging studies of visual-auditory integration in man.

    This thesis investigates the central nervous system's ability to integrate visual and auditory information from the sensory environment into unified conscious perception. It explores the possibility that the principle of functional specialisation may also apply in the multisensory domain. The first aim was to establish the neuroanatomical location at which visual and auditory stimuli are integrated in sensory perception. The second was to investigate the neural correlates of visual-auditory synchronicity, which would be expected to play a vital role in establishing which visual and auditory stimuli should be perceptually integrated. Four functional Magnetic Resonance Imaging studies identified brain areas specialised for: the integration of dynamic visual and auditory cues derived from the same everyday environmental events (Experiment 1), discriminating relative synchronicity between dynamic, cyclic, abstract visual and auditory stimuli (Experiments 2 and 3), and the aesthetic evaluation of visually and acoustically perceived art (Experiment 4). Experiment 1 provided evidence that the posterior temporo-parietal junction may be an important site of crossmodal integration. Experiment 2 revealed, for the first time, significant activation of the right anterior frontal operculum (aFO) when visual and auditory stimuli cycled asynchronously. Experiment 3 confirmed and extended this observation: the right aFO was activated only during crossmodal (visual-auditory), but not intramodal (visual-visual, auditory-auditory), asynchrony. Experiment 3 also demonstrated bilateral activation of the amygdala during crossmodal synchrony. Experiment 4 revealed the neural correlates of supramodal, contemplative, aesthetic evaluation within the medial fronto-polar cortex. Activity at this locus varied parametrically with the degree of subjective aesthetic beauty, for both visual art and musical extracts. The most robust finding of this thesis is that activity in the right aFO increases when concurrently perceived visual and auditory stimuli deviate from crossmodal synchrony, which may veto the crossmodal integration of unrelated stimuli into unified conscious perception.

    Neural Mechanisms Underlying Hierarchical Speech-in-Noise Processing

    One of the most commonly reported complaints related to hearing is difficulty understanding speech-in-noise (SIN). Numerous individuals, even those with normal hearing, struggle to communicate effectively in adverse listening conditions. These difficulties are exacerbated by age and by hearing-related deficits such as hearing loss and auditory processing disorders. Despite the high prevalence of SIN deficits across the lifespan, the neural mechanisms underlying successful speech comprehension in noise are not well understood. Communication in noise is a highly complex process that requires efficient processing throughout the entire auditory pathway as well as contributions from higher-order cognitive processes including working memory, inhibition, and attention. In a series of studies using electrophysiologic (EEG) and behavioral measures, this dissertation evaluated the neural correlates of SIN perception across subcortical and cortical levels of the auditory system to identify how top-down and bottom-up influences aid SIN understanding. The first study examined the effects of hearing loss on SIN processing in older adults at the cortical level using frequency-specific neural oscillations (i.e., brain rhythms) and functional connectivity (i.e., directed neural transmission). We found that low-frequency alpha and beta oscillations within and between prefrontal and auditory cortices reflect the ability to flexibly allocate neural resources and recruit top-down predictions to compensate for hearing-related declines and facilitate efficient SIN perception. The second study, in younger adults, investigated the role of attention in SIN processing and how it interacts with early sensory encoding. Hierarchical processing in the brainstem and cortex was assessed by simultaneously recording frequency-following responses (FFRs) and event-related potentials (ERPs) at the source level. We found that attention modulates SIN processing at both subcortical and cortical levels and strengthens bidirectional neural signaling within the central auditory pathway. A relative disengagement of corticofugal transmission was observed in noise, but only for passive listening, suggesting that attention aids SIN perception by maintaining top-down reinforcement of acoustic feature encoding within the primary auditory pathways. Taken together, these results indicate that the neural networks engaged during SIN perception depend on a complex interplay between bottom-up and top-down factors, including signal clarity, listeners' hearing status, and attentional deployment.
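    As a concrete illustration of the kind of measure referred to above, the sketch below shows one way to estimate alpha- and beta-band power from EEG epochs using NumPy and SciPy. It is a minimal, hypothetical example: the array layout, band limits, and function names are assumptions for illustration, not the dissertation's actual analysis pipeline.

        # Minimal sketch: band-limited power from EEG epochs via a zero-phase
        # band-pass filter and Hilbert envelope. Epochs are assumed to be shaped
        # (n_epochs, n_channels, n_samples); band limits are conventional choices.
        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        BANDS = {"alpha": (8.0, 12.0), "beta": (15.0, 30.0)}  # Hz

        def band_power(epochs, fs, band):
            low, high = band
            b, a = butter(4, [low, high], btype="bandpass", fs=fs)
            filtered = filtfilt(b, a, epochs, axis=-1)      # zero-phase band-pass
            envelope = np.abs(hilbert(filtered, axis=-1))   # instantaneous amplitude
            return (envelope ** 2).mean(axis=-1)            # power averaged over time

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            epochs = rng.standard_normal((10, 32, 1000))    # 10 epochs, 32 channels, 2 s at 500 Hz
            for name, limits in BANDS.items():
                print(name, band_power(epochs, fs=500.0, band=limits).shape)  # -> (10, 32)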

    Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers

    We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that utilizes acoustic features as input, and one that utilizes a phonetic transcription as input. Both synthesizers are trained using the same data, and their performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g., a single syllable) of ground-truth visual speech is incorrect, we find that the subjective score for the entire sequence is lower than that for sequences generated by our synthesizers. This observation motivates further consideration of an often ignored issue: to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicators of viewer-perceived quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality.
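    The sketch below illustrates the dynamic-time-warp cost mentioned above, computed between two parameter trajectories with plain NumPy. It is an illustrative implementation under assumed conventions (frames-by-parameters arrays, Euclidean local cost), not the paper's actual code.

        # Accumulated DTW cost of aligning a synthesized AAM parameter trajectory
        # to a ground-truth trajectory; a lower cost suggests a closer match.
        import numpy as np

        def dtw_cost(synth, truth):
            n, m = len(synth), len(truth)
            # Pairwise Euclidean distances between all frame pairs.
            dist = np.linalg.norm(synth[:, None, :] - truth[None, :, :], axis=-1)
            acc = np.full((n + 1, m + 1), np.inf)
            acc[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                         acc[i, j - 1],      # deletion
                                                         acc[i - 1, j - 1])  # match
            return float(acc[n, m])

        # Example with random stand-ins for 20-dimensional AAM parameter sequences.
        rng = np.random.default_rng(1)
        print(dtw_cost(rng.standard_normal((120, 20)), rng.standard_normal((118, 20))))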

    Distraction Control Processes in Free Recall: Costs and Benefits to Performance

    How is semantic memory influenced by individual differences under conditions of distraction? This question was addressed by observing how visual target words—drawn from a single category—were recalled whilst ignoring spoken distracter words that were either members of the same or of a different (single) category. Distracter words were presented either synchronously or asynchronously with target words. Recall performance was correlated with participants’ working memory capacity (WMC), which was taken to be an index of the capacity for distracter inhibition. Distraction was greater from semantically similar words, and distraction was greater when the words were presented synchronously. WMC was related to disruption only with synchronous, not asynchronous, presentation. Subsequent experiments found more distracter inhibition – as measured by subsequent negative priming of distracters – amongst individuals with higher WMC, but this may be dependent on targets and distracters being comparable category exemplars: with less dominant category members as distracters, target recall was impaired – relative to control – only amongst individuals with low WMC. The results demonstrate distracter inhibition occurring only in conditions where target-distracter selection is challenging. Inhibition incurs costs to subsequent performance, but there is an immediate price for not inhibiting.

    The Effects of Iconic Gestures and Babble Language on Word Intelligibility in Sentence Context

    Purpose: This study investigated to what extent iconic co-speech gestures help word intelligibility in sentence context in two different linguistic maskers (native vs. foreign). It was hypothesized that sentence recognition improves with the presence of iconic co-speech gestures and with foreign compared to native babble. Method: Thirty-two native Dutch participants performed a Dutch word recognition task in context, in which they were presented with videos in which an actress uttered short Dutch sentences (e.g., Ze begint te openen, “She starts to open”). Participants were presented with a total of six audiovisual conditions: no background noise (i.e., clear condition) without gesture, no background noise with gesture, French babble without gesture, French babble with gesture, Dutch babble without gesture, and Dutch babble with gesture; and they were asked to type down what was said by the Dutch actress. The accurate identification of the action verbs at the end of the target sentences was measured. Results: The results demonstrated that performance on the task was better in the gesture compared to the nongesture conditions (i.e., a gesture enhancement effect). In addition, performance was better in French babble than in Dutch babble. Conclusions: Listeners benefit from iconic co-speech gestures during communication and from foreign background speech compared to native background speech. These insights into multimodal communication may be valuable to everyone who engages in multimodal communication, and especially to people who often work in public places where competing speech is present in the background.

    Utilization of Chatbots in Customer Interface

    Automation has become a worldwide trend in business. Businesses try to find a competitive edge through more efficient processes, lower costs, and better customer service. In this Bachelor’s thesis, I focus on one instance of this trend: web-based chatbots in the customer interface. Based on a broad literature review, the thesis illustrates what the prerequisites for utilizing chatbots are, how they should be implemented, and, finally, what pros and cons managers can expect from such investments. Managers should first be aware of the technical restrictions and challenges that chatbots as a medium exhibit. Then, through analysis of their customers, managers should assess the suitability of chatbots for their business. The design process should include both the customers and the different departments of the company; this can also help with resistance to change in the implementation phase. Finally, the chatbot should be evaluated continuously to ensure that the promised benefits are delivered. Although chatbots can offer versatility and cost savings, poor design may end up costing the firm both in terms of unnecessary investment and reduced customer satisfaction. Although no new concepts are introduced, this thesis is a good starting point for managers interested in utilizing chatbots. As the topic is currently relevant, the thesis can also be useful for other industries.

    FrameNet annotation for multimodal corpora: devising a methodology for the semantic representation of text-image interactions in audiovisual productions

    Multimodal analyses have been growing in importance within several approaches to Cognitive Linguistics and applied fields such as Natural Language Understanding. Nonetheless, fine-grained semantic representations of multimodal objects are still lacking, especially in terms of integrating areas such as Natural Language Processing and Computer Vision, which are key for the implementation of multimodality in Computational Linguistics. In this dissertation, we propose a methodology for extending FrameNet annotation to the multimodal domain, since FrameNet can provide fine-grained semantic representations, particularly with a database enriched by Qualia and other interframal and intraframal relations, as is the case of FrameNet Brasil. To make FrameNet Brasil able to conduct multimodal analysis, we outlined the hypothesis that, similarly to the way in which words in a sentence evoke frames and organize their elements in the syntactic locality accompanying them, visual elements in video shots may also evoke frames and organize their elements on the screen, or work complementarily with the frame evocation patterns of the sentences narrated simultaneously with their appearance on screen, providing different profiling and perspective options for meaning construction. The corpus annotated for testing the hypothesis is composed of episodes of a Brazilian TV travel series critically acclaimed as an exemplar of good practices in audiovisual composition. The TV genre chosen also provides a novel experimental setting for research on integrated image and text comprehension, since, in this corpus, the text is not a direct description of the image sequence but correlates with it indirectly in a myriad of ways. The dissertation also reports on an eye-tracking experiment conducted to validate the proposed text-oriented annotation approach. The experiment demonstrated that it is not possible to determine that text impacts gaze directly, and this was taken as reinforcing the approach of valuing the combination of modes. Last, we present the Frame2 dataset, the product of the annotation task carried out on the corpus following both the methodology and the guidelines proposed. The results achieved demonstrate that, at least for this TV genre but possibly also for others, a fine-grained semantic annotation tackling the diverse correlations that take place in a multimodal setting provides a new perspective on multimodal comprehension modeling. Moreover, multimodal annotation also enriches the development of FrameNets, to the extent that correlations found between modalities can attest to the modeling choices made by those building frame-based resources.
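    To make the idea of multimodal frame annotation more concrete, the sketch below shows one possible shape for an annotation record linking a frame evoked in the text with frame elements realized in the video. All names and fields are hypothetical illustrations and do not reproduce the actual Frame2 or FrameNet Brasil schema.

        # Hypothetical record structure for a multimodal frame annotation:
        # one evoking unit (lexical or visual) plus frame elements that may be
        # realized in either the text or the video track.
        from dataclasses import dataclass, field

        @dataclass
        class FrameElementSpan:
            element: str       # frame element name, e.g. "Traveler"
            modality: str      # "text" or "video"
            start: float       # character offset (text) or seconds (video)
            end: float

        @dataclass
        class MultimodalAnnotation:
            frame: str                                   # evoked frame, e.g. "Travel"
            evoking_unit: str                            # lexical unit or visual element
            modality: str                                # modality of the evoking unit
            elements: list = field(default_factory=list)

        annotation = MultimodalAnnotation(
            frame="Travel",
            evoking_unit="viajar.v",
            modality="text",
            elements=[
                FrameElementSpan("Traveler", "video", 12.4, 15.0),
                FrameElementSpan("Goal", "text", 35, 44),
            ],
        )
        print(annotation.frame, len(annotation.elements))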

    MEG, PSYCHOPHYSICAL AND COMPUTATIONAL STUDIES OF LOUDNESS, TIMBRE, AND AUDIOVISUAL INTEGRATION

    Natural scenes and ecological signals are inherently complex, and our understanding of their perception and processing is incomplete. For example, a speech signal not only contains information at various frequencies but is also non-static: the signal is concurrently modulated in time. In addition, an auditory signal may be paired with additional sensory information, as in the case of audiovisual speech. In order to make sense of the signal, a human observer must process the information provided by low-level sensory systems and integrate it across sensory modalities and with cognitive information (e.g., object identification information, phonetic information). The observer must then create functional relationships between the signals encountered to form a coherent percept. The neuronal and cognitive mechanisms underlying this integration can be quantified in several ways: by taking physiological measurements, by assessing behavioral output for a given task, and by modeling signal relationships. While ecological tokens are complex in a way that exceeds our current understanding, progress can be made by utilizing synthetic signals that encompass specific essential features of ecological signals. The experiments presented here cover five aspects of complex signal processing using approximations of ecological signals: (i) auditory integration of complex tones composed of different frequencies and component power levels; (ii) audiovisual integration approximating that of human speech; (iii) behavioral measurement of signal discrimination; (iv) signal classification via simple computational analyses; and (v) neuronal processing of synthesized auditory signals approximating speech tokens. To investigate neuronal processing, magnetoencephalography (MEG) is employed to assess cortical processing non-invasively. Behavioral measures are employed to evaluate observer acuity in signal discrimination and to test the limits of perceptual resolution. Computational methods are used to examine the relationships, in perceptual space and in physiological processing, between synthetic auditory signals, using features of the signals themselves as well as biologically motivated models of auditory representation. Together, the various methodologies and experimental paradigms advance the understanding of ecological signal analysis and of the complex interactions in ecological signal structure.
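    As a small illustration of the computational strand described above, the sketch below compares two synthetic, speech-like tokens in a simple spectral feature space using SciPy. The feature choice (log-magnitude spectrogram) and the cosine-similarity comparison are assumptions for illustration, not the models actually used in the thesis.

        # Compare two amplitude-modulated tones (stand-ins for synthetic speech-like
        # tokens) by the cosine similarity of their log-magnitude spectrograms.
        import numpy as np
        from scipy.signal import spectrogram

        def spectral_features(signal, fs):
            _, _, sxx = spectrogram(signal, fs=fs, nperseg=256, noverlap=128)
            return np.log(sxx + 1e-10).ravel()   # flattened log-power spectrogram

        def cosine_similarity(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        fs = 16000.0
        t = np.arange(0, 0.5, 1.0 / fs)
        token_a = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
        token_b = np.sin(2 * np.pi * 660 * t) * (1 + 0.5 * np.sin(2 * np.pi * 8 * t))
        print(cosine_similarity(spectral_features(token_a, fs),
                                spectral_features(token_b, fs)))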