
    Multimodal Grounding for Language Processing

    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs, which play a crucial role in the compositional power of language. Comment: The paper has been published in the Proceedings of the 27th International Conference on Computational Linguistics. Please refer to that version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197

    Ecological IVIS design : using EID to develop a novel in-vehicle information system

    New in-vehicle information systems (IVIS) are emerging which purport to encourage more environmentally friendly or ‘green’ driving. Meanwhile, wider concerns about road safety and in-car distractions remain. The ‘Foot-LITE’ project is an effort to balance these issues, aimed at achieving safer and greener driving through real-time driving information, presented via an in-vehicle interface which facilitates the desired behaviours while avoiding negative consequences. One way of achieving this is to use ecological interface design (EID) techniques. This article presents part of the formative human-centred design process for developing the in-car display through a series of rapid prototyping studies comparing EID against conventional interface design principles. We focus primarily on the visual display, although some development of an ecological auditory display is also presented. The results of feedback from potential users as well as subject matter experts are discussed with respect to implications for future interface design in this field.

    Multimodal Polynomial Fusion for Detecting Driver Distraction

    Distracted driving is deadly, claiming 3,477 lives in the U.S. in 2015 alone. Although there has been a considerable amount of research on modeling the distracted behavior of drivers under various conditions, accurate automatic detection using multiple modalities, and especially the contribution of the speech modality to improving accuracy, has received little attention. This paper introduces a new multimodal dataset for distracted driving behavior and discusses automatic distraction detection using features from three modalities: facial expression, speech and car signals. Detailed multimodal feature analysis shows that adding more modalities monotonically increases the predictive accuracy of the model. Finally, a simple and effective multimodal fusion technique using a polynomial fusion layer shows superior distraction detection results compared to the baseline SVM and neural network models. Comment: INTERSPEECH 201
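    The core idea of a polynomial fusion layer, as described in this abstract, is to combine per-modality feature vectors through multiplicative interaction terms rather than simple concatenation. The following is a minimal sketch of one common second-order construction (a bias-augmented tensor product followed by a linear projection); the function name, shapes, and weight layout are illustrative assumptions, not the paper's exact layer.

    ```python
    import numpy as np

    def polynomial_fusion(face, speech, car, W):
        """Sketch of second-order polynomial fusion of three modality vectors.

        Appending a constant 1 to each modality before taking the tensor
        product means the flattened result contains every unimodal term,
        every pairwise product, and the full three-way interaction. W then
        projects the fused vector down to a distraction score. This is an
        illustrative construction, not the paper's exact implementation.
        """
        f = np.append(face, 1.0)    # bias term keeps unimodal features
        s = np.append(speech, 1.0)
        c = np.append(car, 1.0)
        fused = np.einsum("i,j,k->ijk", f, s, c).ravel()
        return W @ fused

    # Illustrative dimensions: 4 facial, 3 speech, 2 car-signal features.
    rng = np.random.default_rng(0)
    face, speech, car = rng.normal(size=4), rng.normal(size=3), rng.normal(size=2)
    W = rng.normal(size=(1, (4 + 1) * (3 + 1) * (2 + 1)))   # 1 x 60 projection
    score = polynomial_fusion(face, speech, car, W)
    ```

    In practice W would be learned jointly with the rest of the network; the sketch only shows why the fused representation grows multiplicatively with the number of modalities.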

    Emotion classification in Parkinson's disease by higher-order spectra and power spectrum features using EEG signals: A comparative study

    Deficits in the ability to process emotions characterize several neuropsychiatric disorders and are traits of Parkinson's disease (PD), and there is a need for a method of quantifying emotion, which is currently assessed by clinical diagnosis. Electroencephalogram (EEG) signals, being an activity of the central nervous system (CNS), can reflect the underlying true emotional state of a person. This study applied machine-learning algorithms to categorize EEG emotional states in PD patients, classifying six basic emotions (happiness, sadness, fear, anger, surprise and disgust) in comparison with healthy controls (HC). Emotional EEG data were recorded from 20 PD patients and 20 healthy age-, education-level- and sex-matched controls using multimodal (audio-visual) stimuli. The use of nonlinear features derived from the higher-order spectra (HOS) has been reported to be a promising approach to classifying emotional states. In this work, we conducted a comparative study of the performance of k-nearest neighbor (kNN) and support vector machine (SVM) classifiers using features derived from HOS and from the power spectrum. Analysis of variance (ANOVA) showed that the power-spectrum and HOS based features differed significantly among the six emotional states (p < 0.0001). Classification results show that using the selected HOS based features instead of power-spectrum based features provided comparatively better accuracy for all six classes, with overall accuracies of 70.10% ± 2.83% for PD patients and 77.29% ± 1.73% for HC in the beta (13-30 Hz) band using the SVM classifier. Moreover, PD patients achieved lower accuracy in the processing of negative emotions (sadness, fear, anger and disgust) than of positive emotions (happiness, surprise) compared with HC. These results demonstrate the effectiveness of applying machine-learning techniques to the classification of emotional states in PD patients in a user-independent manner using EEG signals. The accuracy of the system could be improved by investigating other HOS based features. This study might lead to a practical system for noninvasive assessment of the emotional impairments associated with neurological disorders.
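    The evaluation protocol described above (kNN vs. SVM on per-trial EEG features, compared via cross-validated accuracy) can be sketched as follows. The data here are synthetic stand-ins: the 16 features per trial are placeholders for the paper's HOS or power-spectrum values, and the trial counts, hyperparameters, and six random labels are illustrative assumptions only.

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic stand-in for per-trial EEG features (e.g. beta-band HOS
    # or power-spectrum values); real data would come from the recordings.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 16))       # 120 trials x 16 features
    y = rng.integers(0, 6, size=120)     # six emotion classes

    results = {}
    for name, clf in [
        ("kNN", KNeighborsClassifier(n_neighbors=5)),
        ("SVM", SVC(kernel="rbf", C=1.0)),
    ]:
        # Standardize features inside the pipeline so scaling is fit
        # only on each training fold, avoiding leakage into test folds.
        model = make_pipeline(StandardScaler(), clf)
        results[name] = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name}: {results[name]:.3f}")
    ```

    On the random labels used here both classifiers hover near chance (1/6); with discriminative features the same loop reproduces the kind of kNN-vs-SVM comparison the study reports.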

    Audio-visual speech perception: a developmental ERP investigation

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-visual speech perception in adults, with visual cues reliably modulating auditory ERP responses to speech. Previous work has shown congruence-dependent shortening of auditory N1/P2 latency and congruence-independent attenuation of amplitude in the presence of auditory and visual speech signals, compared to auditory signals alone. The aim of this study was to chart the development of these well-established modulatory effects over mid-to-late childhood. Experiment 1 employed an adult sample to validate a child-friendly stimulus set and paradigm by replicating previously observed effects of N1/P2 amplitude and latency modulation by visual speech cues; it also revealed greater attenuation of component amplitude given incongruent audio-visual stimuli, pointing to a new interpretation of the amplitude modulation effect. Experiment 2 used the same paradigm to map cross-sectional developmental change in these ERP responses between 6 and 11 years of age. The effect of amplitude modulation by visual cues emerged over development, while the effect of latency modulation was stable over the child sample. These data suggest that auditory ERP modulation by visual speech represents separable underlying cognitive processes, some of which show earlier maturation than others over the course of development.