
    Towards responsive Sensitive Artificial Listeners

    This paper describes work in the recently started SEMAINE project, which aims to build a set of Sensitive Artificial Listeners – conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust real-time recognition and generation of non-verbal behaviour, both while the agent is speaking and while it is listening. We report on data collection and on the design of a system architecture geared towards real-time responsiveness.
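    The abstract describes a real-time architecture without implementation details, so the following is only a minimal, hypothetical sketch of a listener loop that reacts to an incoming feature stream with non-verbal backchannels; none of the names (detect_pause, choose_backchannel, render) come from the SEMAINE project.

```python
# Minimal sketch of a real-time listener loop, in the spirit of the system
# described above. All component names are illustrative stand-ins.
import time
import random

def detect_pause(audio_energy, threshold=0.1):
    """Crude voice-activity cue: low energy suggests the user paused."""
    return audio_energy < threshold

def choose_backchannel():
    """Pick a non-verbal listener response."""
    return random.choice(["nod", "smile", "mm-hm", "raise_eyebrows"])

def render(behaviour):
    print(f"[agent] {behaviour}")

def listener_loop(feature_stream, tick=0.1):
    """Poll incoming features and emit listener behaviour with low latency."""
    for audio_energy in feature_stream:
        if detect_pause(audio_energy):
            render(choose_backchannel())
        time.sleep(tick)  # keep the loop incremental rather than batch

if __name__ == "__main__":
    # Simulated stream of audio-energy values standing in for live sensor input.
    listener_loop([0.8, 0.7, 0.05, 0.6, 0.02, 0.9], tick=0.01)
```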

    Integration of multimodal data based on surface registration

    The paper proposes and evaluates a strategy for the alignment of anatomical and functional data of the brain. The method takes as input two different sets of images of the same patient: MR data and SPECT data. It proceeds in four steps: first, it constructs two voxel models from the two image sets; next, it extracts from the two voxel models the surfaces of regions of interest; in the third step, the surfaces are interactively aligned in corresponding pairs; finally, a unique volume model is constructed by selectively applying the geometrical transformations associated with the regions and weighting their contributions. The main advantages of this strategy are that (i) it can be applied retrospectively, (ii) it is three-dimensional, and (iii) it is local. Its main disadvantage with respect to previously published methods is that it requires the extraction of surfaces. However, this step is often required for other stages of multimodal analysis, such as visualization, so its cost can be accounted for in the overall cost of the process.
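    The final step, applying per-region geometrical transformations and weighting their contributions, can be sketched as below. The weighting scheme (inverse distance to each region's centroid) is an assumption for illustration, not the paper's exact formulation.

```python
# Hedged sketch of weighted fusion of per-region rigid transforms.
import numpy as np

def blend_transforms(point, transforms, centroids, eps=1e-6):
    """Map a 3-D point with a weighted combination of per-region transforms.

    transforms: list of 4x4 homogeneous matrices, one per region of interest.
    centroids:  list of 3-vectors used to derive inverse-distance weights.
    """
    p_h = np.append(point, 1.0)                       # homogeneous coordinates
    weights = np.array([1.0 / (np.linalg.norm(point - c) + eps)
                        for c in centroids])
    weights /= weights.sum()                          # normalise contributions
    mapped = np.array([(T @ p_h)[:3] for T in transforms])
    return weights @ mapped                           # weighted average position

if __name__ == "__main__":
    identity = np.eye(4)
    shift = np.eye(4)
    shift[:3, 3] = [5.0, 0.0, 0.0]                    # translate 5 units along x
    print(blend_transforms(np.array([1.0, 2.0, 3.0]),
                           [identity, shift],
                           [np.zeros(3), np.array([10.0, 0.0, 0.0])]))
```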

    Learning Multi-Modal Word Representation Grounded in Visual Context

    Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics from their textual context in large corpora. More recently, researchers have attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations, but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual-context elements into a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results.
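    As a rough illustration of the kind of objective described here, the sketch below combines a standard skip-gram loss with a term that ties word embeddings to a visual-context feature vector. The architecture, loss combination, and dimensions (e.g. visual_dim) are assumptions for illustration and do not reproduce the paper's model.

```python
# Hedged sketch of a multimodal skip-gram objective: predict textual context
# words and regress onto a visual-context feature vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalSkipGram(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, visual_dim=2048):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)   # target words
        self.out_embed = nn.Embedding(vocab_size, embed_dim)  # context words
        self.visual_proj = nn.Linear(embed_dim, visual_dim)   # to image-feature space

    def forward(self, target, context, negatives, visual_feat):
        v = self.in_embed(target)                                        # (B, D)
        # Skip-gram with negative sampling on the textual side.
        pos = F.logsigmoid((v * self.out_embed(context)).sum(-1))
        neg = F.logsigmoid(-(self.out_embed(negatives) * v.unsqueeze(1)).sum(-1)).sum(-1)
        text_loss = -(pos + neg).mean()
        # Visual side: pull the word embedding towards its visual context.
        visual_loss = F.mse_loss(self.visual_proj(v), visual_feat)
        return text_loss + visual_loss

if __name__ == "__main__":
    model = MultimodalSkipGram(vocab_size=1000)
    tgt = torch.randint(0, 1000, (4,))
    ctx = torch.randint(0, 1000, (4,))
    negs = torch.randint(0, 1000, (4, 5))
    vis = torch.randn(4, 2048)
    print(model(tgt, ctx, negs, vis).item())
```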