
    A false colouring real time visual saliency algorithm for reference resolution in simulated 3-D environments

    In this paper we present a novel false colouring visual saliency algorithm and illustrate how it is used in the Situated Language Interpreter system to resolve natural language references
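The abstract describes estimating visual salience via false colouring: each scene object is rendered in a unique flat colour, and an object's salience can then be read off from how much of the frame its colour occupies. A minimal sketch of that idea (the function name and the toy frame are illustrative, not from the paper):

```python
import numpy as np

def salience_from_false_colour(frame: np.ndarray) -> dict:
    """frame: 2-D array of integer object ids (0 = background).
    Returns each object's salience as its share of the frame's pixels."""
    ids, counts = np.unique(frame, return_counts=True)
    total = frame.size
    return {int(i): c / total for i, c in zip(ids, counts) if i != 0}

# Toy 4x4 "render": object 1 covers 3 pixels, object 2 covers 1 pixel.
frame = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 2],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
scores = salience_from_false_colour(frame)
# object 1 comes out more salient than object 2
```

In a real system the frame would come from a second render pass of the 3-D scene with lighting and textures disabled, so that pixel colour maps unambiguously to object identity.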

    Cognitive Principles in Robust Multimodal Interpretation

Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech and gesture. To build effective multimodal interfaces, automated interpretation of user multimodal inputs is important. Inspired by previous investigations of cognitive status in multimodal human-machine interaction, we have developed a greedy algorithm for interpreting user referring expressions (i.e., multimodal reference resolution). This algorithm incorporates the cognitive principles of Conversational Implicature and the Givenness Hierarchy and applies constraints from various sources (e.g., temporal, semantic, and contextual) to resolve references. Our empirical results have shown the advantage of this algorithm in efficiently resolving a variety of user references. Because of its simplicity and generality, this approach has the potential to improve the robustness of multimodal input interpretation
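The abstract outlines a greedy resolver that filters candidates by semantic and temporal constraints, then prefers the most cognitively accessible candidate on the Givenness Hierarchy. A hedged sketch of that shape (the statuses, data model, and threshold below are illustrative assumptions, not the paper's implementation):

```python
from dataclasses import dataclass

# Givenness Hierarchy statuses, ranked by cognitive accessibility
# (higher = more accessible; a simplified, hypothetical subset).
GIVENNESS_RANK = {
    "in_focus": 3, "activated": 2, "familiar": 1, "uniquely_identifiable": 0,
}

@dataclass
class Candidate:
    name: str
    semantic_type: str
    status: str           # givenness status of this object
    last_mentioned: float # seconds since last mention/gesture

def resolve(expr_type: str, candidates, max_age: float = 30.0):
    # Semantic constraint: type must match the referring expression.
    # Temporal constraint: only recently salient candidates are viable.
    viable = [c for c in candidates
              if c.semantic_type == expr_type and c.last_mentioned <= max_age]
    if not viable:
        return None
    # Greedy choice: most accessible status wins; break ties by recency.
    return max(viable, key=lambda c: (GIVENNESS_RANK[c.status], -c.last_mentioned))

objs = [Candidate("red_chair", "chair", "activated", 5.0),
        Candidate("blue_chair", "chair", "in_focus", 2.0),
        Candidate("lamp", "lamp", "in_focus", 1.0)]
best = resolve("chair", objs)
# best.name == "blue_chair": in-focus beats activated among type-matching objects
```

The greedy step is what keeps the algorithm simple: rather than scoring all candidate-expression assignments jointly, each expression takes the single best surviving candidate.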

    Context-based multimodal interpretation : an integrated approach to multimodal fusion and discourse processing

This thesis is concerned with the context-based interpretation of verbal and nonverbal contributions to interactions in multimodal multiparty dialogue systems. On the basis of a detailed analysis of context-dependent multimodal discourse phenomena, a comprehensive context model is developed. This context model supports the resolution of a variety of referring and elliptical expressions as well as the processing and reactive generation of turn-taking signals and the identification of the intended addressee(s) of a contribution. A major goal of this thesis is the development of a generic component for multimodal fusion and discourse processing. Based on the integration of this component into three distinct multimodal dialogue systems, the generic applicability of the approach is shown.

    MOG 2010:3rd Workshop on Multimodal Output Generation: Proceedings


    Say That Again: The role of multimodal redundancy in communication and context

With several modes of expression, such as facial expressions, body language, and speech working together to convey meaning, social communication is rich in redundancy. While redundancy is typically relegated to signal preservation, this study investigates the role of cross-modal redundancies in establishing performance context, focusing on unaided, solo performances. Drawing on information theory, I operationalize redundancy as predictability and use an array of machine learning models to featurize speakers' facial expressions, body poses, movement speeds, acoustic features, and spoken language from 24 TED Talks and 16 episodes of Comedy Central Stand-Up Presents. This analysis demonstrates that it is possible to distinguish between these performance types based on cross-modal predictions, while also highlighting the significant amount of prediction supported by the signals' synchrony across modalities. Further research is needed to unravel the complexities of redundancy's place in social communication, paving the way for more effective and engaging communication strategies
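Operationalizing redundancy as cross-modal predictability can be illustrated with a toy version of the idea: measure how well features of one modality are predicted from another, e.g. via the R-squared of a linear fit. The feature names and the synthetic data below are illustrative assumptions; the study itself uses an array of machine learning models over richer featurizations:

```python
import numpy as np

def cross_modal_r2(x: np.ndarray, y: np.ndarray) -> float:
    """R^2 of predicting modality y from modality x via ordinary least squares."""
    X = np.column_stack([x, np.ones_like(x)])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
speed = rng.normal(size=200)                        # e.g. movement speed per frame
energy = 0.8 * speed + 0.1 * rng.normal(size=200)   # correlated acoustic energy
r2 = cross_modal_r2(speed, energy)
# high r2 -> the two modalities are largely redundant with each other
```

Under this reading, a performance type with tighter audio-motion synchrony would show systematically higher cross-modal R-squared, which is what makes the measure usable as a classification signal.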

    Visual Salience and Reference Resolution in Simulated 3-D Environments
