6 research outputs found

    Placing Objects in Gesture Space: Toward Real-Time Understanding of Spatial Descriptions

    Han T, Kennington C, Schlangen D. Placing Objects in Gesture Space: Toward Real-Time Understanding of Spatial Descriptions. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI18). New Orleans: The Association for the Advancement of Artificial Intelligence; 2018.

    Integrated Framework Design for Intelligent Human Machine Interaction

    Human-computer interaction, sometimes referred to as man-machine interaction, is a concept that emerged together with computers or, more generally, machines. The methods by which humans interact with computers have come a long way, and new designs and technologies appear every day. However, computer systems and complex machines are often successful only in technical terms: users frequently find them confusing, so such systems are never used efficiently. Building sophisticated machines and robots is therefore not the only problem to address; more effort should go into making these machines simpler for all kinds of users and generic enough to accommodate different types of environments. This is the motivation for designing intelligent human-computer interaction modules. In this work, we aim to implement a generic framework (referred to as the CIMF framework) that allows the user to control the synchronized, coordinated, cooperative work that a set of robots can perform. Three robots are involved so far: two manipulators and one mobile robot. The framework should be generic enough to be hardware-independent and to allow easy integration of new entities and modules. We also aim to implement the building blocks of an intelligent manufacturing cell that communicates with the framework via the most intelligent and advanced human-computer interaction techniques. Three techniques will be addressed: interface-based, audio-based, and visual-based interaction.

    Learning to Interpret and Apply Multimodal Descriptions

    Han T. Learning to Interpret and Apply Multimodal Descriptions. Bielefeld: Universität Bielefeld; 2018. Enabling computers to understand natural human communication is a goal that researchers in artificial intelligence have long aspired to. Since the concept demonstration of “Put-That-There” in the 1980s, significant achievements have been made in developing multimodal interfaces that can process human communication such as speech, eye gaze, facial emotion, co-verbal hand gestures and pen input. State-of-the-art multimodal interfaces are able to process pointing gestures, symbolic gestures with conventional meanings, and gesture commands with pre-defined meanings (e.g., circling for “select”). However, in natural communication, co-verbal gestures and pen input rarely convey meaning via conventions or pre-defined rules; instead, they embody meanings relatable to the accompanying speech. For example, in route-giving tasks, people often describe landmarks verbally (e.g., “two buildings”) while demonstrating their relative position with two hands facing each other in space. Interestingly, when the same gesture accompanies the utterance “a ball”, it may indicate the size of the ball. Hence, the interpretation of such co-verbal hand gestures largely depends on the accompanying verbal content. Similarly, when describing objects, verbal utterances are most convenient for describing colour and category (e.g., “a brown elephant”), whereas hand-drawn sketches are often used to convey iconic information such as the exact shape of the elephant’s trunk, which is typically difficult to encode in language. This dissertation concerns the task of learning to interpret multimodal descriptions composed of verbal utterances and hand gestures or sketches, and of applying the corresponding interpretations to tasks such as image retrieval. 
Specifically, we aim to address the following research questions: 1) For co-verbal gestures that embody meanings relatable to the accompanying verbal content, how can we use natural language information to interpret the semantics of such gestures, e.g., does a gesture indicate relative position or size? 2) As an integral system of communication, speech and gestures bear not only close semantic relations but also close temporal relations. To what degree and on which dimensions can hand gestures benefit the task of interpreting multimodal descriptions? 3) While it is obvious that iconic information in hand-drawn sketches enriches the verbal content in object descriptions, how can we model the joint contributions of such multimodal descriptions, and to what degree can verbal descriptions compensate for reduced iconic detail in hand-drawn sketches? To address these questions, we first introduce three multimodal description corpora: a spatial description corpus composed of natural language and placing gestures (also referred to as abstract deictics), a multimodal object description corpus composed of natural language and hand-drawn sketches, and an existing corpus, the Bielefeld Speech and Gesture Alignment Corpus (SAGA). We frame the problem of learning gesture semantics as a multi-label classification task using natural language information and hand gesture features. We conducted an experiment with the SAGA corpus. The results show that natural language is informative for the interpretation of hand gestures. Furthermore, we describe a system that models the interpretation and application of spatial descriptions, and we explore three variants of representation methods for the verbal content. When the verbal content of the descriptions is represented with a set of automatically learned symbols, the system’s performance is on par with representations that use manually defined symbols (e.g., pre-defined object properties). 
We show that abstract deictic gestures not only lead to better understanding of spatial descriptions, but also result in earlier correct decisions by the system, which can be used to trigger immediate reactions in dialogue systems. Finally, we investigate the interplay of semantics between the symbolic (natural language) and iconic (sketch) modes in multimodal object descriptions, where natural language and sketches jointly contribute to communication. We model the meaning of natural language and sketches with two existing models and combine the meanings from both modalities with a late fusion approach. The results show that even adding reduced sketches (30% of full sketches) can help in the retrieval task. Moreover, in the current setup, natural language descriptions can compensate for around 30% of reduced sketches.
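
The late fusion step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes that each modality has already produced one similarity score per candidate image, and it combines them with a weighted sum. The abstract does not state the fusion rule actually used, so the weighted sum, the function name `late_fusion_rank` and the `text_weight` parameter are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def late_fusion_rank(
    text_scores: Dict[str, float],
    sketch_scores: Dict[str, float],
    text_weight: float = 0.5,
) -> List[str]:
    """Rank candidates by a weighted sum of per-modality scores.

    Assumption: both score dictionaries map a candidate id to a
    similarity score on a comparable scale (higher is better).
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")

    # Only candidates scored by both modalities can be fused.
    shared = set(text_scores) & set(sketch_scores)

    # Late fusion: each modality is scored independently and the
    # scores are combined only at the decision stage.
    fused = {
        c: text_weight * text_scores[c] + (1.0 - text_weight) * sketch_scores[c]
        for c in shared
    }

    # Best score first; ties are broken by candidate id for determinism.
    return sorted(shared, key=lambda c: (-fused[c], c))


if __name__ == "__main__":
    language = {"img_a": 0.9, "img_b": 0.3, "img_c": 0.1}
    sketch = {"img_a": 0.2, "img_b": 0.7, "img_c": 0.1}
    print(late_fusion_rank(language, sketch))
```

Setting `text_weight` to 1.0 or 0.0 reduces the ranking to a single modality, which is one simple way to compare how much each modality contributes.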

    A comprehensive framework for the rapid prototyping of ubiquitous interaction

    In the interaction between humans and computational systems, many advances have been made in terms of hardware (e.g., smart devices with embedded sensors and multi-touch surfaces) and software (e.g., algorithms for the detection and tracking of touches, gestures and full-body movements). Now that we have the computational power and the devices to manage interactions between the physical and the digital world, the question is: what should we do? For the Human-Computer Interaction research community, answering this question means realizing Mark Weiser’s vision of Ubiquitous Computing. In the desktop computing paradigm, the desktop metaphor is implemented by a graphical user interface operated via mouse and keyboard. Users are accustomed to employing artificial control devices whose operation has to be learned, and they interact in an environment that inhibits their faculties. For example, the mouse is a device that allows movement in a two-dimensional space, thus limiting the twenty-three degrees of freedom of the human hand. Ubiquitous Computing is an evolution in the history of computation: it aims at making the interface disappear and integrating information processing into everyday objects with computational capabilities. In this way, humans would no longer be forced to adapt to machines; instead, the technology would harmonize with the surrounding environment. In contrast to the desktop case, ubiquitous systems make use of heterogeneous input/output devices (e.g., motion sensors, cameras and touch surfaces, among others) and interaction techniques such as touchless, multi-touch and tangible interaction. By reducing the physical constraints on interaction, ubiquitous technologies can enable interfaces with more expressive power (e.g., free-hand gestures), and such technologies are therefore expected to provide users with better tools to think, create and communicate. 
It appears clear that approaches based on classical user interfaces from the desktop computing world do not fit ubiquitous needs, because they were designed for a single user interacting with a single computing system, seated at a workstation and looking at a vertical screen. To overcome the inadequacy of the existing paradigm, new models started to be developed that enable users to employ their skills effortlessly and that lower the cognitive burden of interacting with computational machines. Ubiquitous interfaces are pervasive and thus invisible to their users, or they become invisible through successive interactions in which users feel they are instantly and continuously successful. All the benefits advocated by ubiquitous interaction, such as the invisible interface and more natural interaction, come at a price: the design and development of interactive systems raise new conceptual and practical challenges. Ubiquitous systems communicate with the real world by means of sensors, emitters and actuators. Sensors convert real-world inputs into digital data, while emitters and actuators are mostly used to provide digital or physical feedback (e.g., a speaker emitting sounds). Employing such a variety of hardware devices in a real application can be difficult, because their use requires knowledge of the underlying physics and many hours of programming work. Furthermore, data integration can be cumbersome, because each device vendor uses different programming interfaces and communication protocols. All these factors make the rapid prototyping of ubiquitous systems a challenging task. Prototyping is a pivotal activity for fostering innovation and creativity through the exploration of a design space. Nevertheless, while there are many prototyping tools and guidelines for traditional user interfaces, very few solutions have been developed for the holistic prototyping of ubiquitous systems. 
The tremendous number of different input devices, interaction techniques and physical environments envisioned by researchers poses a severe challenge for general and comprehensive development tools. All of this makes it difficult to work in a design and development space where practitioners need to be familiar with several related subjects involving both software and hardware. Moreover, the technological context is further complicated by the fact that many ubiquitous technologies have only recently grown out of an embryonic stage and are still maturing; they therefore lack stability, reliability and homogeneity. For these reasons, it is essential to develop tools that support the programming of ubiquitous interaction. This thesis addresses that topic. The goal is to develop a general conceptual and software framework that makes use of hardware abstraction to lighten the prototyping process in the design of ubiquitous systems. The thesis is that, by abstracting from low-level details, it is possible to provide unified, coherent and consistent access to interaction devices independently of their implementation or communication protocols. In this dissertation, the existing literature is reviewed and the need for frameworks that provide such comprehensive and integrated support is pointed out. Moreover, the objectives and the methodology to fulfill them, together with the major contributions of this work, are described. Finally, the design of the proposed framework, its development in the form of a set of software libraries, its evaluation with real users and a use case are presented. The evaluation and the use case demonstrate that encompassing heterogeneous devices in a single design makes it possible to reduce the effort users need to develop interaction in ubiquitous environments. 
----------
In the interaction between people and computing systems, many advances have been made in hardware (e.g., smart devices with embedded sensors and touch surfaces) and software (e.g., algorithms for recognizing and tracking contact points, hand gestures and body movements). Now that the computational power and the devices needed to provide interaction between the physical world and the digital world are available, the question is: what should be done? For the Human-Computer Interaction research community, answering this question means making Mark Weiser’s vision of Ubiquitous Computing a reality. In the desktop computing paradigm, the desktop metaphor is implemented through the graphical user interface, which is operated via keyboard and mouse. In this paradigm, users adapt to using artificial devices whose operation must be learned, and to interacting in an environment that inhibits their abilities. For example, the mouse is a device that allows movement in two dimensions and therefore limits the twenty-three degrees of freedom of a hand. Ubiquitous Computing is regarded as an evolution in the history of computing: its goal is to make the interface disappear and to integrate information processing into everyday objects equipped with computing capability. 
In this way, users would not be forced to adapt to machines; instead, the technology would be integrated directly into the environment. Unlike desktop systems, ubiquitous systems use heterogeneous input/output devices (e.g., motion sensors, cameras and touch surfaces, among others) and interaction techniques such as touchless, multi-touch or tangible interaction. By reducing the physical limitations on interaction, ubiquitous technologies allow the creation of interfaces with greater expressive power (e.g., hand gestures) and are therefore expected to give users better tools to think, create and communicate. It seems clear that solutions based on classical interfaces do not meet the needs of ubiquitous interaction, because they are designed for a single user interacting with a single computing system, seated at a desk and looking at a vertical screen. To overcome the shortcomings of the desktop paradigm, new interaction models began to be developed that let users effortlessly employ their innate and acquired abilities and that reduce the cognitive load of classical interfaces. Ubiquitous interfaces are pervasive and therefore invisible to their users, or they become invisible through successive interactions in which users always feel that they are succeeding. All the benefits advocated by ubiquitous interaction, such as the invisible interface or more natural interaction, come at a cost: the design and development of ubiquitous interaction systems introduce new conceptual and practical challenges. Ubiquitous systems communicate with the real world through sensors and emitters. 
Sensors convert real-world inputs into digital data, while emitters are mainly used to provide digital or physical feedback (e.g., speakers emitting a sound). Using a wide variety of hardware devices in a real application can be difficult, because it requires knowledge of physics and many hours of programming. In addition, data integration can be complicated, because each device vendor uses different programming interfaces and communication protocols. All these factors make the rapid prototyping of ubiquitous systems a difficult challenge today. Prototyping is a central activity for promoting innovation and creativity through the exploration of a design space. However, although many tools and guidelines exist for prototyping desktop interfaces, very few solutions have been developed to date for the holistic prototyping of ubiquitous interaction. The enormous number of input devices, interaction techniques and physical environments conceived by researchers poses a great challenge from the point of view of a general and comprehensive environment. All of this makes it difficult to work in a design and development space in which practitioners need knowledge of different subjects related to software and hardware. Moreover, the technological context is complicated by the fact that many of these ubiquitous technologies have only just emerged from an embryonic stage and are still developing; they therefore lack stability, reliability and homogeneity. For these reasons, it is essential to develop tools that support the prototyping of ubiquitous interaction. This doctoral thesis is devoted to this problem. 
The goal is to develop a conceptual and software architecture that uses a hardware abstraction layer to ease the prototyping of ubiquitous interaction systems. The thesis is that, by abstracting from low-level details, it is possible to provide unified, consistent and coherent access to interaction devices regardless of their implementation and communication protocols. This doctoral thesis reviews the existing literature and highlights the need for tools and frameworks that provide such comprehensive and integrated support. It also describes the proposed objectives, the methodology for achieving them and the main contributions of this work. Finally, it presents the design of the conceptual framework, its development as a set of software libraries, its evaluation with real users and a use case. The evaluation and the use case demonstrate that considering heterogeneous devices in a single design makes it possible to reduce the effort users need to develop interaction in ubiquitous environments.
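
The hardware-abstraction idea described in this abstract, unified access to interaction devices regardless of vendor protocol, can be illustrated with a minimal Python sketch. It is not the framework presented in the thesis: the class names `PointerEvent`, `InputDevice`, `TouchSurface` and `DepthCamera`, and the two vendor data formats, are hypothetical and exist only to show the pattern.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PointerEvent:
    """Device-independent event: a position normalized to the range 0..1."""
    source: str
    x: float
    y: float


class InputDevice(ABC):
    """Common interface that hides vendor-specific formats (hypothetical)."""

    @abstractmethod
    def read(self) -> PointerEvent:
        """Return the latest reading as a normalized PointerEvent."""


class TouchSurface(InputDevice):
    """Hypothetical vendor A: reports touches in pixel coordinates."""

    def __init__(self, width_px: int, height_px: int, raw_xy: Tuple[int, int]):
        self._width = width_px
        self._height = height_px
        self._raw_xy = raw_xy  # stands in for a real driver call

    def read(self) -> PointerEvent:
        px, py = self._raw_xy
        return PointerEvent("touch", px / self._width, py / self._height)


class DepthCamera(InputDevice):
    """Hypothetical vendor B: reports hand position in metres."""

    def __init__(self, range_m: float, raw_xy_m: Tuple[float, float]):
        self._range = range_m
        self._raw_xy_m = raw_xy_m  # stands in for a real driver call

    def read(self) -> PointerEvent:
        mx, my = self._raw_xy_m
        return PointerEvent("camera", mx / self._range, my / self._range)


def poll_all(devices: List[InputDevice]) -> List[PointerEvent]:
    """Application code sees one event type, whatever the hardware."""
    return [device.read() for device in devices]


if __name__ == "__main__":
    devices = [TouchSurface(800, 600, (400, 150)), DepthCamera(2.0, (0.5, 1.0))]
    for event in poll_all(devices):
        print(event)
```

Supporting a new device in this sketch means adding one subclass of `InputDevice`; `poll_all` and any code that consumes `PointerEvent` values stay unchanged.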

    The gesture interpretation module

    The Gesture Interpretation Module

    Summary. Humans often make conscious and unconscious gestures, which reflect their mind, their thoughts and the way these are formulated. These inherently complex processes cannot, in general, be substituted by a corresponding verbal utterance with the same semantics (McNeill, 1992). Gesture, a kind of body language, carries important information about the intention and the state of the person producing it. It is therefore an important communication channel in human-computer interaction. In the following, we first describe the state of the art in gesture recognition. The next section describes the gesture interpretation module. After that, we present the experiments and results for the recognition of user states. We summarize our results in the last section.