6 research outputs found
Placing Objects in Gesture Space: Toward Real-Time Understanding of Spatial Descriptions
Han T, Kennington C, Schlangen D. Placing Objects in Gesture Space: Toward Real-Time Understanding of Spatial Descriptions. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). New Orleans: Association for the Advancement of Artificial Intelligence; 2018.
Integrated Framework Design for Intelligent Human Machine Interaction
Human-computer interaction, sometimes referred to as man-machine interaction, is a concept that emerged together with computers, or more generally machines. The methods by which humans interact with computers have come a long way, and new designs and technologies appear every day. However, computer systems and complex machines are often only technically successful; most of the time users find them confusing to use, so such systems are never used efficiently. Building sophisticated machines and robots is therefore not the only challenge; more effort should be put into making these machines simpler for all kinds of users and generic enough to accommodate different types of environments. This is where the design of intelligent human-computer interaction modules comes in. In this work, we aim to implement a generic framework (referred to as the CIMF framework) that allows the user to control the synchronized and coordinated cooperative work that a set of robots can perform. Three robots are involved so far: two manipulators and one mobile robot. The framework should be generic enough to be hardware independent and to allow the easy integration of new entities and modules. We also aim to implement the different building blocks of the intelligent manufacturing cell that communicates with the framework via the most intelligent and advanced human-computer interaction techniques. Three techniques are addressed: interface-, audio-, and visual-based interaction.
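A plugin-style coordinator is one plausible way to realize such a hardware-independent design; the sketch below only illustrates that idea, and all class and method names in it are invented for the example rather than taken from the CIMF framework.

```python
# Minimal sketch of a hardware-independent module interface, assuming a
# plugin-style design; names are hypothetical, not the actual CIMF API.
from abc import ABC, abstractmethod


class RobotModule(ABC):
    """A robot entity the framework can coordinate, regardless of hardware."""

    @abstractmethod
    def execute(self, command: str, **params) -> None:
        """Translate an abstract command into hardware-specific motions."""


class Coordinator:
    """Dispatches one interaction event (interface, audio, or visual) to all robots."""

    def __init__(self) -> None:
        self._modules: list[RobotModule] = []

    def register(self, module: RobotModule) -> None:
        self._modules.append(module)

    def broadcast(self, command: str, **params) -> None:
        for module in self._modules:
            module.execute(command, **params)


class Manipulator(RobotModule):
    def execute(self, command: str, **params) -> None:
        print(f"manipulator: {command} {params}")


class MobileRobot(RobotModule):
    def execute(self, command: str, **params) -> None:
        print(f"mobile robot: {command} {params}")


coordinator = Coordinator()
coordinator.register(Manipulator())
coordinator.register(Manipulator())
coordinator.register(MobileRobot())
coordinator.broadcast("move_to", x=0.5, y=1.0)  # e.g., triggered by a voice command
```

In such a design the coordinator only knows the abstract module interface, so integrating a new robot amounts to registering another adapter.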
Learning to Interpret and Apply Multimodal Descriptions
Han T. Learning to Interpret and Apply Multimodal Descriptions. Bielefeld: Universität Bielefeld; 2018.
Enabling computers to understand natural human communication is a goal researchers in artificial intelligence have long aspired to. Since the concept demonstration of "Put-That-There" in the 1980s, significant achievements have been made in developing multimodal interfaces that can process human communication such as speech, eye gaze, facial emotion, co-verbal hand gestures and pen input. State-of-the-art multimodal interfaces are able to process pointing gestures, symbolic gestures with conventional meanings, as well as gesture commands with pre-defined meanings (e.g., circling for "select"). However, in natural communication, co-verbal gestures and pen input rarely convey meanings via conventions or pre-defined rules, but embody meanings relatable to the accompanying speech.
For example, in route-giving tasks, people often describe landmarks verbally (e.g., two buildings) while demonstrating their relative position with two hands facing each other in space. Interestingly, when the same gesture is accompanied by the utterance a ball, it may indicate the size of the ball. Hence, the interpretation of such co-verbal hand gestures largely depends on the accompanying verbal content. Similarly, when describing objects, while verbal utterances are most convenient for describing colour and category (e.g., a brown elephant), hand-drawn sketches are often deployed to convey iconic information such as the exact shape of the elephant's trunk, which is typically difficult to encode in language.
This dissertation concerns the task of learning to interpret multimodal descriptions composed of verbal utterances and hand gestures/sketches, and of applying the corresponding interpretations to tasks such as image retrieval. Specifically, we aim to address the following research questions: 1) For co-verbal gestures that embody meanings relatable to the accompanying verbal content, how can we use natural language information to interpret the semantics of such co-verbal gestures, e.g., does a gesture indicate relative position or size? 2) As an integral system of communication, speech and gestures bear not only close semantic relations but also close temporal relations. To what degree and on which dimensions can hand gestures benefit the task of interpreting multimodal descriptions? 3) While it is obvious that iconic information in hand-drawn sketches enriches the verbal content of object descriptions, how can we model the joint contributions of such multimodal descriptions, and to what degree can verbal descriptions compensate for reduced iconic detail in hand-drawn sketches?
To address the above questions, we first introduce three multimodal description corpora: a spatial description corpus composed of natural language and placing gestures (also referred to as abstract deictics), a multimodal object description corpus composed of natural language and hand-drawn sketches, and an existing corpus, the Bielefeld Speech and Gesture Alignment Corpus (SAGA).
We frame the problem of learning gesture semantics as a multi-label classification task using natural language information and hand gesture features. We conducted an experiment with the SAGA corpus. The results show that natural language is informative for the interpretation of hand gestures.
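As a rough illustration of what such a multi-label setup can look like in practice (the features and the label set below are invented for the example and are not the SAGA features used in the experiment), verbal and gesture features can be concatenated into one vector and a binary classifier trained per semantic label:

```python
# Hedged sketch of gesture-semantics learning as multi-label classification;
# feature vectors and labels are illustrative assumptions only.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer

# Each sample: verbal-content features (e.g., bag-of-words indicators)
# concatenated with hand-gesture features (e.g., hand shape, wrist position).
X = np.array([
    [1, 0, 1, 0.2, 0.8],   # "two buildings" + two flat hands facing each other
    [0, 1, 0, 0.3, 0.1],   # "a ball" + curved hand shape
    [1, 1, 0, 0.6, 0.4],
    [0, 0, 1, 0.1, 0.9],
])
# A single gesture may carry several semantic labels at once.
labels = [("relative_position",), ("size", "shape"), ("size",), ("relative_position", "shape")]

binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(labels)          # binary indicator matrix

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)                                # one binary classifier per label

pred = clf.predict(X[:1])
print(binarizer.inverse_transform(pred))     # predicted label set for the first sample
```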
Furthermore, we describe a system that models the interpretation and application of spatial descriptions, and we explore three variants of representing the verbal content. When the verbal content of the descriptions is represented with a set of automatically learned symbols, the system's performance is on par with representations based on manually defined symbols (e.g., pre-defined object properties). We show that abstract deictic gestures not only lead to a better understanding of spatial descriptions, but also result in earlier correct decisions by the system, which can be used to trigger immediate reactions in dialogue systems.
Finally, we investigate the interplay of semantics between the symbolic (natural language) and iconic (sketches) modes in multimodal object descriptions, where natural language and sketches jointly contribute to the communication. We model the meanings of natural language and sketches with two existing models and combine the meanings from both modalities with a late fusion approach. The results show that even adding reduced sketches (30% of the full sketches) helps in the retrieval task. Moreover, in the current setup, natural language descriptions can compensate for around 30% of the reduced sketch content.
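A minimal sketch of the late-fusion idea, assuming each modality independently scores every candidate image and the scores are then combined with a weighted sum (the scoring functions below are placeholders, not the two specific models used in the dissertation):

```python
# Late fusion for multimodal retrieval: score per modality, combine afterwards.
import numpy as np

def language_scores(description: str, n_candidates: int) -> np.ndarray:
    """Placeholder for a language-based model scoring each candidate image."""
    rng = np.random.default_rng(0)
    return rng.random(n_candidates)

def sketch_scores(sketch_strokes, n_candidates: int) -> np.ndarray:
    """Placeholder for a sketch-based model scoring each candidate image."""
    rng = np.random.default_rng(1)
    return rng.random(n_candidates)

def late_fusion(description, sketch_strokes, n_candidates, alpha=0.5):
    s_lang = language_scores(description, n_candidates)
    s_sketch = sketch_scores(sketch_strokes, n_candidates)
    return alpha * s_lang + (1 - alpha) * s_sketch   # weighted combination

combined = late_fusion("a brown elephant", sketch_strokes=[], n_candidates=10)
ranking = np.argsort(-combined)                       # best-matching images first
print(ranking)
```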
A comprehensive framework for the rapid prototyping of ubiquitous interaction
In the interaction between humans and computational systems, many advances have been made in terms of hardware (e.g., smart devices with embedded sensors and multi-touch surfaces) and software (e.g., algorithms for the detection and tracking of touches, gestures and full-body movements). Now that we have the computational power and the devices to manage interactions between the physical and the digital world, the question is: what should we do? For the Human-Computer Interaction research community, answering this question means materializing Mark Weiser's vision of Ubiquitous Computing.
In the desktop computing paradigm, the desktop metaphor is implemented by a graphical user interface operated via mouse and keyboard. Users are accustomed to employing artificial control devices whose operation has to be learned, and they interact in an environment that inhibits their faculties. For example, the mouse allows movement in a two-dimensional space, thus limiting the twenty-three degrees of freedom of the human hand. Ubiquitous Computing is an evolution in the history of computation: it aims at making the interface disappear and integrating information processing into everyday objects with computational capabilities. In this way, humans would no longer be forced to adapt to machines; instead, the technology would harmonize with the surrounding environment. Unlike in the desktop case, ubiquitous systems make use of heterogeneous input/output devices (e.g., motion sensors, cameras and touch surfaces, among others) and interaction techniques such as touchless, multi-touch, and tangible interaction. By reducing the physical constraints on interaction, ubiquitous technologies can enable interfaces with more expressive power (e.g., free-hand gestures) and are therefore expected to provide users with better tools to think, create and communicate.
It appears clear that approaches based on classical user interfaces from the desktop computing world do not fit ubiquitous needs, for they were conceived for a single user interacting with a single computing system, seated at a workstation and looking at a vertical screen. To overcome the inadequacy of the existing paradigm, new models started to be developed that enable users to employ their skills effortlessly and lower the cognitive burden of interacting with computational machines. Ubiquitous interfaces are pervasive and thus invisible to their users, or they become invisible through successive interactions in which the users feel they are instantly and continuously successful.
All the benefits advocated by ubiquitous interaction, like the invisible interface and a more natural interaction, come at a price: the design and development of such interactive systems raise new conceptual and practical challenges. Ubiquitous systems communicate with the real world by means of sensors, emitters and actuators. Sensors convert real-world inputs into digital data, while emitters and actuators are mostly used to provide digital or physical feedback (e.g., a speaker emitting sounds). Employing such a variety of hardware devices in a real application can be difficult, because their use requires knowledge of the underlying physics and many hours of programming work. Furthermore, data integration can be cumbersome, because every device vendor uses different programming interfaces and communication protocols. All these factors make the rapid prototyping of ubiquitous systems a challenging task.
Prototyping is a pivotal activity to foster innovation and creativity through the exploration of a design space. Nevertheless, while there are many prototyping tools and guidelines for traditional user interfaces, very few solutions have been developed for the holistic prototyping of ubiquitous systems. The tremendous number of different input devices, interaction techniques and physical environments envisioned by researchers poses a severe challenge from the point of view of general and comprehensive development tools. All of this makes it difficult to work in a design and development space where practitioners need to be familiar with different related subjects, involving both software and hardware. Moreover, the technological context is further complicated by the fact that many ubiquitous technologies have only recently grown out of an embryonic stage and are still maturing; thus they lack stability, reliability and homogeneity. For these reasons, it is compelling to develop tool support for the programming of ubiquitous interaction. This thesis addresses that topic.
The goal is to develop a general conceptual and software framework that uses hardware abstraction to ease the prototyping process in the design of ubiquitous systems. The thesis is that, by abstracting from low-level details, it is possible to provide unified, coherent and consistent access to interaction devices independently of their implementation or communication protocols. This dissertation reviews the existing literature and points out the need for frameworks that provide such comprehensive and integrated support. It then describes the objectives, the methodology to fulfil them, and the major contributions of this work. Finally, it presents the design of the proposed framework, its development in the form of a set of software libraries, its evaluation with real users, and a use case. Through the evaluation and the use case, it is demonstrated that by encompassing heterogeneous devices into a unified design it is possible to reduce the effort users need to develop interaction in ubiquitous environments.
The Gesture Interpretation Module
Summary. Humans often make conscious and unconscious gestures, which reflect their mind, their thoughts and the way these are formulated. These inherently complex processes can in general not be substituted by a corresponding verbal utterance with the same semantics (McNeill, 1992). Gesture, a kind of body language, contains important information about the intention and the state of the gesture producer. It is therefore an important communication channel in human-computer interaction. In the following, we first describe the state of the art in gesture recognition. The next section describes the gesture interpretation module. After that, we present the experiments and results for the recognition of user states. We summarize our results in the last section.