
    Conducting a virtual ensemble with a kinect device

    This paper presents a gesture-based interaction technique for the implementation of an orchestra conductor and a virtual ensemble, using a 3D camera-based sensor to capture the user's gestures. In particular, a human-computer interface has been developed to recognize conducting gestures using a Microsoft Kinect device. The system allows the conductor to control both the tempo of the piece being played and the dynamics of each instrument set independently. In order to modify the tempo of the playback, a time-frequency processing-based algorithm is used. Finally, an experiment was conducted to assess users' opinions of the system and to confirm experimentally whether its features actually improved the user experience. This work has been funded by the Ministerio de Economia y Competitividad of the Spanish Government under Project No. TIN2010-21089-C03-02 and Project No. IPT-2011-0885-430000, and by the Junta de Andalucia under Project No. P11-TIC-7154. The work was carried out at Universidad de Malaga, Campus de Excelencia Internacional Andalucia Tech.
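    The abstract does not specify the time-frequency algorithm or the gesture-to-tempo mapping, so the sketch below is only a hedged illustration of the idea: beat points are assumed to be the local minima of the conducting hand's height, the mean beat interval gives the conducted tempo, and a phase-vocoder time stretch (librosa's implementation, standing in for whatever the authors used) adapts the playback. Per-instrument dynamics could then be simple gain factors applied to each stem before mixing.

```python
# Hedged sketch, not the paper's implementation: estimate the conducted tempo
# from the vertical trajectory of the hand and time-stretch the audio to match.
import numpy as np
import librosa

def beat_period_from_hand(heights, timestamps):
    """Mean interval between downward beat points (local minima of hand height)."""
    y = np.asarray(heights, dtype=float)
    minima = [i for i in range(1, len(y) - 1) if y[i] < y[i - 1] and y[i] < y[i + 1]]
    if len(minima) < 2:
        return None  # not enough beats observed yet
    beat_times = np.asarray(timestamps, dtype=float)[minima]
    return float(np.mean(np.diff(beat_times)))

def follow_conductor(audio, nominal_period, observed_period):
    """Stretch the playback so the piece follows the conducted tempo."""
    rate = nominal_period / observed_period  # > 1 means the conductor beats faster
    return librosa.effects.time_stretch(audio, rate=rate)
```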

    ZATLAB : recognizing gestures for artistic performance interaction

    Most artistic performances rely on human gestures, ultimately resulting in an elaborate interaction between the performer and the audience. Humans, even without any formal background in music, dance or gesture analysis, are typically able to extract, almost unconsciously, a great amount of relevant information from a gesture. In fact, a gesture contains so much information, so why not use it to further enhance a performance? Gestures and expressive communication are intrinsically connected, and being intimately attached to our own daily existence, both have a central position in our (nowadays) technological society. However, the use of technology to understand gestures is still only vaguely explored; it has moved beyond its first steps, but the way towards systems fully capable of analyzing gestures is still long and difficult (Volpe, 2005). This is probably because, while the recognition of gestures is a somewhat trivial task for humans, the endeavor of translating gestures to the virtual world with a digital encoding is a difficult and ill-defined task. It is necessary to bridge this gap, stimulating a constructive interaction between gestures and technology, culture and science, performance and communication, and thus opening new and unexplored frontiers in the design of a novel generation of multimodal interactive systems. This work proposes an interactive, real-time gesture recognition framework called the Zatlab System (ZtS). This framework is flexible and extensible, and is thus in permanent evolution, keeping up with the different technologies and algorithms that emerge at a fast pace nowadays. The basis of the proposed approach is to partition a temporal stream of captured movement into perceptually motivated descriptive features and transmit them to machine learning algorithms for further processing. The framework takes the view that perception primarily depends on previous knowledge or learning. Just as humans do, the framework has to learn gestures and their main features so that it can later identify them. It is, however, planned to be flexible enough to allow learning gestures on the fly. This dissertation also presents a qualitative and quantitative experimental validation of the framework. The qualitative analysis provides results concerning the users' acceptance of the framework, while the quantitative validation provides results on the gesture recognition algorithms. The use of machine learning algorithms in these tasks achieves final results that match or outperform typical and state-of-the-art systems. In addition, two artistic implementations of the framework are presented, assessing its usability in the artistic performance domain. Although a specific implementation of the proposed framework is presented in this dissertation and made available as open source software, the proposed approach is flexible enough to be used in other scenarios, paving the way to applications that can benefit not only the performative arts domain but also, probably in the near future, other types of communication, such as the sign language used by the hearing impaired.
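    As a hedged illustration of the pipeline described above (and not the actual ZtS code), the sketch below reduces one joint's captured trajectory to a handful of descriptive features and feeds them to an off-the-shelf classifier; the specific features and the nearest-neighbour classifier are assumptions made for the example.

```python
# Hypothetical sketch of the two-stage idea: turn a captured movement stream
# into low-level descriptive features, then hand those to a standard
# machine-learning classifier. Features and classifier are assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def descriptive_features(joint_xyz, fps=30.0):
    """joint_xyz: (frames, 3) positions of one tracked joint."""
    pos = np.asarray(joint_xyz, dtype=float)
    vel = np.diff(pos, axis=0) * fps            # frame-to-frame velocity
    acc = np.diff(vel, axis=0) * fps            # acceleration
    speed = np.linalg.norm(vel, axis=1)
    return np.array([
        speed.mean(), speed.max(),              # overall and peak speed
        np.linalg.norm(acc, axis=1).mean(),     # jerkiness
        pos[:, 1].max() - pos[:, 1].min(),      # vertical extent of the gesture
    ])

# Train on a few labelled example gestures, then recognise new ones:
clf = KNeighborsClassifier(n_neighbors=1)
# clf.fit([descriptive_features(g) for g in training_gestures], labels)
# predicted = clf.predict([descriptive_features(new_gesture)])
```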

    Aesthetic potential of human-computer interaction in performing arts

    Human-computer interaction (HCI) is a multidisciplinary area that studies the communication between users and computers. In this thesis, we examine whether and how HCI, when incorporated into staged performances, can generate new possibilities for artistic expression on stage. We define and study four areas of technology-enhanced performance that have been strongly influenced by HCI techniques: multimedia expression, body representation, body augmentation and interactive environments. We trace relevant artistic practices that contributed to the exploration of these topics and then present new forms of creative expression that emerged after the incorporation of HCI techniques. We present and discuss novel practices such as: the performer and the media as one responsive entity, real-time control of virtual characters, on-body projections, body augmentation through human-machine systems, and interactive stage design. The thesis concludes by showing concrete examples of these novel practices implemented in performance pieces. We present and discuss technology-augmented dance pieces developed during this master's degree. We also present a software tool for aesthetic visualisation of movement data and discuss its application in video creation, staged performances and interactive installations.
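    The visualisation technique used by the tool is not described in the abstract; the sketch below is merely one illustrative way to render movement data aesthetically, drawing a tracked point as a fading motion trail with matplotlib.

```python
# Illustrative sketch only: render movement data as a fading motion trail.
import numpy as np
import matplotlib.pyplot as plt

def plot_motion_trail(xy, trail=60):
    """xy: (frames, 2) 2D positions of a tracked point."""
    xy = np.asarray(xy, dtype=float)
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.set_facecolor("black")
    for i in range(1, len(xy)):
        age = max(0.0, 1.0 - (len(xy) - i) / trail)   # older segments fade out
        ax.plot(xy[i - 1:i + 1, 0], xy[i - 1:i + 1, 1],
                color=(1.0, 1.0, 1.0, age), linewidth=2)
    ax.set_axis_off()
    plt.show()

# Example: a spiral standing in for captured movement data.
t = np.linspace(0, 4 * np.pi, 200)
plot_motion_trail(np.c_[t * np.cos(t), t * np.sin(t)])
```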

    Altering speech synthesis prosody through real time natural gestural control

    A significant amount of research has been, and continues to be, undertaken into generating expressive prosody within speech synthesis. Separately, recent developments in HMM-based synthesis (specifically pHTS, developed at the University of Mons) provide a platform for reactive speech synthesis, able to react in real time to surroundings or user interaction. Considering both of these elements, this project explores whether it is possible to generate superior prosody in a speech synthesis system, using natural gestural controls, in real time. Building on a previous piece of work undertaken at The University of Edinburgh, a system is constructed in which a user may apply a variety of prosodic effects in real time through natural gestures, recognised by a Microsoft Kinect sensor. Gestures are recognised and prosodic adjustments made through a series of hand-crafted rules (based on data gathered from preliminary experiments), though machine learning techniques are also considered within this project and recommended for future iterations of the work. Two sets of formal experiments were carried out, both of which suggest that, with further development, the system may work successfully in a real-world environment. Firstly, user tests show that subjects can learn to control the device successfully, adding prosodic effects to the intended words in the majority of cases with practice; results are likely to improve further as buffering issues are resolved. Secondly, listening tests show that the prosodic effects currently implemented significantly increase perceived naturalness, and in some cases are able to alter the semantic perception of a sentence in an intended way. Alongside this paper, a demonstration video of the project may be found on the accompanying CD, or online at http://tinyurl.com/msc-synthesis. The reader is advised to view this demonstration as a way of understanding how the system functions and sounds in action.
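    The project's actual hand-crafted rules are not reproduced in the abstract, so the sketch below only illustrates the general shape of such a rule set: Kinect skeleton measurements are mapped to prosodic scaling factors for the word currently being synthesised. The gestures, thresholds and parameter ranges are illustrative assumptions, not the rules developed in the project.

```python
# Minimal sketch of rule-based gesture-to-prosody mapping (assumed rules only).
from dataclasses import dataclass

@dataclass
class Prosody:
    pitch_scale: float = 1.0      # multiplier on F0
    duration_scale: float = 1.0   # multiplier on phone durations
    energy_scale: float = 1.0     # multiplier on amplitude

def prosody_from_gesture(hand_y, head_y, hands_apart):
    """Derive prosodic scaling factors from a single skeleton frame (metres)."""
    p = Prosody()
    if hand_y > head_y:              # hand raised above the head: emphasis
        p.pitch_scale = 1.2
        p.energy_scale = 1.3
    if hands_apart > 0.8:            # hands spread wide: slow the word down
        p.duration_scale = 1.4
    return p

print(prosody_from_gesture(hand_y=1.8, head_y=1.7, hands_apart=0.9))
```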

    Towards a framework for socially interactive robots

    In recent decades, research in the field of social robotics has grown considerably. The development of different types of robots and their roles within society are gradually expanding. Robots endowed with social abilities are intended for a range of applications; for example, as interactive teachers and educational assistants, to support the management of diabetes in children, to help elderly people with special needs, as interactive actors in the theatre, or even as assistants in hotels and shopping centres. The RSAIT research team has been working in several areas of robotics, in particular control architectures, robot exploration and navigation, machine learning and computer vision. The work presented here aims to add a new layer to that previous development: the human-robot interaction layer, which focuses on the social capabilities a robot should display when interacting with people, such as expressing and perceiving emotions, sustaining high-level dialogue, learning models of other agents, establishing and maintaining social relationships, using natural means of communication (gaze, gestures, etc.), showing distinctive personality and character, and learning social competencies. In this doctoral thesis we try to contribute our grain of sand to the basic questions that arise when we think about social robots: (1) how do humans communicate with (or operate) social robots? and (2) how should social robots act towards us? Along those lines, the work has been carried out in two phases: in the first, we focused on exploring, from a practical point of view, several ways that humans use to communicate with robots in a natural manner; in the second, we additionally investigated how social robots should act with the user. Regarding the first phase, we developed three natural user interfaces intended to make interaction with social robots more natural. To test these interfaces, two applications with different uses were developed: guide robots and a humanoid robot control system for entertainment purposes. Working on these applications allowed us to equip our robots with some basic abilities, such as navigation, robot-to-robot communication, and speech recognition and understanding. In the second phase, we focused on identifying and developing the basic behavioural modules that this kind of robot needs in order to be socially believable and trustworthy while acting as a social agent. A framework for socially interactive robots was developed that allows robots to express different types of emotions and to display natural, human-like body language according to the task at hand and the environmental conditions. The validation of the different development stages of our social robots has been carried out through public performances. Exposing our robots to the public in these performances has become an essential tool for qualitatively measuring the social acceptance of the prototypes we are developing. Just as robots need a physical body to interact with the environment and become intelligent, social robots need to participate socially in the real tasks for which they have been developed in order to improve their sociability.

    Personalized robot assistant for support in dressing

    Robot-assisted dressing is performed in close physical interaction with users who may have a wide range of physical characteristics and abilities. The design of user-adaptive and personalized robots in this context still shows limited or no consideration of specific user-related issues. This paper describes the development of a multi-modal robotic system for a specific dressing scenario - putting on a shoe - where users' personalized inputs contribute to a much improved task success rate. We have developed: 1) user tracking, gesture recognition and posture recognition algorithms relying on images provided by a depth camera; 2) a shoe recognition algorithm from RGB and depth images; 3) speech recognition and text-to-speech algorithms implemented to allow verbal interaction between the robot and the user. The interaction is further enhanced by calibrated recognition of the users' pointing gestures and an adjusted shoe delivery position. A series of shoe fitting experiments have been performed on two groups of users, with and without previous robot personalization, to assess how it affects the interaction performance. Our results show that the shoe fitting task with the personalized robot is completed in a shorter time, with a smaller number of user commands and reduced workload.
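    As a hedged sketch of how the pointing-gesture calibration might work (the paper's actual procedure may differ), the example below intersects the shoulder-to-hand ray with the floor plane, records a per-user offset against a known marker during setup, and applies that offset when computing the shoe delivery position.

```python
# Hypothetical per-user pointing calibration: not the paper's exact method.
import numpy as np

def pointing_target_on_floor(shoulder, hand):
    """Intersect the shoulder->hand ray with the floor plane (z = 0)."""
    shoulder, hand = np.asarray(shoulder, float), np.asarray(hand, float)
    direction = hand - shoulder
    if abs(direction[2]) < 1e-6:
        return None                       # pointing parallel to the floor
    t = -shoulder[2] / direction[2]
    return shoulder + t * direction

def calibrate(shoulder, hand, known_marker):
    """Record the user-specific correction offset during setup."""
    return np.asarray(known_marker, float) - pointing_target_on_floor(shoulder, hand)

def delivery_position(shoulder, hand, offset):
    """Where the robot should place the shoe for this user's pointing gesture."""
    return pointing_target_on_floor(shoulder, hand) + offset
```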

    Emergent interfaces: vague, complex, bespoke and embodied interaction between humans and computers

    Most human-computer interfaces are built on the paradigm of manipulating abstract representations. This can be limiting when computers are used in artistic performance or as mediators of social connection, where we rely on qualities of embodied thinking: intuition, context, resonance, ambiguity, fluidity. We explore an alternative approach to designing interaction that we call the emergent interface: interaction leveraging unsupervised machine learning to replace designed abstractions with contextually derived emergent representations. The approach offers opportunities to create interfaces bespoke to a single individual, to continually evolve and adapt the interface in line with that individual's needs and affordances, and to bridge more deeply with the complex and imprecise interaction that defines much of our non-digital communication. We explore this approach through artistic research rooted in music, dance and AI with the partially emergent system Sonified Body. The system maps the moving body into sound using an emergent representation of the body derived from a corpus of improvised movement from the first author. We explore this system in a residency with three dancers. We reflect on the broader implications and challenges of this alternative way of thinking about interaction, and how far it may help users avoid being limited by the assumptions of a system's designer.
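    The following sketch illustrates the general idea rather than Sonified Body's actual model: an unsupervised, low-dimensional representation (a plain PCA here, purely as a stand-in) is learned from a corpus of improvised movement, and incoming poses are mapped through that emergent space to sound-synthesis parameters. All names and parameter ranges are invented for the example.

```python
# Sketch of an emergent, unsupervised pose-to-sound mapping (assumed design).
import numpy as np
from sklearn.decomposition import PCA

# corpus: (frames, joints * 3) flattened skeleton poses from improvised movement
corpus = np.random.rand(5000, 25 * 3)                 # placeholder data
latent_model = PCA(n_components=3).fit(corpus)        # emergent representation
latent_corpus = latent_model.transform(corpus)
lo, hi = latent_corpus.min(axis=0), latent_corpus.max(axis=0)

def sound_params(pose):
    """Map one incoming pose into the emergent space, then to synth controls."""
    z = latent_model.transform(pose.reshape(1, -1))[0]
    z01 = np.clip((z - lo) / (hi - lo + 1e-9), 0.0, 1.0)   # normalise to [0, 1]
    return {"pitch_hz": 110 + 880 * z01[0],
            "filter_cutoff_hz": 200 + 5000 * z01[1],
            "grain_density": z01[2]}

print(sound_params(np.random.rand(25 * 3)))
```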

    Autonomous Operation and Human-Robot Interaction on an Indoor Mobile Robot

    MARVIN (Mobile Autonomous Robotic Vehicle for Indoor Navigation) was once the flagship of Victoria University's mobile robotic fleet. However, over the years MARVIN has become obsolete. This thesis continues the redevelopment of MARVIN, transforming it into a fully autonomous research platform for human-robot interaction (HRI). MARVIN utilises a Segway RMP, a self-balancing mobility platform. This provides agile locomotion, but increases sensor processing complexity due to its dynamic pitch. MARVIN's existing sensing systems (including a laser rangefinder and ultrasonic sensors) are augmented with tactile sensors and a Microsoft Kinect v2 RGB-D camera for 3D sensing. This allows the detection of the obstacles often found in MARVIN's unmodified, office-like operating environment. These sensors are processed using novel techniques to account for the Segway's dynamic pitch. A newly developed navigation stack takes the processed sensor data to facilitate localisation, obstacle detection and motion planning. MARVIN's inherited humanoid robotic torso is augmented with a touch screen and voice interface, enabling HRI. MARVIN's HRI capabilities are demonstrated by implementing it as a robotic guide. This implementation is evaluated through a usability study and found to be successful. Through evaluations of MARVIN's locomotion, sensing, localisation and motion planning systems, in addition to the usability study, MARVIN is found to be capable of both autonomous navigation and engaging HRI. These developed features open up a diverse range of research directions and HRI tasks that MARVIN can be used to explore.
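    The thesis's exact sensor-processing technique is not given in the abstract; the sketch below shows one plausible form of pitch compensation, rotating sensor-frame points by the measured base pitch into a gravity-aligned frame before obstacle detection. The frame conventions and sign of the pitch angle are assumptions.

```python
# Illustrative pitch compensation for a self-balancing base (assumed frames).
import numpy as np

def compensate_pitch(points_sensor, pitch_rad):
    """points_sensor: (N, 3) x-forward, y-left, z-up points in the sensor frame.
    pitch_rad: correction angle about the lateral (y) axis, from the IMU."""
    c, s = np.cos(pitch_rad), np.sin(pitch_rad)
    rot_y = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])     # rotation about the y axis
    return points_sensor @ rot_y.T

# A point 2 m ahead, corrected for a 5-degree base pitch:
print(compensate_pitch(np.array([[2.0, 0.0, 0.0]]), np.radians(5)))
```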

    Muecas: a multi-sensor robotic head for affective human robot interaction and imitation

    This paper presents a multi-sensor humanoid robotic head for human-robot interaction. The design of the robotic head, Muecas, is based on ongoing research on the mechanisms of perception and imitation of human expressions and emotions. These mechanisms allow direct interaction between the robot and its human companion through the different natural language modalities: speech, body language and facial expressions. The robotic head has 12 degrees of freedom, in a human-like configuration, including eyes, eyebrows, mouth and neck, and has been designed and built entirely by IADeX (Engineering, Automation and Design of Extremadura) and RoboLab. A detailed description of its kinematics is provided along with the design of the most complex controllers. Muecas can be directly controlled by FACS (Facial Action Coding System), the de facto standard for facial expression recognition and synthesis. This feature facilitates its use by third-party platforms and encourages the development of imitation and goal-based systems. Imitation systems learn from the user, while goal-based ones use planning techniques to drive the user towards a final desired state. To show the flexibility and reliability of the robotic head, the paper presents a software architecture that is able to detect, recognize, classify and generate facial expressions in real time using FACS. This system has been implemented using the robotics framework RoboComp, which provides hardware-independent access to the sensors in the head. Finally, the paper presents experimental results showing the real-time functioning of the whole system, including recognition and imitation of human facial expressions. This work was funded by the Ministerio de Ciencia e Innovación (project TIN2012-38079-C03-1) and the Gobierno de Extremadura (project GR10144).
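    As a hedged sketch of FACS-driven control (the joint names and action-unit weights below are invented, not Muecas' real mapping), each requested action unit contributes, scaled by its intensity, to the target positions of the degrees of freedom it involves.

```python
# Minimal sketch of mapping FACS action units to head joint targets (assumed).
AU_TO_JOINTS = {
    "AU1":  {"inner_brow_left": 1.0, "inner_brow_right": 1.0},   # inner brow raiser
    "AU12": {"lip_corner_left": 1.0, "lip_corner_right": 1.0},   # lip corner puller
    "AU26": {"jaw": 1.0},                                        # jaw drop
}

def facs_to_joint_targets(action_units):
    """action_units: dict of AU name -> intensity in [0, 1]."""
    targets = {}
    for au, intensity in action_units.items():
        for joint, weight in AU_TO_JOINTS.get(au, {}).items():
            # Combine overlapping AUs by taking the strongest contribution.
            targets[joint] = max(targets.get(joint, 0.0), weight * intensity)
    return targets

# A mild smile with slightly raised brows:
print(facs_to_joint_targets({"AU12": 0.6, "AU1": 0.3}))
```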

    Gestures in human-robot interaction

    Gestures consist of movements of body parts and are a means of communication that conveys information or intentions to an observer. They can therefore be used effectively in human-robot interaction, or in human-machine interaction in general, as a way for a robot or a machine to infer a meaning. In order for people to use gestures intuitively and to understand robot gestures, it is necessary to define mappings between gestures and their associated meanings: a gesture vocabulary. A human gesture vocabulary defines which gestures a group of people would intuitively use to convey information, while a robot gesture vocabulary displays which robot gestures are deemed fitting for a particular meaning. The effective use of vocabularies depends on techniques for gesture recognition, which concerns the classification of body motion into discrete gesture classes using pattern recognition and machine learning. This thesis addresses both research areas, presenting the development of gesture vocabularies as well as gesture recognition techniques, with a focus on hand and arm gestures. Attentional models for humanoid robots were developed as a prerequisite for human-robot interaction and a precursor to gesture recognition. A method for defining gesture vocabularies for humans and robots, based on user observations and surveys, is explained, and experimental results are presented. As a result of the robot gesture vocabulary experiment, an evolutionary approach for the refinement of robot gestures is introduced, based on interactive genetic algorithms. A robust and well-performing gesture recognition algorithm based on dynamic time warping has been developed. Most importantly, it employs one-shot learning, meaning that it can be trained using a small number of training samples and employed in real-life scenarios, lowering the effect of environmental constraints and gesture features. Finally, an approach for learning a relation between self-motion and pointing gestures is presented.
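    The sketch below illustrates the one-shot, DTW-based classification idea in its plainest form: a single recorded template per gesture class, and a new gesture assigned to the class whose template it aligns with most cheaply. The thesis's algorithm adds robustness measures that are not reproduced here.

```python
# Plain dynamic time warping over 3D trajectories with one template per class.
import numpy as np

def dtw_distance(a, b):
    """a, b: (frames, dims) trajectories. Returns the DTW alignment cost."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classify(query, templates):
    """templates: dict of label -> single example trajectory (one-shot learning)."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```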