196 research outputs found

    Vision-Based 2D and 3D Human Activity Recognition

    Multi-sensor fusion for human-robot interaction in crowded environments

    For challenges associated with the ageing population, robot assistants are becoming a promising solution. Human-Robot Interaction (HRI) allows a robot to understand the intention of humans in an environment and react accordingly. This thesis proposes HRI techniques to facilitate the transition of robots from lab-based research to real-world environments. The HRI aspects addressed in this thesis are illustrated in the following scenario: an elderly person, engaged in conversation with friends, wishes to attract a robot's attention. This composite task consists of many problems. The robot must detect and track the subject in a crowded environment. To engage with the user, it must track their hand movement. Knowledge of the subject's gaze would ensure that the robot doesn't react to the wrong person. Understanding the subject's group participation would enable the robot to respect existing human-human interaction. Many existing solutions to these problems are too constrained for natural HRI in crowded environments. Some require initial calibration or static backgrounds. Others deal poorly with occlusions, illumination changes, or real-time operation requirements. This work proposes algorithms that fuse multiple sensors to remove these restrictions and increase the accuracy over the state-of-the-art. 
The main contributions of this thesis are: a hand and body detection method, with a probabilistic algorithm for their real-time association when multiple users and hands are detected in crowded environments; an RGB-D sensor-fusion hand tracker, which increases position and velocity accuracy by combining a depth-image-based hand detector with Monte Carlo updates using colour images; a sensor-fusion gaze estimation system, combining IR and depth cameras on a mobile robot to give better accuracy than traditional visual methods, without the constraints of traditional IR techniques; and a group detection method, based on sociological concepts of static and dynamic interactions, which incorporates real-time gaze estimates to enhance detection accuracy.

    Gaze, Posture and Gesture Recognition to Minimize Focus Shifts for Intelligent Operating Rooms in a Collaborative Support System

    This paper describes the design of intelligent, collaborative operating rooms based on highly intuitive, natural and multimodal interaction. Intelligent operating rooms minimize the surgeon's focus shifts by minimizing both the focus spatial offset (the distance moved by the surgeon's head or gaze to the new target) and the movement spatial offset (the distance the surgeon covers physically). These spatio-temporal measures have an impact on the surgeon's performance in the operating room. I describe how machine vision techniques are used to extract spatio-temporal measures and to interact with the system, and how computer graphics techniques can be used to display visual medical information effectively and rapidly. Design considerations are discussed, and examples showing the feasibility of the different approaches are presented.

    Design of a Multiple-User Intelligent Feeding Robot for Elderly and Disabled

    The number of elderly people around the world is growing rapidly. This has led to an increase in the number of people seeking assistance and adequate service, either at home or in long-term-care institutions, to accomplish their daily activities. Responding to these needs has been a burden on the health care system in terms of labour and associated costs and has motivated research into alternative services using new technologies. Various intelligent and non-intelligent machines and robots have been developed to help elderly people and people with upper-limb disabilities or dysfunctions gain independence in eating, which is one of the most frequent and time-consuming everyday tasks. However, in almost all cases the proposed systems are designed only for the personal use of one individual, and little effort has previously been made to design a multiple-user feeding robot. The feeding requirements of the elderly in environments such as senior homes, where many elderly residents dine together at least three times per day, have not been extensively researched. The aim of this research was to develop a machine to feed multiple elderly people based on their characteristics and feeding needs, as determined through observations at a nursing home. Observations of the elderly during meal times revealed that almost 40% of the population was totally dependent on nurses or caregivers to be fed. Most of the remainder suffered from hand tremors, joint pain or lack of hand muscle strength, which made utensil manipulation and coordination very difficult and the eating process both messy and lengthy. In addition, more than 43% of the elderly were very slow in eating because of chewing and swallowing problems, and most of the rest were slow in scooping and directing utensils toward their mouths. Consequently, one nurse could only respond to a maximum of two diners simultaneously.
Managing the needs of all elderly diners therefore required the assistance of additional staff members. The limited time allocated for each meal and the daily progression of the seniors' disabilities also made mealtime very challenging. In the caregivers' opinion, many of the elderly in such environments can benefit from a machine capable of feeding multiple users simultaneously. Since eating is a slow procedure, the idle state of the robot during one user's chewing and swallowing time can be allotted to feeding another person who is sitting at the same table. The observations and studies have resulted in the design of a food tray and the selection of an appropriate robot and applicable user interface. The proposed system uses a 6-DOF serial articulated robot in the center of a four-seat table, along with a specifically designed food tray, to feed one to four people. It employs a vision interface for food detection and recognition. The dynamic equations of the robotic system were derived and the system was simulated to verify its dynamic behaviour before any prototyping and real-time testing.
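
The time-sharing idea in this abstract (serving another diner while one is still chewing) can be illustrated with a minimal scheduling sketch. This is not the controller described in the thesis: the function name `schedule_feeding`, the chewing and serving durations, and the "serve whoever is ready first" policy are all assumptions made purely for illustration.

```python
import heapq


def schedule_feeding(chew_times, bites_per_diner, serve_time=5.0):
    """Simulate one robot arm sharing its time among several diners.

    chew_times: hypothetical seconds each diner needs to chew and swallow
        one bite (assumed values, not measurements from the thesis).
    bites_per_diner: number of bites every diner receives.
    serve_time: assumed seconds the robot needs to deliver one bite.
    Returns a list of (start_time, diner_index) in serving order.
    """
    # Each heap entry is (time the diner is ready for a bite, diner index).
    ready = [(0.0, i) for i in range(len(chew_times))]
    heapq.heapify(ready)
    remaining = [bites_per_diner] * len(chew_times)
    clock = 0.0  # time at which the robot becomes free
    log = []
    while ready:
        ready_at, diner = heapq.heappop(ready)
        # The robot idles only if no diner is ready yet.
        start = max(clock, ready_at)
        log.append((start, diner))
        clock = start + serve_time
        remaining[diner] -= 1
        if remaining[diner] > 0:
            # The diner is ready again after chewing and swallowing.
            heapq.heappush(ready, (clock + chew_times[diner], diner))
    return log


if __name__ == "__main__":
    for start, diner in schedule_feeding([20.0, 30.0], bites_per_diner=2):
        print(f"t={start:5.1f}s  serve diner {diner}")
```

With two diners, the robot serves the second diner while the first is chewing, so the arm is idle only when nobody is ready for the next bite.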

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis concerns the study of computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features that convey simultaneous information. Even though standard signs are defined in dictionaries, there is huge variability caused by the context-dependency of signs. In addition, signs are often linked by movement epenthesis, which consists of the meaningless gesture between signs. This huge variability and the co-articulation effect represent a challenging problem for automatic SL processing. Numerous annotated video corpora are needed in order to train statistical machine translators and study this language. Generally, the annotation of SL video corpora is performed manually by linguists or computer scientists experienced in SL. However, manual annotation is error-prone, unreproducible and time-consuming. In addition, the quality of the results depends on the annotator's knowledge of SL. Combining annotator knowledge with image processing techniques facilitates the annotation task, increasing robustness and reducing the time required. The goal of this research is to study and develop image processing techniques to assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation and gloss recognition. Throughout this PhD thesis we address the problem of gloss annotation of SL video corpora. First, we aim to detect the limits corresponding to the beginning and end of a sign. This annotation method requires several low-level approaches for performing temporal segmentation and for extracting motion and hand shape features. We first propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Then a segmentation method for extracting the hand when it is in front of the face is developed. Motion is used for segmenting signs, and hand shape is later used to improve the results; indeed, hand shape makes it possible to delete limits detected in the middle of a sign. Once signs have been segmented, we proceed to gloss recognition using lexical descriptions of signs. We have evaluated our algorithms using international corpora in order to show their advantages and limitations. The evaluation has shown the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a gain in annotation consistency.

    The Future of Humanoid Robots

    This book provides state-of-the-art scientific and engineering research findings and developments in the field of humanoid robotics and its applications. It is expected that humanoids will change the way we interact with machines, and will have the ability to blend perfectly into an environment already designed for humans. The book contains chapters that aim to discover the future abilities of humanoid robots by presenting a variety of integrated research in various scientific and engineering fields, such as locomotion, perception, adaptive behavior, human-robot interaction, neuroscience and machine learning. The book is designed to be accessible and practical, with an emphasis on information useful to those working in robotics, cognitive science, artificial intelligence, computational methods and other fields of science directly or indirectly related to the development and usage of future humanoid robots. The editor of the book has extensive R&D experience, patents, and publications in the area of humanoid robotics, and his experience is reflected in editing the content of the book.

    Real-time Immersive human-computer interaction based on tracking and recognition of dynamic hand gestures

    With the fast-developing and ever-growing use of computer-based technologies, human-computer interaction (HCI) plays an increasingly pivotal role. In virtual reality (VR), HCI technologies provide not only a better understanding of three-dimensional shapes and spaces, but also sensory immersion and physical interaction. Hand-based HCI is a key modality for object manipulation and gesture-based communication, yet providing users with a natural, intuitive, effortless, precise and real-time method for HCI based on dynamic hand gestures is challenging because of the complexity of hand postures formed by multiple joints with high degrees of freedom, the speed of hand movements with highly variable trajectories and rapid direction changes, and the precision required for interaction between hands and objects in the virtual world. Presented in this thesis is the design and development of a novel real-time HCI system based on a unique combination of a pair of data gloves using fibre-optic curvature sensors to acquire finger joint angles, a hybrid tracking system based on inertia and ultrasound to capture hand position and orientation, and a stereoscopic display system to provide immersive visual feedback. The potential and effectiveness of the proposed system are demonstrated through a number of applications, namely hand-gesture-based virtual object manipulation and visualisation, hand-gesture-based direct sign writing, and hand-gesture-based finger spelling. For virtual object manipulation and visualisation, the system is shown to allow a user to select, translate, rotate, scale, release and visualise virtual objects (presented using graphics and volume data) in three-dimensional space using natural hand gestures in real time.
For direct sign writing, the system is shown to display immediately the SignWriting symbols corresponding to what a user signs, using three different signing sequences and a range of complex hand gestures. These gestures consist of various combinations of hand postures (with each finger open, half-bent, closed, adducted or abducted), eight hand orientations in the horizontal/vertical planes, three palm-facing directions, and various hand movements (which can have eight directions in the horizontal/vertical planes, and can be repetitive, straight/curved, clockwise/anti-clockwise). The development includes a special visual interface that gives not only a stereoscopic view of hand gestures and movements, but also structured visual feedback for each stage of the signing sequence. An excellent basis is therefore formed for developing a full HCI based on all human gestures by integrating the proposed system with facial expression and body posture recognition methods. Furthermore, for finger spelling, the system is shown to recognise five vowels signed with two hands in British Sign Language in real time.