5,326 research outputs found

    Human robot interaction in a crowded environment

    No full text
    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people‟s movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot‟s interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7]

    Particle Filters for Colour-Based Face Tracking Under Varying Illumination

    Get PDF
    Automatic human face tracking is the basis of robotic and active vision systems used for facial feature analysis, automatic surveillance, video conferencing, intelligent transportation, human-computer interaction and many other applications. Superior human face tracking will allow future safety surveillance systems which monitor drowsy drivers, or patients and elderly people at the risk of seizure or sudden falls and will perform with lower risk of failure in unexpected situations. This area has actively been researched in the current literature in an attempt to make automatic face trackers more stable in challenging real-world environments. To detect faces in video sequences, features like colour, texture, intensity, shape or motion is used. Among these feature colour has been the most popular, because of its insensitivity to orientation and size changes and fast process-ability. The challenge of colour-based face trackers, however, has been dealing with the instability of trackers in case of colour changes due to the drastic variation in environmental illumination. Probabilistic tracking and the employment of particle filters as powerful Bayesian stochastic estimators, on the other hand, is increasing in the visual tracking field thanks to their ability to handle multi-modal distributions in cluttered scenes. Traditional particle filters utilize transition prior as importance sampling function, but this can result in poor posterior sampling. The objective of this research is to investigate and propose stable face tracker capable of dealing with challenges like rapid and random motion of head, scale changes when people are moving closer or further from the camera, motion of multiple people with close skin tones in the vicinity of the model person, presence of clutter and occlusion of face. The main focus has been on investigating an efficient method to address the sensitivity of the colour-based trackers in case of gradual or drastic illumination variations. The particle filter is used to overcome the instability of face trackers due to nonlinear and random head motions. To increase the traditional particle filter\u27s sampling efficiency an improved version of the particle filter is introduced that considers the latest measurements. This improved particle filter employs a new colour-based bottom-up approach that leads particles to generate an effective proposal distribution. The colour-based bottom-up approach is a classification technique for fast skin colour segmentation. This method is independent to distribution shape and does not require excessive memory storage or exhaustive prior training. Finally, to address the adaptability of the colour-based face tracker to illumination changes, an original likelihood model is proposed based of spatial rank information that considers both the illumination invariant colour ordering of a face\u27s pixels in an image or video frame and the spatial interaction between them. The original contribution of this work lies in the unique mixture of existing and proposed components to improve colour-base recognition and tracking of faces in complex scenes, especially where drastic illumination changes occur. Experimental results of the final version of the proposed face tracker, which combines the methods developed, are provided in the last chapter of this manuscript

    Probabilistic three-dimensional object tracking based on adaptive depth segmentation

    Get PDF
    Object tracking is one of the fundamental topics of computer vision with diverse applications. The arising challenges in tracking, i.e., cluttered scenes, occlusion, complex motion, and illumination variations have motivated utilization of depth information from 3D sensors. However, current 3D trackers are not applicable to unconstrained environments without a priori knowledge. As an important object detection module in tracking, segmentation subdivides an image into its constituent regions. Nevertheless, the existing range segmentation methods in literature are difficult to implement in real-time due to their slow performance. In this thesis, a 3D object tracking method based on adaptive depth segmentation and particle filtering is presented. In this approach, the segmentation method as the bottom-up process is combined with the particle filter as the top-down process to achieve efficient tracking results under challenging circumstances. The experimental results demonstrate the efficiency, as well as robustness of the tracking algorithm utilizing real-world range information

    Vision systems with the human in the loop

    Get PDF
    The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge, they can adapt to their environment and thus will exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval, the others are cognitive vision systems that constitute prototypes of visual active memories which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After we will have discussed adaptive content-based image retrieval and object and action recognition in an office environment, the issue of assessing cognitive systems will be raised. Experiences from psychologically evaluated human-machine interactions will be reported and the promising potential of psychologically-based usability experiments will be stressed

    A haptic-enabled multimodal interface for the planning of hip arthroplasty

    Get PDF
    Multimodal environments help fuse a diverse range of sensory modalities, which is particularly important when integrating the complex data involved in surgical preoperative planning. The authors apply a multimodal interface for preoperative planning of hip arthroplasty with a user interface that integrates immersive stereo displays and haptic modalities. This article overviews this multimodal application framework and discusses the benefits of incorporating the haptic modality in this area

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    Get PDF
    Cette thèse porte sur l'étude des méthodes de vision par ordinateur pour la reconnaissance de gestes naturels dans le contexte de l'annotation de la Langue des Signes. La langue des signes (LS) est une langue gestuelle développée par les sourds pour communiquer. Un énoncé en LS consiste en une séquence de signes réalisés par les mains, accompagnés d'expressions du visage et de mouvements du haut du corps, permettant de transmettre des informations en parallèles dans le discours. Même si les signes sont définis dans des dictionnaires, on trouve une très grande variabilité liée au contexte lors de leur réalisation. De plus, les signes sont souvent séparés par des mouvements de co-articulation. Cette extrême variabilité et l'effet de co-articulation représentent un problème important dans les recherches en traitement automatique de la LS. Il est donc nécessaire d'avoir de nombreuses vidéos annotées en LS, si l'on veut étudier cette langue et utiliser des méthodes d'apprentissage automatique. Les annotations de vidéo en LS sont réalisées manuellement par des linguistes ou experts en LS, ce qui est source d'erreur, non reproductible et extrêmement chronophage. De plus, la qualité des annotations dépend des connaissances en LS de l'annotateur. L'association de l'expertise de l'annotateur aux traitements automatiques facilite cette tâche et représente un gain de temps et de robustesse. Le but de nos recherches est d'étudier des méthodes de traitement d'images afin d'assister l'annotation des corpus vidéo: suivi des composantes corporelles, segmentation des mains, segmentation temporelle, reconnaissance de gloses. Au cours de cette thèse nous avons étudié un ensemble de méthodes permettant de réaliser l'annotation en glose. Dans un premier temps, nous cherchons à détecter les limites de début et fin de signe. Cette méthode d'annotation nécessite plusieurs traitements de bas niveau afin de segmenter les signes et d'extraire les caractéristiques de mouvement et de forme de la main. D'abord nous proposons une méthode de suivi des composantes corporelles robuste aux occultations basée sur le filtrage particulaire. Ensuite, un algorithme de segmentation des mains est développé afin d'extraire la région des mains même quand elles se trouvent devant le visage. Puis, les caractéristiques de mouvement sont utilisées pour réaliser une première segmentation temporelle des signes qui est par la suite améliorée grâce à l'utilisation de caractéristiques de forme. En effet celles-ci permettent de supprimer les limites de segmentation détectées en milieu des signes. Une fois les signes segmentés, on procède à l'extraction de caractéristiques visuelles pour leur reconnaissance en termes de gloses à l'aide de modèles phonologiques. Nous avons évalué nos algorithmes à l'aide de corpus internationaux, afin de montrer leur avantages et limitations. L'évaluation montre la robustesse de nos méthodes par rapport à la dynamique et le grand nombre d'occultations entre les différents membres. L'annotation résultante est indépendante de l'annotateur et représente un gain de robustese important.This PhD thesis concerns the study of computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists on a sequence of signs performed one after another involving manual and non-manual features conveying simultaneous information. Even though standard signs are defined in dictionaries, we find a huge variability caused by the context-dependency of signs. In addition signs are often linked by movement epenthesis which consists on the meaningless gesture between signs. The huge variability and the co-articulation effect represent a challenging problem during automatic SL processing. It is necessary to have numerous annotated video corpus in order to train statistical machine translators and study this language. Generally the annotation of SL video corpus is manually performed by linguists or computer scientists experienced in SL. However manual annotation is error-prone, unreproducible and time consuming. In addition de quality of the results depends on the SL annotators knowledge. Associating annotator knowledge to image processing techniques facilitates the annotation task increasing robustness and speeding up the required time. The goal of this research concerns on the study and development of image processing technique in order to assist the annotation of SL video corpus: body tracking, hand segmentation, temporal segmentation, gloss recognition. Along this PhD thesis we address the problem of gloss annotation of SL video corpus. First of all we intend to detect the limits corresponding to the beginning and end of a sign. This annotation method requires several low level approaches for performing temporal segmentation and for extracting motion and hand shape features. First we propose a particle filter based approach for robustly tracking hand and face robust to occlusions. Then a segmentation method for extracting hand when it is in front of the face has been developed. Motion is used for segmenting signs and later hand shape is used to improve the results. Indeed hand shape allows to delete limits detected in the middle of a sign. Once signs have been segmented we proceed to the gloss recognition using lexical description of signs. We have evaluated our algorithms using international corpus, in order to show their advantages and limitations. The evaluation has shown the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. Resulting annotation is independent on the annotator and represents a gain on annotation consistency

    Mirror mirror on the wall... an unobtrusive intelligent multisensory mirror for well-being status self-assessment and visualization

    Get PDF
    A person’s well-being status is reflected by their face through a combination of facial expressions and physical signs. The SEMEOTICONS project translates the semeiotic code of the human face into measurements and computational descriptors that are automatically extracted from images, videos and 3D scans of the face. SEMEOTICONS developed a multisensory platform in the form of a smart mirror to identify signs related to cardio-metabolic risk. The aim was to enable users to self-monitor their well-being status over time and guide them to improve their lifestyle. Significant scientific and technological challenges have been addressed to build the multisensory mirror, from touchless data acquisition, to real-time processing and integration of multimodal data

    Review of Face Detection Systems Based Artificial Neural Networks Algorithms

    Get PDF
    Face detection is one of the most relevant applications of image processing and biometric systems. Artificial neural networks (ANN) have been used in the field of image processing and pattern recognition. There is lack of literature surveys which give overview about the studies and researches related to the using of ANN in face detection. Therefore, this research includes a general review of face detection studies and systems which based on different ANN approaches and algorithms. The strengths and limitations of these literature studies and systems were included also.Comment: 16 pages, 12 figures, 1 table, IJMA Journa
    corecore