14 research outputs found

    A Visual Sensor for Domestic Service Robots

    Get PDF
In this study, we present a visual sensor for domestic service robots that captures both color and three-dimensional information in real time by calibrating a time-of-flight camera with two CCD cameras. The problem of occlusions is solved by the proposed occlusion detection algorithm. Since the sensor uses two CCD cameras, color information missing for pixels occluded in one camera is compensated by the other. We conduct several evaluations to validate the proposed sensor, including an investigation of an object recognition task under occluded scenes. The results demonstrate the effectiveness of the proposed visual sensor.
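
    The occlusion handling described above can be illustrated with a short sketch. This is not the authors' code: the mask conventions, array shapes, and the fallback for pixels hidden from both CCD cameras are assumptions made purely for illustration.

```python
import numpy as np

def fuse_colors(color_left, color_right, occluded_left, occluded_right):
    """Fill in colour for pixels occluded in one CCD view using the other view.

    color_left, color_right       : (H, W, 3) images registered to the ToF depth grid
    occluded_left, occluded_right : (H, W) boolean masks from an occlusion detector
    """
    fused = color_left.copy()
    # Pixels hidden from the left camera but visible to the right one.
    use_right = occluded_left & ~occluded_right
    fused[use_right] = color_right[use_right]
    # Pixels occluded in both views get no colour; mark them invalid (black) here.
    fused[occluded_left & occluded_right] = 0
    return fused

# Tiny synthetic example: one pixel is occluded in the left view only.
h, w = 4, 4
left = np.full((h, w, 3), 100, dtype=np.uint8)
right = np.full((h, w, 3), 200, dtype=np.uint8)
occ_left = np.zeros((h, w), dtype=bool)
occ_left[0, 0] = True
occ_right = np.zeros((h, w), dtype=bool)
print(fuse_colors(left, right, occ_left, occ_right)[0, 0])  # colour taken from the right view
```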

    Exploitation of time-of-flight (ToF) cameras

    Get PDF
This technical report reviews the state of the art in the field of ToF cameras: their advantages, their limitations, and their present-day applications, sometimes in combination with other sensors. Even though ToF cameras provide neither higher resolution nor a larger ambiguity-free range than other range map estimation systems, advantages such as registered depth and intensity data at a high frame rate, compact design, low weight, and reduced power consumption have motivated their use in numerous areas of research. In robotics, these areas range from mobile robot navigation and map building to vision-based human motion capture and gesture recognition, showing particular potential in object modeling and recognition. Preprint.

4D Unconstrained Real-time Face Recognition Using a Commodity Depth Camera

    Get PDF
Robust unconstrained real-time face recognition still remains a challenge today. The recent arrival on the market of lightweight commodity depth sensors brings new possibilities for human-machine interaction and therefore face recognition. This article takes the reader through a succinct survey of the current literature on face recognition in general and 3D face recognition using depth sensors in particular. From an assessment of experiments performed with implementations of the most established algorithms, it can be concluded that the majority are biased towards qualitative performance and are lacking in speed. A novel method is proposed that uses noisy data from such a commodity sensor to build dynamic internal representations of faces. Distances to a surface normal to the face are measured in real time and used as input to a specific type of recurrent neural network, namely long short-term memory. This enables the prediction of facial structure in linear time and also increases robustness towards partial occlusions.
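
    As an illustration of the recognition stage described above, the following sketch feeds per-frame distance measurements into a long short-term memory network. It is not the article's implementation; the feature dimension, layer sizes, and the final linear classifier are assumed values.

```python
import torch
import torch.nn as nn

class DepthFaceLSTM(nn.Module):
    """Classify identities from sequences of per-frame distance measurements."""

    def __init__(self, n_distances=64, hidden=128, n_identities=10):
        super().__init__()
        # Each time step carries the distances measured on one noisy depth frame.
        self.lstm = nn.LSTM(input_size=n_distances, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_identities)

    def forward(self, x):              # x: (batch, frames, n_distances)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # identity logits from the last time step

model = DepthFaceLSTM()
frames = torch.randn(2, 30, 64)        # two 30-frame sequences of distance features
print(model(frames).shape)             # torch.Size([2, 10])
```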

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Full text link
Recently, technologies such as face detection, facial landmark localisation, and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation, and recognition/verification. A very important technology that has not yet been thoroughly evaluated is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively by visually inspecting the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model-free tracking plus generic facial landmark localisation, and (c) hybrid approaches using state-of-the-art face detection, model-free tracking, and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
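
    The three general strategies compared in the paper can be summarised as a single pseudo-pipeline. The sketch below is purely illustrative; detect_face, track_box, and localise_landmarks are hypothetical stand-ins for whichever face detector, model-free tracker, and landmark localiser a concrete pipeline would plug in.

```python
def track_video(frames, strategy, detect_face, track_box, localise_landmarks):
    """Run one of the three generic strategies over a sequence of frames.

    strategy "a": re-detect the face in every frame, then localise landmarks.
    strategy "b": initialise once, then propagate the box with a model-free tracker.
    strategy "c": hybrid; track, but fall back to detection when tracking fails.
    """
    box, results = None, []
    for frame in frames:
        if strategy == "a" or box is None:
            box = detect_face(frame)
        elif strategy == "b":
            box = track_box(frame, box)
        else:  # "c"
            box = track_box(frame, box) or detect_face(frame)
        results.append(localise_landmarks(frame, box))
    return results
```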

3D Vision-Based Monitoring and Protection Systems: Development of an Industrial Safety and Protection Application Using RGB-D Sensors

    Get PDF
The main objective of this work was the study of the monitoring of areas or volumes in a wide range of environments. A more specific objective was the development of a safety and protection monitoring system for industrial environments using RGB-D sensors. A survey was carried out of the legislation in force for implementing and using video-based monitoring systems for public and private spaces, as well as of the technologies and systems currently on the market for the same purpose. Regarding the applicable legislation, two different perspectives had to be considered. The first is that of video surveillance and monitoring of spaces to protect people and property against improper and criminal behaviour, such as vandalism, theft and violence, which must follow rules defined in specific legislation for its implementation and use. The other perspective concerns the industrial environment, where monitoring systems are intended to support production and the safety of people and equipment, and where the most relevant aspects are compliance with standards for procedures, equipment and facilities. These two perspectives can overlap when images and video are captured and recorded in a way that allows places and people to be identified. On the technological side, the techniques and technologies for 3D image and video capture were analysed, namely triangulation, stereo vision and time-of-flight. Since time-of-flight is the most advanced technique and gives the best results in terms of accuracy and precision, hardware meeting the specific needs of the project was surveyed. Microsoft's Kinect V2 sensor meets these needs and was therefore the natural choice for the project. Regarding software, Matlab was used as the development environment, which, together with the Microsoft SDK and specific C code for integrating the Kinect V2 into Matlab, served as the basis for achieving the project's objectives. The application was developed in two phases: a calibration phase and a monitoring and detection phase. In the calibration phase, the volume to be monitored was defined and its occupancy was quantified in the absence of intruding elements. In the monitoring and detection phase, 3D information was captured to identify external elements representing an intrusion into, or evasion from, the continuously monitored volume. This monitoring was achieved by implementing an algorithm that quantifies the occupied volume, based on 3D Delaunay triangulation. To test the developed system in an industrial environment, it was deployed in the operating space of a robotic arm running in continuous mode, in order to verify its correct operation and evaluate its performance. Finally, as future work, the integrated combination of several Kinect V2 systems was proposed, so as to fully map the 3D volume to be monitored, allowing complete flexibility in monitoring and detection.
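
    The volume-quantification step based on 3D Delaunay triangulation can be sketched compactly. The thesis implements it in Matlab with the Microsoft SDK; the Python version below only illustrates the same idea, and the point clouds and the 5% intrusion threshold are invented example values.

```python
import numpy as np
from scipy.spatial import Delaunay

def occupied_volume(points):
    """Sum the volumes of the tetrahedra of a 3D Delaunay triangulation."""
    tri = Delaunay(points)
    a, b, c, d = (points[tri.simplices[:, i]] for i in range(4))
    # Volume of each tetrahedron: |det[b-a, c-a, d-a]| / 6 (scalar triple product).
    return np.abs(np.einsum("ij,ij->i", b - a, np.cross(c - a, d - a))).sum() / 6.0

rng = np.random.default_rng(0)

# Calibration phase: quantify the occupancy of the empty monitored volume.
baseline = occupied_volume(rng.uniform(0.0, 1.0, size=(500, 3)))

# Monitoring phase: flag an intrusion/evasion when the measured volume deviates.
current = occupied_volume(rng.uniform(0.0, 1.0, size=(500, 3)))
if abs(current - baseline) > 0.05 * baseline:   # 5% threshold, purely illustrative
    print("intrusion or evasion detected")
else:
    print("volume unchanged")
```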

3D Head Tracking Based on Recognition and Interpolation Using a Time-of-Flight Depth Sensor

    No full text
This paper describes a head-tracking algorithm that is based on recognition and correlation-based weighted interpolation. The input is a sequence of 3D depth images generated by a novel time-of-flight depth sensor. These are processed to segment the background and foreground, and the latter is used as the input to the head-tracking algorithm, which is composed of three major modules. First, a depth signature is created from the depth images. Next, the signature is compared against signatures collected in a training set of depth images. Finally, a correlation metric is calculated between the most probable signature matches. The head location is calculated by interpolating among stored depth values, using the correlation metrics as the weights. This combination of depth sensing and recognition-based head tracking achieves a success rate of more than 90 percent. Even if the track is temporarily lost, it is easily recovered when a good match is obtained from the training set. The use of depth images and recognition-based head tracking achieves robust real-time tracking results under extreme conditions such as 180-degree rotation, temporary occlusions, and complex backgrounds.
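
    A minimal sketch of the correlation-weighted interpolation stage is given below. It is not the paper's exact formulation: the signature length, the use of Pearson correlation, the top-k selection, and the idea of storing a head location with each training signature are assumptions made for illustration.

```python
import numpy as np

def estimate_head_position(query_sig, train_sigs, train_positions, k=3):
    """Correlation-weighted interpolation over the best-matching training signatures.

    query_sig       : (d,)   depth signature of the current frame
    train_sigs      : (n, d) signatures collected during training
    train_positions : (n, 3) head locations stored alongside those signatures
    """
    # Correlation of the query against every stored signature.
    corr = np.array([np.corrcoef(query_sig, s)[0, 1] for s in train_sigs])
    best = np.argsort(corr)[-k:]                    # indices of the k closest matches
    weights = np.clip(corr[best], 0.0, None)
    weights = weights / weights.sum() if weights.sum() > 0 else np.full(k, 1.0 / k)
    # Weighted interpolation of the stored head locations.
    return weights @ train_positions[best]

rng = np.random.default_rng(1)
train_sigs = rng.normal(size=(20, 32))
train_positions = rng.uniform(size=(20, 3))
noisy_query = train_sigs[0] + 0.01 * rng.normal(size=32)
print(estimate_head_position(noisy_query, train_sigs, train_positions))
```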

    Multi-signal gesture recognition using body and hand poses

    Get PDF
Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 147-154). We present a vision-based multi-signal gesture recognition system that integrates information from body and hand poses. Unlike previous approaches to gesture recognition, which concentrated mainly on a single signal, our system allows a richer gesture vocabulary and more natural human-computer interaction. The system consists of three parts: 3D body pose estimation, hand pose classification, and gesture recognition. 3D body pose estimation is performed following a generative model-based approach, using a particle filtering estimation framework. Hand pose classification is performed by extracting Histogram of Oriented Gradients features and using a multi-class Support Vector Machine classifier. Finally, gesture recognition is performed using a novel statistical inference framework that we developed for multi-signal pattern recognition, extending previous work on a discriminative hidden-state graphical model (HCRF) to consider multi-signal input data, which we refer to as Multi Information-Channel Hidden Conditional Random Fields (MIC-HCRFs). One advantage of MIC-HCRFs is that they allow us to capture complex dependencies among multiple information channels more precisely than conventional approaches to the task. Our system was evaluated in the scenario of an aircraft carrier flight deck environment, where humans interact with unmanned vehicles using the existing body and hand gesture vocabulary. When tested on 10 gestures recorded from 20 participants, the average recognition accuracy of our system was 88.41%. By Yale Song. S.M.
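
    The hand pose classification stage (HOG features plus a multi-class SVM) is easy to illustrate. The sketch below is not the thesis code; the patch size, HOG parameters, and the synthetic training data are invented, and scikit-image/scikit-learn stand in for whatever implementation was actually used.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hand_features(patch):
    """HOG descriptor of a grayscale hand patch."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

rng = np.random.default_rng(0)
# Stand-in data: 40 random 64x64 "hand patches" labelled with 4 pose classes.
patches = rng.random((40, 64, 64))
labels = rng.integers(0, 4, size=40)

X = np.array([hand_features(p) for p in patches])
clf = SVC(kernel="rbf").fit(X, labels)   # multi-class SVM (one-vs-one internally)
print(clf.predict(X[:5]))
```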

    Human robot interaction in a crowded environment

    No full text
Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision-based human robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level of understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them or, if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
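
    A minimal sketch of fusing multiple visual cues in a Bayesian update is shown below. It is not the thesis's Bayesian network; the states, cues, and likelihood values are invented purely to illustrate how evidence from several cues can be combined into a posterior over a person's intention.

```python
import numpy as np

states = ["engaged_with_robot", "engaged_with_person", "moving_through"]
prior = np.array([1 / 3, 1 / 3, 1 / 3])

# P(observed cue | state) for three example cues: the person faces the robot,
# a waving gesture is detected, and the person is standing still.
cue_likelihoods = [
    np.array([0.7, 0.2, 0.1]),
    np.array([0.8, 0.1, 0.1]),
    np.array([0.5, 0.4, 0.1]),
]

posterior = prior.copy()
for likelihood in cue_likelihoods:
    posterior *= likelihood
    posterior /= posterior.sum()   # renormalise after each piece of evidence

print(dict(zip(states, np.round(posterior, 3))))
# The highest posterior indicates the person most likely intends to interact.
```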