722 research outputs found

    Fair comparison of skin detection approaches on publicly available datasets

    Full text link
    Skin detection is the process of discriminating skin and non-skin regions in a digital image and it is widely used in several applications ranging from hand gesture analysis to track body parts and face detection. Skin detection is a challenging problem which has drawn extensive attention from the research community, nevertheless a fair comparison among approaches is very difficult due to the lack of a common benchmark and a unified testing protocol. In this work, we investigate the most recent researches in this field and we propose a fair comparison among approaches using several different datasets. The major contributions of this work are an exhaustive literature review of skin color detection approaches, a framework to evaluate and combine different skin detector approaches, whose source code is made freely available for future research, and an extensive experimental comparison among several recent methods which have also been used to define an ensemble that works well in many different problems. Experiments are carried out in 10 different datasets including more than 10000 labelled images: experimental results confirm that the best method here proposed obtains a very good performance with respect to other stand-alone approaches, without requiring ad hoc parameter tuning. A MATLAB version of the framework for testing and of the methods proposed in this paper will be freely available from https://github.com/LorisNann

    Evaluation of different chrominance models in the detection and reconstruction of faces and hands using the growing neural gas network

    Get PDF
    Physical traits such as the shape of the hand and face can be used for human recognition and identification in video surveillance systems and in biometric authentication smart card systems, as well as in personal health care. However, the accuracy of such systems suffers from illumination changes, unpredictability, and variability in appearance (e.g. occluded faces or hands, cluttered backgrounds, etc.). This work evaluates different statistical and chrominance models in different environments with increasingly cluttered backgrounds where changes in lighting are common and with no occlusions applied, in order to get a reliable neural network reconstruction of faces and hands, without taking into account the structural and temporal kinematics of the hands. First a statistical model is used for skin colour segmentation to roughly locate hands and faces. Then a neural network is used to reconstruct in 3D the hands and faces. For the filtering and the reconstruction we have used the growing neural gas algorithm which can preserve the topology of an object without restarting the learning process. Experiments conducted on our own database but also on four benchmark databases (Stirling’s, Alicante, Essex, and Stegmann’s) and on deaf individuals from normal 2D videos are freely available on the BSL signbank dataset. Results demonstrate the validity of our system to solve problems of face and hand segmentation and reconstruction under different environmental conditions

    Speech-Gesture Mapping and Engagement Evaluation in Human Robot Interaction

    Full text link
    A robot needs contextual awareness, effective speech production and complementing non-verbal gestures for successful communication in society. In this paper, we present our end-to-end system that tries to enhance the effectiveness of non-verbal gestures. For achieving this, we identified prominently used gestures in performances by TED speakers and mapped them to their corresponding speech context and modulated speech based upon the attention of the listener. The proposed method utilized Convolutional Pose Machine [4] to detect the human gesture. Dominant gestures of TED speakers were used for learning the gesture-to-speech mapping. The speeches by them were used for training the model. We also evaluated the engagement of the robot with people by conducting a social survey. The effectiveness of the performance was monitored by the robot and it self-improvised its speech pattern on the basis of the attention level of the audience, which was calculated using visual feedback from the camera. The effectiveness of interaction as well as the decisions made during improvisation was further evaluated based on the head-pose detection and interaction survey.Comment: 8 pages, 9 figures, Under review in IRC 201

    Melhorias na segmentação de pele humana em imagens digitais

    Get PDF
    Orientador: Hélio PedriniDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Segmentação de pele humana possui diversas aplicações nas áreas de visão computacional e reconhecimento de padrões, cujo propósito principal é distinguir regiões de pele e não pele em imagens. Apesar do elevado número de métodos disponíveis na literatura, a segmentação de pele com precisão ainda é uma tarefa desafiadora. Muitos métodos contam somente com a informação de cor, o que não discrimina completamente as regiões da imagem devido a variações nas condições de iluminação e à ambiguidade entre a cor da pele e do plano de fundo. Dessa forma, há ainda a demanda em melhorar a segmentação. Este trabalho apresenta três contribuições com respeito a essa necessidade. A primeira é um método autocontido para segmentação adaptativa de pele que faz uso de análise espacial para produzir regiões nas quais a cor da pele é estimada e, dessa forma, ajusta o padrão da cor para uma imagem em particular. A segunda é a introdução da detecção de saliência para, combinada com detectores de pele baseados em cor, realizar a remoção do plano de fundo, o que elimina muitas regiões de não pele. A terceira é uma melhoria baseada em textura utilizando superpixels para capturar energia de regiões na imagem filtrada, que é então utilizada para caracterizar regiões de não pele e assim eliminar a ambiguidade da cor adicionando um segundo voto. Resultados experimentais obtidos em bases de dados públicas comprovam uma melhoria significativa nos métodos propostos para segmentação de pele humana em comparação com abordagens disponíveis na literaturaAbstract: Human skin segmentation has several applications on computer vision and pattern recognition fields, whose main purpose is to distinguish skin and non-skin regions. Despite the large number of methods available in the literature, accurate skin segmentation is still a challenging task. Many methods rely only on color information, which does not completely discriminate the image regions due to variations in lighting conditions and ambiguity between skin and background color. Therefore, there is still demand to improve the segmentation process. Three main contributions toward this need are presented in this work. The first is a self-contained method for adaptive skin segmentation that makes use of spatial analysis to produce regions from which the overall skin color can be estimated and such that the color model is adjusted to a particular image. The second is the combination of saliency detection with color skin segmentation, which performs a background removal to eliminate non-skin regions. The third is a texture-based improvement using superpixels to capture energy of regions in the filtered image, employed to characterize non-skin regions and thus eliminate color ambiguity adding a second vote. Experimental results on public data sets demonstrate a significant improvement of the proposed methods for human skin segmentation over state-of-the-art approachesMestradoCiência da ComputaçãoMestre em Ciência da Computaçã

    Audio-Video Event Recognition System For Public Transport Security

    Get PDF
    International audienceThis paper presents an audio-video surveillance system for the automatic surveillance in public transport vehicle. The system comprises six modules including in particular three novel ones: (i) Face Detection and Tracking, (ii) Audio Event Detection and (iii) Audio-Video Scenario Recognition. The Face Detection and Tracking module is responsible for detecting and tracking faces of people in front of cameras. The Audio Event Detection module detects abnormal audio events which are precursor for detecting scenarios which have been predefined by end-users. The Audio-Video Scenario Recognition module performs high level interpretation of the observed objects by combining audio and video events based on spatio-temporal reasoning. The performance of the system is evaluated for a series of pre-defined audio, video and audio-video events specified using an audio-video event ontology

    Audio-Visual Biometrics and Forgery

    Get PDF

    Hand gesture recognition in complex background based on convolutional pose machine and fuzzy gaussian mixture models

    Get PDF
    © 2020, The Author(s). Hand gesture is one of the most intuitive and natural ways for human to communicate with computers, and it has been widely adopted in many human–computer interaction applications. However, it is still a challenging problem when confronted with complex background, illumination variation and occlusion in real-world scenarios. In this paper, a two-stage hand gesture recognition method is proposed to tackle these problems. At the first stage, hand pose estimation is developed to locate the hand keypoints using the convolutional pose machine, which can effectively localize hand keypoints even in a complex background. At the second stage, the Fuzzy Gaussian mixture models (FGMMs) are tailored to reject the nongesture patterns and classify the gestures based on the estimated hand keypoints. Extensive experiments are conducted to evaluate the performance of the proposed method, and the result demonstrates that the proposed algorithm is effective, robust, and satisfactory in real-time scenarios

    인간 기계 상호작용을 위한 강건하고 정확한 손동작 추적 기술 연구

    Get PDF
    학위논문(박사) -- 서울대학교대학원 : 공과대학 기계항공공학부, 2021.8. 이동준.Hand-based interface is promising for realizing intuitive, natural and accurate human machine interaction (HMI), as the human hand is main source of dexterity in our daily activities. For this, the thesis begins with the human perception study on the detection threshold of visuo-proprioceptive conflict (i.e., allowable tracking error) with or without cutantoues haptic feedback, and suggests tracking error specification for realistic and fluidic hand-based HMI. The thesis then proceeds to propose a novel wearable hand tracking module, which, to be compatible with the cutaneous haptic devices spewing magnetic noise, opportunistically employ heterogeneous sensors (IMU/compass module and soft sensor) reflecting the anatomical properties of human hand, which is suitable for specific application (i.e., finger-based interaction with finger-tip haptic devices). This hand tracking module however loses its tracking when interacting with, or being nearby, electrical machines or ferromagnetic materials. For this, the thesis presents its main contribution, a novel visual-inertial skeleton tracking (VIST) framework, that can provide accurate and robust hand (and finger) motion tracking even for many challenging real-world scenarios and environments, for which the state-of-the-art technologies are known to fail due to their respective fundamental limitations (e.g., severe occlusions for tracking purely with vision sensors; electromagnetic interference for tracking purely with IMUs (inertial measurement units) and compasses; and mechanical contacts for tracking purely with soft sensors). The proposed VIST framework comprises a sensor glove with multiple IMUs and passive visual markers as well as a head-mounted stereo camera; and a tightly-coupled filtering-based visual-inertial fusion algorithm to estimate the hand/finger motion and auto-calibrate hand/glove-related kinematic parameters simultaneously while taking into account the hand anatomical constraints. The VIST framework exhibits good tracking accuracy and robustness, affordable material cost, light hardware and software weights, and ruggedness/durability even to permit washing. Quantitative and qualitative experiments are also performed to validate the advantages and properties of our VIST framework, thereby, clearly demonstrating its potential for real-world applications.손 동작을 기반으로 한 인터페이스는 인간-기계 상호작용 분야에서 직관성, 몰입감, 정교함을 제공해줄 수 있어 많은 주목을 받고 있고, 이를 위해 가장 필수적인 기술 중 하나가 손 동작의 강건하고 정확한 추적 기술 이다. 이를 위해 본 학위논문에서는 먼저 사람 인지의 관점에서 손 동작 추적 오차의 인지 범위를 규명한다. 이 오차 인지 범위는 새로운 손 동작 추적 기술 개발 시 중요한 설계 기준이 될 수 있어 이를 피험자 실험을 통해 정량적으로 밝히고, 특히 손끝 촉각 장비가 있을때 이 인지 범위의 변화도 밝힌다. 이를 토대로, 촉각 피드백을 주는 것이 다양한 인간-기계 상호작용 분야에서 널리 연구되어 왔으므로, 먼저 손끝 촉각 장비와 함께 사용할 수 있는 손 동작 추적 모듈을 개발한다. 이 손끝 촉각 장비는 자기장 외란을 일으켜 착용형 기술에서 흔히 사용되는 지자기 센서를 교란하는데, 이를 적절한 사람 손의 해부학적 특성과 관성 센서/지자기 센서/소프트 센서의 적절한 활용을 통해 해결한다. 이를 확장하여 본 논문에서는, 촉각 장비 착용 시 뿐 아니라 모든 장비 착용 / 환경 / 물체와의 상호작용 시에도 사용 가능한 새로운 손 동작 추적 기술을 제안한다. 기존의 손 동작 추적 기술들은 가림 현상 (영상 기반 기술), 지자기 외란 (관성/지자기 센서 기반 기술), 물체와의 접촉 (소프트 센서 기반 기술) 등으로 인해 제한된 환경에서 밖에 사용하지 못한다. 이를 위해 많은 문제를 일으키는 지자기 센서 없이 상보적인 특성을 지니는 관성 센서와 영상 센서를 융합하고, 이때 작은 공간에 다 자유도의 움직임을 갖는 손 동작을 추적하기 위해 다수의 구분되지 않는 마커들을 사용한다. 이 마커의 구분 과정 (correspondence search)를 위해 기존의 약결합 (loosely-coupled) 기반이 아닌 강결합 (tightly-coupled 기반 센서 융합 기술을 제안하고, 이를 통해 지자기 센서 없이 정확한 손 동작이 가능할 뿐 아니라 착용형 센서들의 정확성/편의성에 문제를 일으키던 센서 부착 오차 / 사용자의 손 모양 등을 자동으로 정확히 보정한다. 이 제안된 영상-관성 센서 융합 기술 (Visual-Inertial Skeleton Tracking (VIST)) 의 뛰어난 성능과 강건성이 다양한 정량/정성 실험을 통해 검증되었고, 이는 VIST의 다양한 일상환경에서 기존 시스템이 구현하지 못하던 손 동작 추적을 가능케 함으로써, 많은 인간-기계 상호작용 분야에서의 가능성을 보여준다.1 Introduction 1 1.1. Motivation 1 1.2. Related Work 5 1.3. Contribution 12 2 Detection Threshold of Hand Tracking Error 16 2.1. Motivation 16 2.2. Experimental Environment 20 2.2.1. Hardware Setup 21 2.2.2. Virtual Environment Rendering 23 2.2.3. HMD Calibration 23 2.3. Identifying the Detection Threshold of Tracking Error 26 2.3.1. Experimental Setup 27 2.3.2. Procedure 27 2.3.3. Experimental Result 31 2.4. Enlarging the Detection Threshold of Tracking Error by Haptic Feedback 31 2.4.1. Experimental Setup 31 2.4.2. Procedure 32 2.4.3. Experimental Result 34 2.5. Discussion 34 3 Wearable Finger Tracking Module for Haptic Interaction 38 3.1. Motivation 38 3.2. Development of Finger Tracking Module 42 3.2.1. Hardware Setup 42 3.2.2. Tracking algorithm 45 3.2.3. Calibration method 48 3.3. Evaluation for VR Haptic Interaction Task 50 3.3.1. Quantitative evaluation of FTM 50 3.3.2. Implementation of Wearable Cutaneous Haptic Interface 51 3.3.3. Usability evaluation for VR peg-in-hole task 53 3.4. Discussion 57 4 Visual-Inertial Skeleton Tracking for Human Hand 59 4.1. Motivation 59 4.2. Hardware Setup and Hand Models 62 4.2.1. Human Hand Model 62 4.2.2. Wearable Sensor Glove 62 4.2.3. Stereo Camera 66 4.3. Visual Information Extraction 66 4.3.1. Marker Detection in Raw Images 68 4.3.2. Cost Function for Point Matching 68 4.3.3. Left-Right Stereo Matching 69 4.4. IMU-Aided Correspondence Search 72 4.5. Filtering-based Visual-Inertial Sensor Fusion 76 4.5.1. EKF States for Hand Tracking and Auto-Calibration 78 4.5.2. Prediction with IMU Information 79 4.5.3. Correction with Visual Information 82 4.5.4. Correction with Anatomical Constraints 84 4.6. Quantitative Evaluation for Free Hand Motion 87 4.6.1. Experimental Setup 87 4.6.2. Procedure 88 4.6.3. Experimental Result 90 4.7. Quantitative and Comparative Evaluation for Challenging Hand Motion 95 4.7.1. Experimental Setup 95 4.7.2. Procedure 96 4.7.3. Experimental Result 98 4.7.4. Performance Comparison with Existing Methods for Challenging Hand Motion 101 4.8. Qualitative Evaluation for Real-World Scenarios 105 4.8.1. Visually Complex Background 105 4.8.2. Object Interaction 106 4.8.3. Wearing Fingertip Cutaneous Haptic Devices 109 4.8.4. Outdoor Environment 111 4.9. Discussion 112 5 Conclusion 116 References 124 Abstract (in Korean) 139 Acknowledgment 141박
    corecore