10 research outputs found

    Fingertip Detection Method Based on Curvature

    The fingertip is an important feature whose detection is a key component of many vision-based barehanded human-computer interaction (HCI) systems. Because of complex backgrounds and real-time requirements, accurately locating fingertips is difficult in terms of both processing speed and accuracy. This paper presents a simple and efficient curvature-based method for fingertip detection. The input video stream is first binarized in a skin-color space, and the binarized sequence is used as input data. An edge detection algorithm then extracts the contour of each skin-colored region, fingertip-like points are detected along the contour using curvature information, and the spatial arrangement of these points is used to decide whether a skin region is a hand. Finally, a filtering step removes points falsely detected on the arm. Experimental results show that the method performs well against different backgrounds, is robust to illumination changes, and runs in real time.
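The curvature test at the core of such a method can be illustrated with a short sketch. A common contour-curvature measure is the k-cosine: the cosine of the angle between the vectors from a contour point to its k-th neighbors on either side; values near 1 indicate a sharp convex bend such as a fingertip. The measure, the choice of k, and the threshold below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kcos_curvature(contour, k=5):
    """k-cosine curvature for a closed contour of shape (N, 2).

    Returns cos(theta) at each point, where theta is the angle between
    the vectors to the k-th neighbors; values near 1 mean a sharp bend."""
    n = len(contour)
    prev = contour[(np.arange(n) - k) % n]
    nxt = contour[(np.arange(n) + k) % n]
    v1 = prev - contour
    v2 = nxt - contour
    num = np.sum(v1 * v2, axis=1)
    den = np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1)
    return num / np.maximum(den, 1e-9)

def fingertip_candidates(contour, k=5, thresh=0.7):
    """Indices of contour points whose k-cosine exceeds thresh."""
    return np.where(kcos_curvature(contour, k) > thresh)[0]
```

On a polygon with one sharp upward spike, only the spike vertex exceeds the threshold, while flat edges score near -1 and gentle corners fall below it.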

    Appearance-based motion recognition of human actions

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996. Includes bibliographical references (leaves 52-53). By James William Davis. M.S.

    Integration of Image Processing Algorithm and Path Planning for Search and Rescue Robot

    The focus of this project was to explore algorithms and techniques for motion detection, object recognition and facial recognition, path finding, and obstacle avoidance. OpenCV was used to implement these algorithms; on the hardware side, a Raspberry Pi performed the image processing and controlled robot movement. Image processing combined several OpenCV operations, such as grayscale conversion with cv2.cvtColor, followed by cv2.GaussianBlur and contour extraction with cv2.findContours. Path finding and obstacle avoidance were achieved by integrating an ultrasonic sensor into the system, with path finding driven by the coordinates of the detected bounding box. As a result, the robot turned 90 degrees at a time to view each of the four directions in search of the target. Once motion was detected, the robot stopped in that direction and approached until the ultrasonic sensor detected something; it then ran a facial recognition scan on the target to determine whether it was human.
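A minimal frame-differencing version of this detection pipeline can be sketched in pure NumPy. The luminance conversion and box blur below are stand-ins for the cv2 calls, and the change threshold is an assumed value:

```python
import numpy as np

def to_gray(frame):
    # luminance conversion (stand-in for cv2.cvtColor with BGR input)
    return frame @ np.array([0.114, 0.587, 0.299])

def box_blur(img, k=3):
    # simple box filter as a stand-in for cv2.GaussianBlur
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def motion_bbox(prev, curr, thresh=25):
    """Bounding box (x, y, w, h) of pixels that changed, or None."""
    diff = np.abs(box_blur(to_gray(curr)) - box_blur(to_gray(prev)))
    ys, xs = np.nonzero(diff > thresh)
    if len(xs) == 0:
        return None
    return (int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))
```

The bounding box returned here plays the role of the box whose coordinates drive the robot's path finding; the blur enlarges the detected region by roughly one pixel on each side.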

    Novel Facial Image Recognition Techniques Employing Principal Component Analysis

    Recently, pattern recognition and classification have received considerable attention in diverse engineering fields such as biomedical imaging, speaker identification, fingerprint recognition, and face recognition. This study contributes novel techniques for facial image recognition based on two-dimensional principal component analysis (2D-PCA) in the transform domain. These algorithms reduce storage requirements by an order of magnitude and computational complexity by a factor of two while maintaining the excellent recognition accuracy of recently reported methods. The proposed recognition systems employ different structures, multiple criteria, and multiple transforms. In addition, principal component analysis in the transform domain is developed in conjunction with vector quantization, which results in further improvement in recognition accuracy and dimensionality reduction. Experimental results confirm the excellent properties of the proposed algorithms.
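The recognition principle underlying such systems can be illustrated with a short sketch. This shows ordinary 1-D PCA with nearest-neighbor matching, not the paper's transform-domain 2D-PCA; the component count and Euclidean distance metric are assumptions:

```python
import numpy as np

def pca_fit(X, n_components):
    """X: (n_samples, n_features). Returns mean and top principal axes."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, axes):
    """Project samples into the principal component space."""
    return (X - mean) @ axes.T

def nearest_neighbor(train_proj, labels, query_proj):
    """Classify a projected query by its nearest projected training sample."""
    d = np.linalg.norm(train_proj - query_proj, axis=1)
    return labels[int(np.argmin(d))]
```

Storing only the low-dimensional projections, rather than full images, is what yields the large reduction in storage and matching cost that the abstract describes.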

    Feature-based tracking of multiple people for intelligent video surveillance.

    Intelligent video surveillance is the automatic performance of surveillance tasks by a computer vision system. It involves detecting and tracking people in the video sequence and understanding their behavior. This thesis addresses the problem of detecting and tracking multiple moving people against an unknown background. We propose a feature-based tracking framework that requires feature extraction and feature matching, using color, size, blob bounding box, and motion information as features of people. In our feature-based tracking system, the Pearson correlation coefficient is used to match feature vectors with temporal templates, and the occlusion problem is solved by histogram backprojection. Our tracking system is fast and free of assumptions about human body structure. We implemented it using Visual C++ and OpenCV and tested it on real-world images and videos. Experimental results suggest that the system achieves good accuracy and can process videos at 10-15 fps. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2006 .A42. Source: Masters Abstracts International, Volume: 45-01, page: 0347. Thesis (M.Sc.)--University of Windsor (Canada), 2006.
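The Pearson-correlation matching step can be sketched directly. The flat feature-vector layout and argmax selection here are illustrative assumptions, not the thesis's exact design:

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation coefficient between two feature vectors."""
    u = u - u.mean()
    v = v - v.mean()
    den = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / den) if den else 0.0

def match_person(feature, templates):
    """Return (index, score) of the temporal template best correlated
    with the observed feature vector."""
    scores = [pearson(feature, t) for t in templates]
    i = int(np.argmax(scores))
    return i, scores[i]
```

Because Pearson correlation is invariant to affine scaling of the feature vector, a person whose features brighten or grow uniformly between frames still matches their own template.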

    Gesture recognition using principal component analysis, multi-scale theory, and hidden Markov models

    In this thesis, a dynamic gesture recognition system is presented which requires no special hardware other than a webcam. The system is based on a novel method combining Principal Component Analysis (PCA) with hierarchical multi-scale theory and Discrete Hidden Markov Models (DHMMs). We use a hierarchical decision tree based on multi-scale theory. First we convolve all members of the training data with a Gaussian kernel, which blurs differences between images and reduces their separation in feature space. This reduces the number of eigenvectors needed to describe the data. A principal component space is computed from the convolved data, and the data in this space is divided into several clusters using the k-means algorithm. The level of blurring is then reduced and PCA is applied to each of the clusters separately. A new principal component space is formed from each cluster, each of these spaces is in turn divided into clusters, and the process is repeated. We thus produce a tree of principal component spaces where each level of the tree represents a different degree of blurring. The search time is then proportional to the depth of the tree, which makes it possible to search hundreds of gestures with very little computational cost. The output of the decision tree is then input into the DHMM recogniser to recognise temporal information.
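A much-simplified sketch of the tree construction follows. It omits the decreasing Gaussian-blur schedule, uses a deterministic k-means initialization, and picks fixed values for k, the subspace dimension, and the depth; all of these are assumptions for illustration only:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means with deterministic, evenly spaced initialization."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign, centers

def pca_axes(X, m):
    """Mean and top-m principal axes of the rows of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:m]

def build_tree(X, k=2, m=2, depth=2):
    """Recursively: compute a PCA space, split the projected data with
    k-means, and build a child subspace per cluster."""
    mean, axes = pca_axes(X, m)
    node = {"mean": mean, "axes": axes, "children": []}
    if depth > 1 and len(X) > k:
        proj = (X - mean) @ axes.T
        assign, _ = kmeans(proj, k)
        for j in range(k):
            if np.sum(assign == j) > m:
                node["children"].append(build_tree(X[assign == j], k, m, depth - 1))
    return node
```

Searching such a tree descends one branch per level, which is what makes the lookup cost proportional to depth rather than to the number of stored gestures.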

    Learning-Based Hand Sign Recognition Using SHOSLIF-M

    In this paper, we present a self-organizing framework called SHOSLIF-M for learning and recognizing spatiotemporal events (or patterns) from intensity image sequences. The proposed framework consists of a multiclass, multivariate discriminant analysis to automatically select the most discriminating features (MDF), a space partition tree to achieve logarithmic retrieval time complexity for a database of n items, and a general interpolation scheme to perform view inference and generalization in the MDF space from a small number of training samples. The system is tested on 28 different hand signs. The experimental results show that the learned system achieves a 96% recognition rate on test sequences not used in the training phase. 1 Introduction: The ability to interpret hand gestures is essential if computer systems are to interact with human users in a natural way. Recently, there has been a significant amount of research on hand gesture recognition…
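The discriminant-analysis idea behind the MDF features can be illustrated with a two-class Fisher LDA sketch. SHOSLIF-M uses a multiclass, multivariate version; the two-class form, the regularization term, and the nearest-projected-mean classifier below are illustrative assumptions:

```python
import numpy as np

def fisher_lda(X0, X1, eps=1e-6):
    """Most discriminating direction for two classes (Fisher LDA)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix (sum of per-class scatters)
    Sw = np.cov(X0.T, bias=True) * len(X0) + np.cov(X1.T, bias=True) * len(X1)
    # solve Sw w = (m1 - m0), with a small ridge for stability
    w = np.linalg.solve(Sw + eps * np.eye(len(m0)), m1 - m0)
    return w / np.linalg.norm(w)

def classify(x, w, m0, m1):
    """Assign x to the class whose projected mean is nearer."""
    p = x @ w
    return 0 if abs(p - m0 @ w) < abs(p - m1 @ w) else 1
```

Unlike PCA, which keeps directions of maximum variance regardless of labels, this direction maximizes between-class separation relative to within-class scatter, which is why MDF-style features discriminate better than MEF (eigenfeature) ones.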

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis concerns the study of computer vision methods for the recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. A continuous SL utterance consists of a sequence of signs performed one after another, involving manual and non-manual features that convey information in parallel. Even though standard signs are defined in dictionaries, their realization shows huge context-dependent variability, and signs are often linked by movement epenthesis, meaningless transitional gestures between signs. This variability and the co-articulation effect make automatic SL processing challenging, and numerous annotated video corpora are needed to train statistical machine translation systems and to study the language. Annotation of SL video corpora is generally performed manually by linguists or computer scientists experienced in SL; manual annotation is error-prone, unreproducible, and extremely time-consuming, and the quality of the result depends on the annotator's knowledge of SL. Combining the annotator's expertise with automatic image processing facilitates the task, saving time and increasing robustness. The goal of this research is to study and develop image processing techniques to assist the annotation of SL video corpora: body-part tracking, hand segmentation, temporal segmentation, and gloss recognition. In this thesis we address gloss annotation, beginning with the detection of the boundaries marking the start and end of each sign. This annotation method requires several low-level steps to segment the signs and to extract motion and hand-shape features. First, we propose a particle-filter-based method for tracking body parts that is robust to occlusions. Next, a hand segmentation algorithm extracts the hand region even when the hand is in front of the face. Motion features are then used for an initial temporal segmentation of the signs, which is subsequently improved using hand-shape features; these allow segmentation boundaries detected in the middle of a sign to be removed. Once the signs are segmented, visual features are extracted to recognize them as glosses using phonological models of signs. We evaluated our algorithms on international corpora to show their advantages and limitations. The evaluation demonstrates the robustness of our methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a significant gain in consistency.
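The particle-filter tracking step can be illustrated with a minimal sequential importance resampling filter for a single 2-D point. The Gaussian motion and observation models and all parameter values are assumptions for illustration, not the thesis's actual body-part tracker:

```python
import numpy as np

def particle_filter(observations, n_particles=500, motion_std=1.0,
                    obs_std=2.0, seed=0):
    """Estimate a 2-D position from noisy observations by
    predict / weight / resample over a particle set."""
    rng = np.random.default_rng(seed)
    particles = np.tile(observations[0], (n_particles, 1)).astype(float)
    estimates = [observations[0].astype(float)]
    for z in observations[1:]:
        # predict: diffuse particles under the motion model
        particles += rng.normal(0, motion_std, particles.shape)
        # weight: Gaussian likelihood of the observation per particle
        d2 = np.sum((particles - z) ** 2, axis=1)
        w = np.exp(-d2 / (2 * obs_std ** 2))
        w /= w.sum()
        # resample in proportion to the weights
        idx = rng.choice(n_particles, n_particles, p=w)
        particles = particles[idx]
        estimates.append(particles.mean(axis=0))
    return np.array(estimates)
```

Run on noisy observations of a target, the filtered estimate has lower average error than the raw observations, which is the property the tracker exploits when measurements are corrupted by occlusion or clutter.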

    Traitement automatique de vidéos en LSF. Modélisation et exploitation des contraintes phonologiques du mouvement

    In automatic natural language processing, the treatment of sign language utterances occupies a special place. Because of the specific characteristics of French Sign Language (LSF), such as the simultaneity of several parameters, the strong role of facial expression, the extensive use of iconic gestural units, and the use of the signing space to structure the utterance, new processing methods must be adapted to this language. We first present a tracking method based on a particle filter that determines, at any time, the position of a signer's head, hands, elbows, and upper body in a single-view video. This method has been adapted to LSF to make it more robust to occlusions, to the signer's hands leaving the frame, and to inversions of the hands. Then, by analyzing motion capture data, we derive a categorization of motion patterns frequently involved in the production of signs. We propose a parametric model of these patterns, which we use to retrieve signs in a video from a filmed example of a sign. Finally, these motion models are reused in two applications that assist a user: one for creating sign pictures, and one for computer-aided segmentation of a video into signs.
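Matching a query motion trajectory against stored templates of different lengths, as in the sign retrieval described above, is commonly done with dynamic time warping (DTW). The thesis itself uses parametric motion models, so the DTW sketch below is only an illustrative alternative for the same retrieval problem:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between trajectories a (n, d) and b (m, d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def retrieve(query, templates):
    """Index of the stored motion template closest to the query under DTW."""
    return int(np.argmin([dtw_distance(query, t) for t in templates]))
```

Because DTW aligns samples non-linearly in time, a sign performed faster or slower than its template still matches, which is the same invariance a parametric motion model provides.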