7 research outputs found

    Hand Pointing Detection Using Live Histogram Template of Forehead Skin

    Hand pointing detection has applications in many fields, such as virtual reality and the control of devices in smart homes. In this paper, we propose a novel approach to detecting the pointing vector in the 2D space of a room. After background subtraction, the face and forehead are detected. In the second step, H-S plane histograms of the forehead skin are calculated in HSV space. Using these histogram templates of the user's skin together with the back projection method, skin areas are detected. The contours of the hand are extracted using the Freeman chain code algorithm. The next step is finding the fingertips. Points on the hand contour that are candidates for the fingertip can be found in the convexity defects between the convex hull and the contour. We introduce a novel method for finding the fingertip based on special points on the contour and their relationships. Our approach detects hand-pointing vectors in live video from a common webcam with 94% TP and 85% TN. Comment: Accepted for oral presentation in DSP201
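
    The histogram and back projection steps of this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bin counts, and the peak normalisation are assumptions made for illustration, and it uses only the Python standard library rather than an image-processing library.

```python
import colorsys

def hs_bin(rgb, h_bins, s_bins):
    """Map an RGB pixel (0-255 per channel) to its (hue, saturation) bin."""
    r, g, b = rgb
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return min(int(h * h_bins), h_bins - 1), min(int(s * s_bins), s_bins - 1)

def hs_histogram(pixels, h_bins=30, s_bins=32):
    """Build a 2-D H-S histogram from template pixels, scaled so the peak is 1.0.

    The bin counts are illustrative assumptions, not values from the paper.
    """
    hist = [[0.0] * s_bins for _ in range(h_bins)]
    for rgb in pixels:
        hi, si = hs_bin(rgb, h_bins, s_bins)
        hist[hi][si] += 1.0
    peak = max(max(row) for row in hist)
    if peak > 0:
        hist = [[v / peak for v in row] for row in hist]
    return hist

def back_project(image, hist):
    """Replace every pixel by the histogram value of its H-S bin.

    High values mark pixels whose colour is common in the template.
    """
    h_bins, s_bins = len(hist), len(hist[0])
    result = []
    for row in image:
        out_row = []
        for rgb in row:
            hi, si = hs_bin(rgb, h_bins, s_bins)
            out_row.append(hist[hi][si])
        result.append(out_row)
    return result

# Hypothetical data: a uniform skin-coloured template and a two-pixel image.
template = [(224, 172, 105)] * 10
image = [[(224, 172, 105), (0, 0, 255)]]
skin_map = back_project(image, hs_histogram(template))
```

    Thresholding `skin_map` would then give a binary skin mask from which hand contours could be extracted.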

    Sign Language Recognition

    This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief introduction to the motivations and requirements, followed by a précis of sign linguistics and its impact on the field. The types of data available and their relative merits are explored, allowing examination of the features that can be extracted. Classifying the manual aspects of sign (similar to gestures) is then discussed from a tracking and a non-tracking viewpoint, before summarising some of the approaches to the non-manual aspects of sign languages. Methods for combining the sign classification results into full SLR are given, showing the progression towards speech recognition techniques and the further adaptations required for the sign-specific case. Finally, the current frontiers are discussed and recent research is presented. This covers the task of continuous sign recognition, the work towards true signer independence, how to effectively combine the different modalities of sign, making use of current linguistic research, and adapting to larger more noisy data set

    Vision-based hand shape identification for sign language recognition

    This thesis introduces an approach to obtaining image-based hand features that accurately describe hand shapes commonly found in American Sign Language. A hand recognition system capable of identifying 31 hand shapes from American Sign Language was developed to identify hand shapes in a given input image or video sequence. An appearance-based approach with a single camera is used to recognize the hand shape. A region-based shape descriptor, the generic Fourier descriptor, which is invariant to translation, scale, and orientation, has been implemented to describe the shape of the hand. A wrist detection algorithm has been developed to remove the forearm from the hand region before the features are extracted. The hand shapes are recognized with a multi-class Support Vector Machine. Testing gave a recognition rate of approximately 84% on a widely varying testing set of approximately 1,500 images with a training set of about 2,400 images. With a larger training set of approximately 2,700 images and a testing set of approximately 1,200 images, the recognition rate increased to about 88%

    Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

    This PhD thesis studies computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign language (SL) is a visual-gestural language developed by deaf communities. A continuous SL utterance consists of a sequence of signs performed one after another, involving manual and non-manual features (facial expressions and upper-body movements) that convey information in parallel. Even though standard signs are defined in dictionaries, their realization varies widely with context. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This variability and the co-articulation effect are a major challenge for automatic SL processing. Numerous annotated video corpora are therefore needed in order to study the language and to train machine learning methods (for example, statistical machine translators). SL video corpora are generally annotated manually by linguists or other experts experienced in SL, which is error-prone, non-reproducible, and extremely time-consuming. In addition, the quality of the annotations depends on the annotator's knowledge of SL. Combining annotator expertise with automatic processing facilitates the task, increasing robustness and saving time. The goal of this research is to study image processing methods that assist the annotation of SL video corpora: body part tracking, hand segmentation, temporal segmentation, and gloss recognition. In this thesis we address the problem of gloss annotation. We first aim to detect the limits corresponding to the beginning and end of each sign. This annotation method requires several low-level processing steps to segment the signs and to extract motion and hand shape features. First, we propose a particle-filter-based method for tracking body parts (hands and face) that is robust to occlusions. Then, a hand segmentation algorithm is developed to extract the hand region even when the hands are in front of the face. Next, motion features are used to produce a first temporal segmentation of the signs, which is then refined using hand shape features; these make it possible to remove segmentation limits detected in the middle of a sign. Once the signs have been segmented, visual features are extracted and the signs are recognized as glosses using phonological models (a lexical description of signs). We evaluated our algorithms on international corpora in order to show their advantages and limitations. The evaluation shows the robustness of the proposed methods with respect to high dynamics and the large number of occlusions between body parts. The resulting annotation is independent of the annotator and represents a significant gain in robustness and annotation consistency.
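
    The idea of using motion to propose sign limits can be sketched as follows. The abstract does not state the exact criterion, so this sketch assumes, purely for illustration, that runs of low hand speed mark candidate limits; the function names and the threshold value are hypothetical, and the tracked hand positions are taken as given.

```python
import math

def speeds(track):
    """Frame-to-frame displacement of a tracked 2-D hand position."""
    return [math.dist(a, b) for a, b in zip(track, track[1:])]

def candidate_limits(track, threshold=1.0):
    """Return one index per run of low-speed steps (the middle of the run).

    Index i refers to the step between frame i and frame i + 1.
    The low-speed criterion is an assumption, not the thesis's rule.
    """
    limits = []
    run_start = None
    for i, v in enumerate(speeds(track)):
        if v < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            limits.append((run_start + i - 1) // 2)
            run_start = None
    if run_start is not None:
        # The track ended while the hand was still slow.
        limits.append((run_start + len(track) - 2) // 2)
    return limits

# Hypothetical track: the hand moves, pauses for two steps, then moves again.
track = [(0, 0), (5, 0), (10, 0), (10, 0), (10, 0), (15, 0), (20, 0)]
limits = candidate_limits(track)
```

    In the thesis, such motion-based limits are only a first pass; hand shape features are then used to discard limits that fall in the middle of a sign.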

    Template-basierte Klassifikation planarer Gesten

    The pervasion of mobile devices has led to a growing interest in touch-based interaction. However, multi-touch input is still largely restricted to direct manipulation. In current applications, gestural commands, if used at all, exploit only single-touch input. The underlying motive for this work is the conviction that realizing advanced interaction techniques requires easy-to-use tools that support their interpretation. The barrier to implementing one's own techniques is also lowered when manifold interactions have been proven in practice and their benefits become assessable for application developers. In this thesis, a recognition routine for planar, symbolic gestures is developed that can be trained by specifying templates and imposes no restrictions on the versatility of input on touch-sensitive surfaces. To provide a flexible tool, the interpretation of a gesture is independent of its natural variances, i.e., translation, scale, rotation, and speed. Additionally, the number of templates that must be specified per class is required to be small, and classification must meet the real-time criteria common in typical user interactions. The gesture recognizer is based on the integration of a nearest-neighbor approach into a Bayesian classification method. Gestures are split into meaningful, elementary tokens to retrieve a set of local features, which are merged by a sensor fusion process to form a global maximum-likelihood representation. The flexibility and high accuracy of the approach are demonstrated empirically in thorough tests. Retaining all requirements, the method is extended to support the prediction of partially entered gestures. Besides more efficient input, this makes it possible to interpret direct manipulation interactions specified by templates. The practical suitability of the presented concepts is demonstrated with two applications developed for this purpose, which provide versatile options for multi-finger input and were tested with users. In addition to a trainable recognizer for domain-independent sketches, a multi-touch text input system is created. User studies show that multi-touch input is used in sketching when it is available as an alternative, and that a constructed multi-touch gesture alphabet allows more efficient text input than its single-touch counterpart. The concepts presented in this work can benefit UI designers, usability experts, and developers of feedforward mechanisms for dynamically teaching gestural interactions alike. Likewise, the decomposition of input into tokens and its interpretation by maximum-likelihood matching with templates is transferable to neighboring application areas such as the offline recognition of symbols.
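
    To make the notion of template-based classification with invariance to translation and scale concrete, here is a deliberately simplified sketch. It is a plain nearest-neighbour matcher, not the token-based Bayesian recognizer of the thesis, and it does not handle rotation or speed. All names are hypothetical, and it assumes that stroke and templates have the same number of points.

```python
import math

def normalise(points):
    """Centre a stroke on its centroid and scale its largest radius to 1."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    scale = max(math.hypot(x, y) for x, y in centred) or 1.0
    return [(x / scale, y / scale) for x, y in centred]

def distance(a, b):
    """Mean point-to-point distance between two equally long strokes."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def classify(stroke, templates):
    """Return the label of the template closest to the normalised stroke."""
    probe = normalise(stroke)
    return min(templates,
               key=lambda label: distance(probe, normalise(templates[label])))

# Hypothetical templates: one example stroke per class.
templates = {
    "horizontal": [(0, 0), (1, 0), (2, 0)],
    "vertical": [(0, 0), (0, 1), (0, 2)],
}
label = classify([(10, 10), (30, 10), (50, 10)], templates)
```

    The input stroke is shifted and twenty times larger than the template, yet it is still matched to `"horizontal"` because both are normalised before comparison.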