
    Evaluation of Motion Velocity as a Feature for Sign Language Detection

    Popular video sharing websites contain large collections of videos in various sign languages. These websites have the potential to be a significant source of knowledge sharing and communication for members of the deaf and hard-of-hearing community. However, prior studies have shown that traditional keyword-based search does a poor job of discovering these videos. Dr. Frank Shipman and others have been working toward building a distributed digital library that indexes the sign language videos available online. This system employs an automatic detector, based on visual features extracted from the video, to filter out non-sign-language content. Features such as the amount and location of hand movement and the symmetry of motion have been explored for this purpose. Caio Monteiro and his team designed a classifier that uses face detection to identify the region of interest (ROI) in a frame and foreground segmentation to estimate the amount of hand motion within that region. Karappa et al. later improved on it by dividing the ROI using polar coordinates and estimating motion in each division to form a composite feature set. This thesis examines another visual feature associated with signing activity: the speed of hand movements. Speed-based features outperformed the foreground-based features on a complex dataset of SL and non-SL videos, with the F1 score jumping from 0.73 to 0.78. However, on a second dataset consisting of videos with single signers and static backgrounds, the classification scores dipped. More consistent performance improvements were observed when features from the two sets were used in conjunction: an F1 score of 0.76 was observed for the complex dataset, and for the second dataset the F1 score rose from 0.85 to 0.86. Another associated problem is identifying the sign language used in a video.
The impact of speed of motion on the problem of classifying American Sign Language (ASL) versus British Sign Language (BSL) was found to be minimal. We concluded that the location of motion influences this problem more than either the speed or the amount of motion. Non-speed-related analyses of sign language detection were also explored. Since the American Sign Language alphabet is one-handed, it was expected that videos with left-handed signing might be falsely identified as British Sign Language, which has a two-handed alphabet. We briefly studied this issue with respect to our corpus of ASL and BSL videos and found that our classifier design does not suffer from it. Apart from this, we explored speeding up the classification process by computing the symmetry of motion in the ROI on selected keyframes as a single feature. The resulting feature extraction was significantly faster, but precision and recall dropped to 59% and 62% respectively, for an F1 score of 0.61.
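As a rough illustration of the kind of speed feature discussed above, the sketch below tracks the centroid of a per-frame foreground (hand) mask and summarizes its frame-to-frame displacement. This is a minimal NumPy stand-in under assumed inputs, not the thesis's actual feature pipeline:

```python
import numpy as np

def centroid(mask):
    """Centroid (row, col) of nonzero pixels in a binary mask; None if empty."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    return np.array([ys.mean(), xs.mean()])

def speed_features(masks):
    """Summarize hand-motion speed across a clip.

    masks: sequence of 2-D binary arrays (foreground = moving hand pixels).
    Returns (mean, max, std) of the centroid's per-frame displacement,
    a simple proxy for the speed of hand movements.
    """
    speeds, prev = [], None
    for m in masks:
        c = centroid(m)
        if c is not None and prev is not None:
            speeds.append(np.linalg.norm(c - prev))
        if c is not None:
            prev = c
    if not speeds:
        return (0.0, 0.0, 0.0)
    s = np.array(speeds)
    return (float(s.mean()), float(s.max()), float(s.std()))
```

Summary statistics like these could then be fed to an off-the-shelf classifier alongside the foreground-based features.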

    Gesture Recognition Based on Computer Vision on a Standalone System

    Our project uses computer vision methods for gesture recognition: a camera interfaced to a system captures real-time images and, after further processing, the system recognizes the gesture shown so it can be interpreted. The project mainly targets hand gestures; after extracting information, we try to present it as audio or in some visual form. We used adaptive background subtraction with Haar classifiers to implement segmentation, then convex hull and convexity defects along with other feature extraction algorithms to interpret the gesture. This was first implemented on a PC or laptop; to produce a standalone system, all of these steps must then run on a system dedicated to the specified task. For this we chose the BeagleBone Black as the platform to implement our idea. The board comes with an ARM Cortex-A8 processor supported by the NEON SIMD engine for video and image processing, running at a clock frequency of up to 1 GHz. It is a 32-bit processor but can also run in Thumb mode, i.e., it can work in 16-bit mode. The board supports Ubuntu, and Android with some modification. Our first task is to interface a camera to the board so that it can capture images and store them as matrices, after which we modify the installed operating system for our purposes and implement all the above processes to arrive at a system that performs gesture recognition.
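The segmentation and counting steps described above can be sketched in a few lines. The NumPy illustration below uses a running-average background model and a crude row-scan count of foreground runs; it is a simplified stand-in, not the project's actual Haar-plus-convexity-defects implementation:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    # Adaptive background: exponential running average of past frames.
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    # Foreground = pixels that differ from the background model by more
    # than `thresh` gray levels.
    return np.abs(frame.astype(float) - bg) > thresh

def count_runs(mask, row):
    # Count connected foreground runs along one row of the hand mask --
    # a toy stand-in for counting fingertips via convex hull / defects:
    # a scan line through extended fingers crosses one run per finger.
    line = np.asarray(mask[row], dtype=bool)
    return int(line[0]) + int(np.count_nonzero(line[1:] & ~line[:-1]))
```

On an embedded target like the BeagleBone Black, a cheap model of this kind keeps per-frame cost low before any heavier classification runs.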

    A new framework for sign language recognition based on 3D handshape identification and linguistic modeling

    Current approaches to sign recognition by computer generally have at least some of the following limitations: they rely on laboratory conditions for sign production, are limited to a small vocabulary, rely on 2D modeling (and therefore cannot deal with occlusions and off-plane rotations), and/or achieve limited success. Here we propose a new framework that (1) provides a new tracking method less dependent than others on laboratory conditions and able to deal with variations in background and skin regions (such as the face, forearms, or the other hand); (2) allows for identification of 3D hand configurations that are linguistically important in American Sign Language (ASL); and (3) incorporates statistical information reflecting linguistic constraints in sign production. For purposes of large-scale computer-based sign language recognition from video, the ability to distinguish hand configurations accurately is critical. Our current method estimates the 3D hand configuration to distinguish among 77 hand configurations linguistically relevant for ASL. Constraining the problem in this way makes recognition of 3D hand configuration more tractable and provides the information specifically needed for sign recognition. Further improvements are obtained by incorporating statistical information about linguistic dependencies among handshapes within a sign, derived from an annotated corpus of almost 10,000 sign tokens.
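The idea of combining appearance evidence with corpus-derived linguistic constraints can be sketched as follows. This is a hypothetical scoring scheme, not the authors' model: per-handshape appearance log-likelihoods for the start and end of a sign are combined with a handshape-transition log-prior estimated from annotated data, and the jointly best pair is chosen:

```python
import numpy as np

def best_handshape_pair(start_ll, end_ll, trans_lp):
    """Pick the (start, end) handshape pair maximizing appearance
    log-likelihoods plus a linguistic transition log-prior.

    start_ll, end_ll: length-H arrays of per-handshape log-likelihoods
                      from the appearance model.
    trans_lp: H x H array, trans_lp[i, j] = log P(end=j | start=i),
              estimated from an annotated corpus of sign tokens.
    """
    # Broadcast to an H x H grid of joint scores and take the argmax.
    score = start_ll[:, None] + end_ll[None, :] + trans_lp
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return int(i), int(j)
```

The prior lets a linguistically implausible transition veto a pair even when each handshape looks individually likely.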

    Hand gesture based digit recognition

    Recognition of static hand gestures plays an important role in everyday human-computer interaction. Hand gesture recognition remains a challenging task, and a great deal of research is ongoing due to its increasing demand in human-computer interaction. Since hand gestures are the most natural communication medium among human beings, they can facilitate efficient human-computer interaction with many electronic gadgets. This led us to take up the task of hand gesture recognition. In this project, different hand gestures are recognized and the number of extended fingers is counted. The recognition process involves feature extraction, feature reduction, and classification. To make recognition robust against varying illumination, we used a lighting-compensation method along with the YCbCr color model. Gabor filters were used for feature extraction because of their special mathematical properties. Gabor-based feature vectors have high dimensionality, so in our project 15 local Gabor filters are used instead of the usual 40, with the objective of mitigating complexity while improving accuracy. The problem of high-dimensional feature vectors is addressed using PCA; using local Gabor filters also reduces data redundancy compared to the full 40-filter bank. Classification of the 5 different gestures is done with a one-against-all multiclass SVM, which is also compared with Euclidean distance and cosine similarity classifiers, with the SVM giving an accuracy of 90.86%.
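Two steps of the pipeline above, Gabor filtering and PCA reduction, can be illustrated in plain NumPy. The parameter values below are illustrative assumptions; the project's 15-filter bank and SVM stage are not reproduced here:

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma, gamma=0.5):
    """Real part of a Gabor kernel with orientation `theta`,
    wavelength `lam`, envelope width `sigma`, aspect ratio `gamma`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) \
           * np.cos(2 * np.pi * xr / lam)

def pca_project(X, k):
    """Project feature vectors X (n_samples x n_features) onto the top-k
    principal components, taming the dimensionality of Gabor features."""
    Xc = X - X.mean(axis=0)                          # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```

A filter bank is then just this kernel evaluated at several orientations and wavelengths, with the filter responses concatenated before the PCA step.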

    Gestures in Machine Interaction

    Unencumbered gesture interaction (VGI) describes the use of unrestricted gestures in machine interaction. The development of such technology will enable users to interact with machines and virtual environments by performing actions like grasping, pinching or waving without the need for peripherals. Advances in image processing and pattern recognition make such interaction viable and, in some applications, more practical than the current modes of keyboard, mouse and touch-screen interaction. VGI is emerging as a popular topic in human-computer interaction (HCI), computer vision and gesture research, and is developing into a topic with the potential to significantly impact the future of computer interaction, robot control and gaming. This thesis investigates whether an ergonomic model of VGI can be developed and implemented on consumer devices by considering some of the barriers currently preventing such a model of VGI from being widely adopted. This research aims to address the development of freehand gesture interfaces and their accompanying syntax. Without detailed consideration of the evolution of this field, the development of un-ergonomic, inefficient interfaces that place undue strain on their users becomes more likely. In the course of this thesis some novel design and methodological assertions are made. The Gesture in Machine Interaction (GiMI) syntax model and the Gesture-Face Layer (GFL), developed in the course of this research, have been designed to facilitate ergonomic gesture interaction. GiMI is an interface syntax model designed to enable cursor control, browser navigation commands, and steering control for remote robots or vehicles. By applying state-of-the-art image processing that facilitates three-dimensional (3D) recognition of human action, this research investigates how interface syntax can incorporate the broadest range of human actions.
By advancing our understanding of ergonomic gesture syntax, this research aims to help future developers evaluate the efficiency of gesture interfaces, lexicons and syntax.
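A gesture syntax of the kind described can be thought of as a mapping from (interaction mode, recognized gesture) to a command. The sketch below uses hypothetical gesture and command names, not the actual GiMI lexicon:

```python
# Hypothetical (mode, gesture) -> command table; GiMI's real lexicon differs.
SYNTAX = {
    ("browser", "swipe_left"): "history_back",
    ("browser", "swipe_right"): "history_forward",
    ("cursor", "pinch"): "click",
    ("steering", "tilt_left"): "turn_left",
}

def interpret(mode, gesture):
    """Resolve a recognized gesture to a command within the current
    interaction mode; unknown pairs deliberately yield no action."""
    return SYNTAX.get((mode, gesture), "no_op")
```

Scoping gestures by mode keeps the per-mode lexicon small, which is one way a syntax can limit the strain of memorizing and performing many distinct gestures.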

    Developing a Sign Language Video Collection via Metadata and Video Classifiers

    Video sharing sites have become a central tool for the storage and dissemination of sign language content. Sign language videos have many purposes, including sharing experiences or opinions, teaching and practicing a sign language, etc. However, due to limitations of term-based search, these videos can be hard to locate. This results in a diminished value of these sites for the deaf or hard-of-hearing community. As a result, members of the community frequently engage in a push-style delivery of content, sharing direct links to sign language videos with other members of the sign language community. To address this problem, we propose the Sign Language Digital Library (SLaDL). SLaDL is composed of two main sub-systems: a crawler that collects potential videos for inclusion in the digital library corpus, and an automatic classification system that detects and identifies sign language presence in the crawled videos. These components attempt to filter out videos that do not include sign language and to organize sign language videos by language. This dissertation explores individual and combined components of the classification system. The components form a cascade of multimodal classifiers aimed at achieving high accuracy when classifying potential videos while minimizing the computational effort. A web application coordinates the execution of these two subsystems and enables user interaction (browsing and searching) with the library corpus. Since the collection of the digital library is automatically curated by the cascading classifier, the number of irrelevant results is expected to be drastically lower than on general-purpose video sharing sites.
The evaluation involved a series of experiments focused on specific components of the system and on analyzing how best to configure SLaDL. In the first set of experiments, we investigated three different crawling approaches, assessing how they compared in terms of both finding a large quantity of sign language videos and expanding the variety of videos in the collection.
Secondly, we evaluated the performance of different approaches to multimodal classification in terms of precision, recall, F1 score, and computational cost. Lastly, we incorporated the best multimodal approach into cascading classifiers to reduce computation while preserving accuracy. We experimented with four different cascading configurations and analyzed their performance for the detection and identification of signed content. Given the findings of each experiment, we proposed the setup for an instantiation of SLaDL.
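The cascading idea (cheap classifiers first, expensive ones only for videos that survive) can be sketched as follows; this illustrates the general technique, not SLaDL's actual configuration:

```python
def cascade_classify(video, stages):
    """Run classifier stages in increasing order of cost and stop as soon
    as one confidently rejects, so cheap features filter out most
    non-sign-language videos before expensive stages ever run.

    stages: list of (classifier, reject_threshold) pairs, where
            classifier(video) returns a sign-language score in [0, 1].
    Returns (accepted, last_score).
    """
    score = 0.0
    for clf, reject_below in stages:
        score = clf(video)
        if score < reject_below:
            return False, score   # early rejection: later stages skipped
    return True, score            # survived every stage
```

Tuning the per-stage rejection thresholds trades recall against the computation saved by early exits, which mirrors the precision/recall/cost analysis described above.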

    Moving object detection for interception by a humanoid robot

    Interception of a moving object by an autonomous robot is an important problem in robotics. It has various application areas, such as industrial settings where products on a conveyor are picked up by a robotic arm, the military (halting intruders), robotic soccer (where robots try to reach the moving ball and block an opponent's attempt to pass it), and other challenging situations. Interception, in and of itself, is a complex task that demands target recognition capability and proper navigation and actuation toward the moving target. There are numerous techniques for intercepting stationary targets and targets that move along known trajectories (linear, circular, or parabolic). However, much less research has addressed objects moving with unknown and unpredictable trajectories, changing in scale and viewpoint, where, additionally, the reference frame of the robot's vision system is itself dynamic. This study aims to find vision-based methods for object detection and tracking applicable to the autonomous interception of a moving humanoid robot target by another humanoid robot. With the implemented vision system, a robot is able to detect, track and intercept the moving target in a dynamic environment, taking into account the unique characteristics of a humanoid robot, such as the kinematics of walking. The vision system combined object detection based on Haar/LBP feature classifiers trained as boosted cascades with target contour tracking using optical flow techniques. Constant updates during navigation helped intercept an object moving along an unpredicted trajectory.
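The re-aim-every-frame pursuit that such constant updates enable can be sketched in a simplified 2-D model. Coordinates and step size here are assumptions for illustration, not the robot's actual walking controller:

```python
import numpy as np

def pursuit_step(robot_xy, target_xy, speed):
    """One interception step: head straight at the target's latest
    observed position (pure pursuit), re-aimed every frame because the
    target's trajectory is unknown and unpredictable."""
    d = np.asarray(target_xy, float) - np.asarray(robot_xy, float)
    dist = np.linalg.norm(d)
    if dist <= speed:
        return tuple(target_xy)   # within one step: intercept
    return tuple(np.asarray(robot_xy, float) + speed * d / dist)
```

In the full system the target position would come from the detector/optical-flow tracker each frame, and the step would be realized through the humanoid's walking gait rather than a straight-line move.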

    The eyes have it
