7 research outputs found

    American Sign Language alphabet recognition using Microsoft Kinect

    American Sign Language (ASL) fingerspelling recognition using marker-less vision sensors is a challenging task due to the complexity of ASL signs, self-occlusion of the hand, and the limited resolution of the sensors. This thesis describes a new method for ASL fingerspelling recognition using a low-cost vision camera, Microsoft's Kinect. A segmented hand configuration is first obtained using a per-pixel classification algorithm based on depth-contrast features. A hierarchical mode-finding method is then developed and applied to localize hand joint positions under kinematic constraints. Finally, a Random Decision Forest (RDF) classifier is built to recognize ASL signs from the joint angles. To validate the performance of this method, a dataset containing 75,000 samples of 24 static ASL alphabet signs is used. The system achieves a mean accuracy of 92%. We have also used a publicly available dataset from Surrey University to evaluate our method. The results show that our method achieves higher accuracy in recognizing ASL alphabet signs than the previous benchmarks. --Abstract, page iii
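
    The final stage described above maps joint angles to one of the 24 static alphabet signs with a Random Decision Forest. Below is a minimal sketch of that kind of classifier using scikit-learn; the number of joint angles, the forest parameters, and the random stand-in data are illustrative assumptions, not the thesis's actual features or dataset.

```python
# Illustrative sketch only: a random-decision-forest classifier over hand-joint
# angles, in the spirit of the pipeline described above. The feature layout and
# the placeholder data are hypothetical, not the thesis's actual dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

NUM_JOINT_ANGLES = 20   # assumed number of angles derived from localized hand joints
NUM_CLASSES = 24        # static ASL alphabet signs (J and Z require motion)

# Stand-in data: in the thesis, the angles come from depth-based joint localization.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, np.pi, size=(5000, NUM_JOINT_ANGLES))
y = rng.integers(0, NUM_CLASSES, size=5000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, max_depth=12, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```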

    Vision-based hand shape identification for sign language recognition

    This thesis introduces an approach to obtain image-based hand features that accurately describe hand shapes commonly found in American Sign Language. A hand recognition system capable of identifying 31 hand shapes from American Sign Language was developed to identify hand shapes in a given input image or video sequence. An appearance-based approach with a single camera is used to recognize the hand shape. A region-based shape descriptor, the generic Fourier descriptor, which is invariant to translation, scale, and orientation, has been implemented to describe the shape of the hand. A wrist detection algorithm has been developed to remove the forearm from the hand region before the features are extracted. The recognition of the hand shapes is performed with a multi-class Support Vector Machine. Testing yielded a recognition rate of approximately 84% on a widely varying testing set of approximately 1,500 images with a training set of about 2,400 images. With a larger training set of approximately 2,700 images and a testing set of approximately 1,200 images, the recognition rate increased to about 88%.
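
    As a rough illustration of the descriptor-plus-classifier pipeline, the sketch below computes a simplified generic Fourier descriptor from a binary hand mask (polar resampling of the shape followed by a 2-D FFT, with magnitudes normalised by the DC term) and feeds it to a multi-class SVM. The bin counts, kernel choice, and synthetic demo data are assumptions for illustration, not the thesis's implementation.

```python
# Illustrative sketch only: a simplified generic Fourier descriptor (GFD) of a
# binary hand mask, wired to a multi-class SVM. Bin counts, kernel choice and
# the synthetic demo data are assumptions, not the thesis's implementation.
import numpy as np
from sklearn.svm import SVC

def generic_fourier_descriptor(mask, radial_bins=8, angular_bins=12):
    """Resample the mask on a polar grid about its centroid, apply a 2-D FFT,
    and normalise the magnitudes by the DC term for translation/scale/rotation
    robustness."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                    # centroid removes translation
    r = np.hypot(ys - cy, xs - cx)
    theta = np.arctan2(ys - cy, xs - cx)
    r_max = r.max() if r.max() > 0 else 1.0

    polar = np.zeros((radial_bins, angular_bins))    # shape accumulated on a polar raster
    r_idx = np.minimum((r / r_max * radial_bins).astype(int), radial_bins - 1)
    t_idx = ((theta + np.pi) / (2 * np.pi) * angular_bins).astype(int) % angular_bins
    np.add.at(polar, (r_idx, t_idx), 1.0)

    spectrum = np.abs(np.fft.fft2(polar))
    dc = spectrum[0, 0] if spectrum[0, 0] > 0 else 1.0
    return (spectrum / dc).ravel()

# Synthetic stand-in data: random blob masks with fake labels, just to show the
# descriptor -> SVM wiring. A real system would use segmented hand masks.
rng = np.random.default_rng(0)
masks = rng.random((200, 64, 64)) > 0.6
labels = rng.integers(0, 5, size=200)
features = np.stack([generic_fourier_descriptor(m) for m in masks])
clf = SVC(kernel="rbf", C=10.0).fit(features, labels)
print("training accuracy on toy data:", clf.score(features, labels))
```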

    Toward Understanding Human Expression in Human-Robot Interaction

    Intelligent devices are quickly becoming necessities to support our activities during both work and play. We are already bound in a symbiotic relationship with these devices. An unfortunate effect of the pervasiveness of intelligent devices is the substantial investment of our time and effort to communicate intent. Even though our increasing reliance on these intelligent devices is inevitable, the limits of conventional methods for devices to perceive human expression hinder communication efficiency. These constraints restrict the usefulness of intelligent devices to support our activities. Our communication time and effort must be minimized to leverage the benefits of intelligent devices and seamlessly integrate them into society. Minimizing the time and effort needed to communicate our intent will allow us to concentrate on tasks in which we excel, including creative thought and problem solving. An intuitive way to minimize human communication effort with intelligent devices is to take advantage of our existing interpersonal communication experience. Recent advances in speech, hand gesture, and facial expression recognition provide viable alternative modes of communication that are more natural than conventional tactile interfaces. Use of natural human communication eliminates the need to adapt to, and invest time and effort in, the less intuitive techniques required by traditional keyboard- and mouse-based interfaces. Although the state of the art in natural but isolated modes of communication achieves impressive results, significant hurdles must be overcome before communication with devices in our daily lives will feel natural and effortless. Research has shown that combining information between multiple noise-prone modalities improves accuracy. Leveraging this complementary and redundant content will improve communication robustness and relax current unimodal limitations. This research presents and evaluates a novel multimodal framework to help reduce the total human effort and time required to communicate with intelligent devices. This reduction is realized by determining human intent using a knowledge-based architecture that combines and leverages conflicting information available across multiple natural communication modes and modalities. The effectiveness of this approach is demonstrated using dynamic hand gestures and simple facial expressions characterizing basic emotions. It is important to note that the framework is not restricted to these two forms of communication. The framework presented in this research provides the flexibility necessary to include additional or alternate modalities and channels of information in future research, including improving the robustness of speech understanding. The primary contributions of this research include the leveraging of conflicts in a closed-loop multimodal framework, the explicit use of uncertainty in knowledge representation and reasoning across multiple modalities, and a flexible approach for leveraging domain-specific knowledge to help understand multimodal human expression. Experiments using a manually defined knowledge base demonstrate improved average accuracy of individual concepts and improved average accuracy of overall intents when leveraging conflicts, as compared to an open-loop approach.
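
    The fusion idea can be pictured with a toy example: each modality produces per-intent confidence scores, the scores are combined, and disagreement between the modalities is flagged rather than discarded. The sketch below is only a naive averaging scheme for illustration; the thesis's actual framework is a knowledge-based, closed-loop architecture with explicit uncertainty, which this code does not reproduce.

```python
# Toy illustration only: naive fusion of confidence scores from two recognition
# modalities (e.g. hand gesture and facial expression). Intent labels, weights,
# and the conflict rule are assumptions chosen for this sketch.
from typing import Dict, Tuple

def fuse_modalities(gesture_scores: Dict[str, float],
                    face_scores: Dict[str, float],
                    conflict_margin: float = 0.2) -> Tuple[str, float, bool]:
    """Average per-intent confidences and report whether the modalities conflict."""
    intents = set(gesture_scores) | set(face_scores)
    fused = {i: 0.5 * gesture_scores.get(i, 0.0) + 0.5 * face_scores.get(i, 0.0)
             for i in intents}
    best_gesture = max(gesture_scores, key=gesture_scores.get)
    best_face = max(face_scores, key=face_scores.get)
    # Flag a conflict when the modalities prefer different intents with
    # comparable confidence, instead of silently discarding one of them.
    conflict = (best_gesture != best_face and
                abs(gesture_scores[best_gesture] - face_scores[best_face]) < conflict_margin)
    best_intent = max(fused, key=fused.get)
    return best_intent, fused[best_intent], conflict

# Example: the gesture channel favours "greet" while the face channel favours "stop".
print(fuse_modalities({"greet": 0.7, "stop": 0.3}, {"greet": 0.4, "stop": 0.6}))
```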

    Detection of Fingertips in Human Hand Movement Sequences

    This paper presents a hierarchical approach with neural networks to locate the positions of the fingertips in grey-scale images of human hands. The first sections introduce and summarize the research done in this area. Afterwards, our hierarchical approach and the preprocessing of the grey-scale images are described. A low-dimensional encoding of the images is obtained by means of Gabor filters, and a special kind of artificial neural net, the LLM-net, is employed to find the positions of the fingertips. The capabilities of the system are demonstrated on three tasks: locating the tips of the forefinger and of the thumb, finding the pointing direction regardless of the operator's pointing style, and detecting all five fingertips in hand movement sequences. The system is able to perform these tasks even when the fingertips are in an area with low contrast.
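
    A minimal sketch of the first stage, the low-dimensional Gabor encoding, is shown below. The kernel parameters, the number of orientations, and the coarse pooling grid are assumptions chosen for illustration; the LLM-net regression stage used in the paper is not reproduced here.

```python
# Minimal sketch: Gabor-filter responses as a low-dimensional encoding of a
# grey-scale hand image. Filter parameters and the pooling scheme are
# illustrative assumptions, not the authors' configuration.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size=21, sigma=4.0, theta=0.0, wavelength=8.0):
    """Real part of a 2-D Gabor kernel with the given orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def encode_image(image, orientations=8):
    """Encode an image as coarsely pooled Gabor responses over several orientations."""
    feats = []
    for k in range(orientations):
        response = fftconvolve(image, gabor_kernel(theta=k * np.pi / orientations), mode="same")
        # Coarse spatial pooling: mean absolute response on a 4x4 grid.
        h, w = response.shape
        pooled = np.abs(response[: h // 4 * 4, : w // 4 * 4]
                        .reshape(4, h // 4, 4, w // 4)).mean(axis=(1, 3))
        feats.append(pooled.ravel())
    return np.concatenate(feats)

# Example: encode a synthetic 64x64 image (a real system would use hand images).
features = encode_image(np.random.default_rng(0).random((64, 64)))
print(features.shape)   # 8 orientations x 16 pooled cells -> (128,)
```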

    Detection of Fingertips in Human Hand Movement Sequences

    Nölker C, Ritter H. Detection of Fingertips in Human Hand Movement Sequences. In: Wachsmuth I, Fröhlich M, eds. Gesture and Sign Language in Human-Computer Interaction: Proceedings of the International Gesture Workshop 1997. Lecture Notes in Computer Science, vol. 1371. Berlin: Springer; 1998: 209-218.
