4,823 research outputs found

    An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation

    Full text link
    An appearance-based framework for 3D hand shape classification and simultaneous camera viewpoint estimation is presented. Given an input image of a segmented hand, the most similar matches from a large database of synthetic hand images are retrieved. The ground truth labels of those matches, containing hand shape and camera viewpoint information, are returned by the system as estimates for the input image. Database retrieval is done hierarchically, by first quickly rejecting the vast majority of all database views, and then ranking the remaining candidates in order of similarity to the input. Four different similarity measures are employed, based on edge location, edge orientation, finger location and geometric moments.National Science Foundation (IIS-9912573, EIA-9809340

    A survey on mouth modeling and analysis for Sign Language recognition

    Get PDF
    © 2015 IEEE.Around 70 million Deaf worldwide use Sign Languages (SLs) as their native languages. At the same time, they have limited reading/writing skills in the spoken language. This puts them at a severe disadvantage in many contexts, including education, work, usage of computers and the Internet. Automatic Sign Language Recognition (ASLR) can support the Deaf in many ways, e.g. by enabling the development of systems for Human-Computer Interaction in SL and translation between sign and spoken language. Research in ASLR usually revolves around automatic understanding of manual signs. Recently, ASLR research community has started to appreciate the importance of non-manuals, since they are related to the lexical meaning of a sign, the syntax and the prosody. Nonmanuals include body and head pose, movement of the eyebrows and the eyes, as well as blinks and squints. Arguably, the mouth is one of the most involved parts of the face in non-manuals. Mouth actions related to ASLR can be either mouthings, i.e. visual syllables with the mouth while signing, or non-verbal mouth gestures. Both are very important in ASLR. In this paper, we present the first survey on mouth non-manuals in ASLR. We start by showing why mouth motion is important in SL and the relevant techniques that exist within ASLR. Since limited research has been conducted regarding automatic analysis of mouth motion in the context of ALSR, we proceed by surveying relevant techniques from the areas of automatic mouth expression and visual speech recognition which can be applied to the task. Finally, we conclude by presenting the challenges and potentials of automatic analysis of mouth motion in the context of ASLR

    Single camera pose estimation using Bayesian filtering and Kinect motion priors

    Full text link
    Traditional approaches to upper body pose estimation using monocular vision rely on complex body models and a large variety of geometric constraints. We argue that this is not ideal and somewhat inelegant as it results in large processing burdens, and instead attempt to incorporate these constraints through priors obtained directly from training data. A prior distribution covering the probability of a human pose occurring is used to incorporate likely human poses. This distribution is obtained offline, by fitting a Gaussian mixture model to a large dataset of recorded human body poses, tracked using a Kinect sensor. We combine this prior information with a random walk transition model to obtain an upper body model, suitable for use within a recursive Bayesian filtering framework. Our model can be viewed as a mixture of discrete Ornstein-Uhlenbeck processes, in that states behave as random walks, but drift towards a set of typically observed poses. This model is combined with measurements of the human head and hand positions, using recursive Bayesian estimation to incorporate temporal information. Measurements are obtained using face detection and a simple skin colour hand detector, trained using the detected face. The suggested model is designed with analytical tractability in mind and we show that the pose tracking can be Rao-Blackwellised using the mixture Kalman filter, allowing for computational efficiency while still incorporating bio-mechanical properties of the upper body. In addition, the use of the proposed upper body model allows reliable three-dimensional pose estimates to be obtained indirectly for a number of joints that are often difficult to detect using traditional object recognition strategies. Comparisons with Kinect sensor results and the state of the art in 2D pose estimation highlight the efficacy of the proposed approach.Comment: 25 pages, Technical report, related to Burke and Lasenby, AMDO 2014 conference paper. Code sample: https://github.com/mgb45/SignerBodyPose Video: https://www.youtube.com/watch?v=dJMTSo7-uF

    RGB-D datasets using microsoft kinect or similar sensors: a survey

    Get PDF
    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It takes the advantages of the color image that provides appearance information of an object and also the depth image that is immune to the variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance to benchmark the state-of-the-art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D-simultaneous localization and mapping, and pose estimation. We provide the insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description about the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms

    Face and Gesture Recognition for Human-Robot Interaction

    Get PDF

    Spotting Agreement and Disagreement: A Survey of Nonverbal Audiovisual Cues and Tools

    Get PDF
    While detecting and interpreting temporal patterns of non–verbal behavioral cues in a given context is a natural and often unconscious process for humans, it remains a rather difficult task for computer systems. Nevertheless, it is an important one to achieve if the goal is to realise a naturalistic communication between humans and machines. Machines that are able to sense social attitudes like agreement and disagreement and respond to them in a meaningful way are likely to be welcomed by users due to the more natural, efficient and human–centered interaction they are bound to experience. This paper surveys the nonverbal cues that could be present during agreement and disagreement behavioural displays and lists a number of tools that could be useful in detecting them, as well as a few publicly available databases that could be used to train these tools for analysis of spontaneous, audiovisual instances of agreement and disagreement

    Simultaneous Hand Pose and Skeleton Bone-Lengths Estimation from a Single Depth Image

    Full text link
    Articulated hand pose estimation is a challenging task for human-computer interaction. The state-of-the-art hand pose estimation algorithms work only with one or a few subjects for which they have been calibrated or trained. Particularly, the hybrid methods based on learning followed by model fitting or model based deep learning do not explicitly consider varying hand shapes and sizes. In this work, we introduce a novel hybrid algorithm for estimating the 3D hand pose as well as bone-lengths of the hand skeleton at the same time, from a single depth image. The proposed CNN architecture learns hand pose parameters and scale parameters associated with the bone-lengths simultaneously. Subsequently, a new hybrid forward kinematics layer employs both parameters to estimate 3D joint positions of the hand. For end-to-end training, we combine three public datasets NYU, ICVL and MSRA-2015 in one unified format to achieve large variation in hand shapes and sizes. Among hybrid methods, our method shows improved accuracy over the state-of-the-art on the combined dataset and the ICVL dataset that contain multiple subjects. Also, our algorithm is demonstrated to work well with unseen images.Comment: This paper has been accepted and presented in 3DV-2017 conference held at Qingdao, China. http://irc.cs.sdu.edu.cn/3dv
    corecore