33,140 research outputs found
A system for person-independent hand posture recognition against complex backgrounds
A computer vision system for person-independent recognition of hand postures against complex backgrounds is presented. The system is based on Elastic Graph Matching (EGM), which was extended to allow for combinations of different feature types at the graph nodes.
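As a rough illustration of the extension described (combining feature types at graph nodes), the sketch below scores a node as a weighted sum of per-feature similarities. All names, the cosine similarity, and the weights are assumptions for illustration, not the paper's actual EGM formulation.

```python
# Illustrative sketch only: combining several feature types at an elastic
# graph node via a weighted similarity, in the spirit of the EGM extension
# described above. Features, weights and similarity are assumptions.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def node_similarity(model_feats, image_feats, weights):
    """model_feats/image_feats: dicts of feature-type -> vector at one node."""
    return sum(
        weights[name] * cosine_similarity(model_feats[name], image_feats[name])
        for name in weights
    )

# Stand-in Gabor-jet and colour features at a single graph node.
model = {"gabor": np.random.rand(40), "colour": np.random.rand(3)}
image = {"gabor": np.random.rand(40), "colour": np.random.rand(3)}
print(node_similarity(model, image, weights={"gabor": 0.7, "colour": 0.3}))
```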
A new 2D static hand gesture colour image dataset for ASL gestures
It usually takes a fusion of image processing and machine learning algorithms to
build a fully functioning computer vision system for hand gesture recognition. Fortunately,
the complexity of developing such a system can be alleviated by treating it as a
collection of multiple sub-systems working together, in such a way that each can be dealt
with in isolation. Machine learning algorithms need to be fed thousands of exemplars (e.g. images,
features) to automatically establish recognisable patterns for all possible classes (e.g.
hand gestures) that apply to the problem domain. A good number of exemplars helps, but
it is also important to note that the efficacy of these exemplars depends on the variability
of illumination conditions, hand postures, angles of rotation, scaling and on the number of
volunteers from whom the hand gesture images were taken. These exemplars are usually
subjected to image processing first, to reduce the presence of noise and extract the important
features from the images. These features serve as inputs to the machine learning system.
Different sub-systems are integrated to form a complete computer vision system for
gesture recognition. The main contribution of this work is the production of the exemplars.
We discuss how a dataset of standard American Sign Language (ASL) hand gestures containing
2425 images from 5 individuals, with variations in lighting conditions and hand postures,
is generated with the aid of image processing techniques. A minor contribution is a
specific feature extraction method, moment invariants, for which the computation method
and the resulting values are furnished with the dataset.
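The moment-invariant feature referred to here can be reproduced with standard tools. Below is a minimal sketch using OpenCV's Hu moments, which are invariant to translation, scale, and rotation; the file name and the Otsu thresholding step are assumptions, not details from the paper.

```python
# Hypothetical sketch: computing the seven Hu moment invariants for a
# segmented hand image with OpenCV. File name and threshold are assumptions.
import cv2
import numpy as np

image = cv2.imread("asl_gesture.png", cv2.IMREAD_GRAYSCALE)

# Binarise so the moments describe the hand silhouette rather than texture.
_, binary = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

moments = cv2.moments(binary)          # raw, central and normalised moments
hu = cv2.HuMoments(moments).flatten()  # the seven invariants

# Log-scale the invariants, as their magnitudes span several orders.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)
```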
A new framework for sign language alphabet hand posture recognition using geometrical features through artificial neural network (part 1)
Hand pose tracking is essential in sign languages. Automatic recognition of performed hand signs facilitates a number of applications, especially for people with speech impairment, who can use it to communicate with hearing people. This framework, called ASLNN, proposes a new hand posture recognition technique for the American sign language alphabet based on a neural network operating on geometrical features extracted from the hand. A user’s hand is captured by a three-dimensional depth-based sensor camera, and the hand is then segmented according to depth analysis features. The proposed system, named depth-based geometrical sign language recognition (DGSLR), adopts a simple hand segmentation approach that can be reused in other segmentation applications. The proposed geometrical feature extraction framework improves recognition accuracy because the features are invariant to hand orientation, in contrast to discrete cosine transform and moment invariant features. The findings of the iterations demonstrate that combining the extracted features improves accuracy rates. An artificial neural network is then used to derive the desired outcomes. ASLNN is proficient at hand posture recognition and achieves accuracy of up to 96.78%, as discussed further in the authors' companion paper in this journal.
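To make the classification stage concrete, here is a minimal sketch of a small feed-forward network over geometric hand features. The feature dimensionality, class count, and hyper-parameters are assumptions, not the paper's configuration, and random arrays stand in for the real descriptors.

```python
# Hypothetical sketch of the classification stage: a small feed-forward
# network over geometric hand features. Shapes and hyper-parameters are
# assumptions, not the paper's actual ASLNN configuration.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 2600, 12, 26  # e.g. 26 ASL letters

X = rng.normal(size=(n_samples, n_features))     # stand-in geometric features
y = rng.integers(0, n_classes, size=n_samples)   # stand-in letter labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")
```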
Computational Intelligence Techniques in Visual Pattern Recognition
Doctor of Philosophy (Ph.D.) thesis
Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
We describe the first method to automatically estimate the 3D pose of the
human body as well as its 3D shape from a single unconstrained image. We
estimate a full 3D mesh and show that 2D joints alone carry a surprising amount
of information about body shape. The problem is challenging because of the
complexity of the human body, articulation, occlusion, clothing, lighting, and
the inherent ambiguity in inferring 3D from 2D. To solve this, we first use a
recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D
body joint locations. We then fit (top-down) a recently published statistical
body shape model, called SMPL, to the 2D joints. We do so by minimizing an
objective function that penalizes the error between the projected 3D model
joints and detected 2D joints. Because SMPL captures correlations in human
shape across the population, we are able to robustly fit it to very little
data. We further leverage the 3D model to prevent solutions that cause
interpenetration. We evaluate our method, SMPLify, on the Leeds Sports,
HumanEva, and Human3.6M datasets, showing superior pose accuracy with respect
to the state of the art. (To appear in ECCV 2016.)
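The fitting step described above can be illustrated in a few lines. The sketch below is a simplification, not the authors' actual SMPLify energy: it fits only a hypothetical weak-perspective camera for fixed 3D joints, purely to show the structure of the reprojection residual; the real system optimises SMPL pose and shape with priors and an interpenetration term.

```python
# Simplified sketch of a 2D-reprojection fitting objective in the spirit of
# SMPLify: minimise the error between projected model joints and detected
# 2D joints. Camera model, data and parameters are stand-in assumptions.
import numpy as np
from scipy.optimize import least_squares

joints_3d = np.random.rand(24, 3)          # stand-in model joints (24 = SMPL)
joints_2d = joints_3d[:, :2] * 2.0 + 0.1   # stand-in CNN-detected 2D joints

def residuals(params, j3d, j2d):
    s, tx, ty = params
    projected = s * j3d[:, :2] + np.array([tx, ty])  # weak perspective
    return (projected - j2d).ravel()                 # per-joint 2D error

fit = least_squares(residuals, x0=[1.0, 0.0, 0.0], args=(joints_3d, joints_2d))
print("recovered scale/translation:", fit.x)
```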
Recognizing human activity using RGBD data
Traditional computer vision algorithms try to understand the world using visible light cameras. However, there are inherent limitations to this type of data source. First, visible light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when projecting the 3D world onto 2D images, and recovering the 3D information from 2D images is a challenging problem. Range sensors, which capture the 3D characteristics of a scene, have existed for over thirty years; however, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided poor estimates of distance. Recently, easy access to RGBD data at real-time frame rates has led to a revolution in perception and inspired much new research using RGBD data.

I propose algorithms to detect persons and understand their activities using RGBD data, and demonstrate that solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information may give rise to algorithms with real-time and view-invariant properties in a faster and easier fashion. When both data sources are available, features extracted from the depth channel may be combined with traditional features computed from the RGB channels to produce more robust systems with enhanced recognition abilities, able to deal with more challenging scenarios.

As a starting point, the first problem is to find the persons of various poses in the scene, whether moving or static. Localizing humans from RGB images is limited by lighting conditions and background clutter, whereas depth images give alternative ways to find the humans in the scene. In the past, detection of humans from range data was usually achieved by tracking, which does not work for indoor person detection. In this thesis, I propose a model-based approach to detect persons using the structural information embedded in the depth image: a 2D head contour model and a 3D head surface model to look for the head-shoulder part of the person. A segmentation scheme is then proposed to segment the full human body from the background and extract its contour. I also give a tracking algorithm based on the detection result.

I then investigate the recognition of human actions and activities, and propose two features. The first is drawn from the skeletal joint locations estimated from a depth image: a compact, view-invariant representation of the human posture called histograms of 3D joint locations (HOJ3D), computable in real time. This feature may benefit many applications that need a fast estimate of the posture and action of a human subject. The second is a spatio-temporal feature for depth video called the Depth Cuboid Similarity Feature (DCSF). Interest points are extracted using an algorithm that effectively suppresses noise and finds salient human motions, and a DCSF is extracted centered on each interest point to describe the video contents. This descriptor recognizes activities with no dependence on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling, making it flexible and widely applicable to many scenarios.
Finally, all the features developed herein are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective; I propose to recognize activities from a first-person perspective with RGBD data. This task is novel and extremely challenging because of the large amount of camera motion, caused either by self-exploration or by the response to interaction. I extract 3D optical flow features as motion descriptors, 3D skeletal joint features as posture descriptors, and spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask that guides the recognition procedure and separates features in the ego-motion region from those in the independent-motion region. The 3D features are very useful for summarizing the discriminative information of the activities, and combining them with existing 2D features yields more robust recognition results and makes the algorithm capable of dealing with more challenging cases.
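As a rough illustration of the HOJ3D idea mentioned above, the sketch below histograms the directions of skeletal joints around a body-centred reference point. The bin counts, joint count, and reference frame are assumptions; the thesis defines these (including a torso-aligned coordinate system) precisely.

```python
# Rough sketch of a HOJ3D-style posture descriptor: histogram the directions
# of skeletal joints around a body-centred reference point. Bin counts and
# the choice of reference point are assumptions, not the thesis's definition.
import numpy as np

def hoj3d_like(joints, n_azimuth=12, n_elevation=6):
    """joints: (N, 3) array of 3D joint positions from a depth sensor."""
    centred = joints - joints.mean(axis=0)           # crude body centre
    x, y, z = centred[:, 0], centred[:, 1], centred[:, 2]
    azimuth = np.arctan2(y, x)                       # [-pi, pi]
    elevation = np.arcsin(z / (np.linalg.norm(centred, axis=1) + 1e-9))
    hist, _, _ = np.histogram2d(
        azimuth, elevation,
        bins=[n_azimuth, n_elevation],
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]],
    )
    return hist.ravel() / max(len(joints), 1)        # normalised descriptor

skeleton = np.random.rand(20, 3)                     # stand-in Kinect joints
print(hoj3d_like(skeleton).shape)                    # (72,)
```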
Computational Models for the Automatic Learning and Recognition of Irish Sign Language
This thesis presents a framework for the automatic recognition of Sign Language
sentences. In previous sign language recognition work, the issues of
user-independent recognition, movement epenthesis modeling, and automatic
or weakly supervised training have not been fully addressed in a single
recognition framework. This work presents three main contributions to
address these issues.
The first contribution is a technique for user-independent hand posture
recognition. We present a novel eigenspace Size Function feature, which is
implemented to perform user-independent recognition of sign language hand
postures.
The second contribution is a framework for the classification and spotting
of spatiotemporal gestures which appear in sign language. We propose a
Gesture Threshold Hidden Markov Model (GT-HMM) to classify gestures
and to identify movement epenthesis without the need for explicit epenthesis
training.
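A toy illustration of threshold-based gesture spotting in the spirit of a GT-HMM: accept a candidate segment as a gesture only if a dedicated gesture model scores it higher than a threshold model. The sketch below uses hmmlearn with random stand-in data; the thesis's actual GT-HMM builds its threshold model from the states of the trained gesture models, which this simplification does not do.

```python
# Toy sketch of threshold-model gesture spotting: accept a segment as a
# gesture only if a dedicated gesture HMM scores it higher than a generic
# "threshold" HMM. Data and model sizes are stand-in assumptions.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
train_gesture = rng.normal(loc=1.0, size=(200, 2))   # stand-in gesture frames
train_garbage = rng.normal(loc=0.0, size=(200, 2))   # stand-in non-gesture

gesture_model = hmm.GaussianHMM(n_components=3, random_state=0)
gesture_model.fit(train_gesture)

threshold_model = hmm.GaussianHMM(n_components=3, random_state=0)
threshold_model.fit(train_garbage)

segment = rng.normal(loc=1.0, size=(30, 2))          # candidate segment
is_gesture = gesture_model.score(segment) > threshold_model.score(segment)
print("spotted gesture:", is_gesture)
```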
The third contribution is a framework to train the hand posture and spatiotemporal
models using only the weak supervision of sign language videos
and their corresponding text translations. This is achieved through our proposed
Multiple Instance Learning Density Matrix algorithm, which automatically
extracts isolated signs from full sentences using the weak and noisy
supervision of text translations. The automatically extracted isolated samples
are then utilised to train our spatiotemporal gesture and hand posture
classifiers.
The work we present in this thesis is a significant contribution to the
area of natural sign language recognition, as we propose a robust
framework for training a recognition system without the need for
manual labeling.