33,140 research outputs found
A system for person-independent hand posture recognition against complex backgrounds
A computer vision system for person-independent recognition of hand postures against complex backgrounds is presented. The system is based on Elastic Graph Matching (EGM), which was extended to allow for combinations of different feature types at the graph nodes.
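As a rough illustration of the extension described (combining feature types at graph nodes), the sketch below scores a node as a weighted sum of per-feature similarities. All names, the cosine similarity, and the weights are assumptions for illustration, not the paper's actual EGM formulation.

```python
# Illustrative sketch only: combining several feature types at an elastic
# graph node via a weighted similarity, in the spirit of the EGM extension
# described above. Features, weights and similarity are assumptions.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def node_similarity(model_feats, image_feats, weights):
    """model_feats/image_feats: dicts of feature-type -> vector at one node."""
    return sum(
        weights[name] * cosine_similarity(model_feats[name], image_feats[name])
        for name in weights
    )

# Stand-in Gabor-jet and colour features at a single graph node.
model = {"gabor": np.random.rand(40), "colour": np.random.rand(3)}
image = {"gabor": np.random.rand(40), "colour": np.random.rand(3)}
print(node_similarity(model, image, weights={"gabor": 0.7, "colour": 0.3}))
```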
A new 2D static hand gesture colour image dataset for ASL gestures
It usually takes a fusion of image processing and machine learning algorithms to
build a fully functioning computer vision system for hand gesture recognition. Fortunately,
the complexity of developing such a system can be alleviated by treating it as a
collection of multiple sub-systems working together, in such a way that each can be dealt
with in isolation. Machine learning algorithms need to be fed thousands of exemplars (e.g. images,
features) to automatically establish recognisable patterns for all possible classes (e.g.
hand gestures) that apply to the problem domain. A good number of exemplars helps, but
it is also important to note that the efficacy of these exemplars depends on the variability
of illumination conditions, hand postures, angles of rotation, scaling and on the number of
volunteers from whom the hand gesture images were taken. These exemplars are usually
subjected to image processing first, to reduce the presence of noise and extract the important
features from the images. These features serve as inputs to the machine learning system.
Different sub-systems are integrated to form a complete computer vision system for
gesture recognition. The main contribution of this work is the production of the exemplars.
We discuss how a dataset of standard American Sign Language (ASL) hand gestures containing
2425 images from 5 individuals, with variations in lighting conditions and hand postures,
is generated with the aid of image processing techniques. A minor contribution is a
specific feature extraction method, moment invariants, for which the computation method
and the resulting values are furnished with the dataset.
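The moment-invariant feature referred to here can be reproduced with standard tools. Below is a minimal sketch using OpenCV's Hu moments, which are invariant to translation, scale, and rotation; the file name and the Otsu thresholding step are assumptions, not details from the paper.

```python
# Hypothetical sketch: computing the seven Hu moment invariants for a
# segmented hand image with OpenCV. File name and threshold are assumptions.
import cv2
import numpy as np

image = cv2.imread("asl_gesture.png", cv2.IMREAD_GRAYSCALE)

# Binarise so the moments describe the hand silhouette rather than texture.
_, binary = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

moments = cv2.moments(binary)          # raw, central and normalised moments
hu = cv2.HuMoments(moments).flatten()  # the seven invariants

# Log-scale the invariants, as their magnitudes span several orders.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)
```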
A new framework for sign language alphabet hand posture recognition using geometrical features through artificial neural network (part 1)
Hand pose tracking is essential in sign languages. Automatic recognition of performed hand signs facilitates a number of applications, especially for people with speech impairment, who can use it to communicate with hearing people. This framework, called ASLNN, proposes a new hand posture recognition technique for the American sign language alphabet based on a neural network operating on geometrical features extracted from the hand. A user’s hand is captured by a three-dimensional depth-based sensor camera, and the hand is then segmented according to depth analysis features. The proposed system, named depth-based geometrical sign language recognition (DGSLR), adopts a simple hand segmentation approach that can be reused in other segmentation applications. The proposed geometrical feature extraction framework improves recognition accuracy because the features are invariant to hand orientation, in contrast to discrete cosine transform and moment invariant features. The findings of the iterations demonstrate that combining the extracted features improves accuracy rates. An artificial neural network is then used to derive the desired outcomes. ASLNN is proficient at hand posture recognition and achieves accuracy of up to 96.78%, as discussed further in the authors' companion paper in this journal.
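To make the classification stage concrete, here is a minimal sketch of a small feed-forward network over geometric hand features. The feature dimensionality, class count, and hyper-parameters are assumptions, not the paper's configuration, and random arrays stand in for the real descriptors.

```python
# Hypothetical sketch of the classification stage: a small feed-forward
# network over geometric hand features. Shapes and hyper-parameters are
# assumptions, not the paper's actual ASLNN configuration.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 2600, 12, 26  # e.g. 26 ASL letters

X = rng.normal(size=(n_samples, n_features))     # stand-in geometric features
y = rng.integers(0, n_classes, size=n_samples)   # stand-in letter labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")
```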
Computational Intelligence Techniques in Visual Pattern Recognition
Doctor of Philosophy (Ph.D.) thesis
Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
We describe the first method to automatically estimate the 3D pose of the
human body as well as its 3D shape from a single unconstrained image. We
estimate a full 3D mesh and show that 2D joints alone carry a surprising amount
of information about body shape. The problem is challenging because of the
complexity of the human body, articulation, occlusion, clothing, lighting, and
the inherent ambiguity in inferring 3D from 2D. To solve this, we first use a
recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D
body joint locations. We then fit (top-down) a recently published statistical
body shape model, called SMPL, to the 2D joints. We do so by minimizing an
objective function that penalizes the error between the projected 3D model
joints and detected 2D joints. Because SMPL captures correlations in human
shape across the population, we are able to robustly fit it to very little
data. We further leverage the 3D model to prevent solutions that cause
interpenetration. We evaluate our method, SMPLify, on the Leeds Sports,
HumanEva, and Human3.6M datasets, showing superior pose accuracy with respect
to the state of the art. (To appear in ECCV 2016.)
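The fitting step described above can be illustrated in a few lines. The sketch below is a simplification, not the authors' actual SMPLify energy: it fits only a hypothetical weak-perspective camera for fixed 3D joints, purely to show the structure of the reprojection residual; the real system optimises SMPL pose and shape with priors and an interpenetration term.

```python
# Simplified sketch of a 2D-reprojection fitting objective in the spirit of
# SMPLify: minimise the error between projected model joints and detected
# 2D joints. Camera model, data and parameters are stand-in assumptions.
import numpy as np
from scipy.optimize import least_squares

joints_3d = np.random.rand(24, 3)          # stand-in model joints (24 = SMPL)
joints_2d = joints_3d[:, :2] * 2.0 + 0.1   # stand-in CNN-detected 2D joints

def residuals(params, j3d, j2d):
    s, tx, ty = params
    projected = s * j3d[:, :2] + np.array([tx, ty])  # weak perspective
    return (projected - j2d).ravel()                 # per-joint 2D error

fit = least_squares(residuals, x0=[1.0, 0.0, 0.0], args=(joints_3d, joints_2d))
print("recovered scale/translation:", fit.x)
```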
Recognizing human activity using RGBD data
Traditional computer vision algorithms try to understand the world using visible light cameras. However, there are inherent limitations to this type of data source. First, visible light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when projecting the 3D world onto 2D images, and recovering the 3D information from 2D images is a challenging problem. Range sensors, which capture the 3D characteristics of a scene, have existed for over thirty years; however, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided poor estimates of distance. Recently, easy access to RGBD data at real-time frame rates has led to a revolution in perception and inspired much new research using RGBD data.

I propose algorithms to detect persons and understand their activities using RGBD data, and demonstrate that solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information may give rise to algorithms with real-time and view-invariant properties in a faster and easier fashion. When both data sources are available, features extracted from the depth channel may be combined with traditional features computed from the RGB channels to produce more robust systems with enhanced recognition abilities, able to deal with more challenging scenarios.

As a starting point, the first problem is to find the persons of various poses in the scene, whether moving or static. Localizing humans from RGB images is limited by lighting conditions and background clutter, whereas depth images give alternative ways to find the humans in the scene. In the past, detection of humans from range data was usually achieved by tracking, which does not work for indoor person detection. In this thesis, I propose a model-based approach to detect persons using the structural information embedded in the depth image: a 2D head contour model and a 3D head surface model to look for the head-shoulder part of the person. A segmentation scheme is then proposed to segment the full human body from the background and extract its contour. I also give a tracking algorithm based on the detection result.

I then investigate the recognition of human actions and activities, and propose two features. The first is drawn from the skeletal joint locations estimated from a depth image: a compact, view-invariant representation of the human posture called histograms of 3D joint locations (HOJ3D), computable in real time. This feature may benefit many applications that need a fast estimate of the posture and action of a human subject. The second is a spatio-temporal feature for depth video called the Depth Cuboid Similarity Feature (DCSF). Interest points are extracted using an algorithm that effectively suppresses noise and finds salient human motions, and a DCSF is extracted centered on each interest point to describe the video contents. This descriptor recognizes activities with no dependence on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling, making it flexible and widely applicable to many scenarios.
Finally, all the features developed herein are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective; I propose to recognize activities from a first-person perspective with RGBD data. This task is novel and extremely challenging because of the large amount of camera motion, caused either by self-exploration or by the response to interaction. I extract 3D optical flow features as motion descriptors, 3D skeletal joint features as posture descriptors, and spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask that guides the recognition procedure and separates features in the ego-motion region from those in the independent-motion region. The 3D features are very useful for summarizing the discriminative information of the activities, and combining them with existing 2D features yields more robust recognition results and makes the algorithm capable of dealing with more challenging cases.
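As a rough illustration of the HOJ3D idea mentioned above, the sketch below histograms the directions of skeletal joints around a body-centred reference point. The bin counts, joint count, and reference frame are assumptions; the thesis defines these (including a torso-aligned coordinate system) precisely.

```python
# Rough sketch of a HOJ3D-style posture descriptor: histogram the directions
# of skeletal joints around a body-centred reference point. Bin counts and
# the choice of reference point are assumptions, not the thesis's definition.
import numpy as np

def hoj3d_like(joints, n_azimuth=12, n_elevation=6):
    """joints: (N, 3) array of 3D joint positions from a depth sensor."""
    centred = joints - joints.mean(axis=0)           # crude body centre
    x, y, z = centred[:, 0], centred[:, 1], centred[:, 2]
    azimuth = np.arctan2(y, x)                       # [-pi, pi]
    elevation = np.arcsin(z / (np.linalg.norm(centred, axis=1) + 1e-9))
    hist, _, _ = np.histogram2d(
        azimuth, elevation,
        bins=[n_azimuth, n_elevation],
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]],
    )
    return hist.ravel() / max(len(joints), 1)        # normalised descriptor

skeleton = np.random.rand(20, 3)                     # stand-in Kinect joints
print(hoj3d_like(skeleton).shape)                    # (72,)
```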
Computational Models for the Automatic Learning and Recognition of Irish Sign Language
This thesis presents a framework for the automatic recognition of Sign Language
sentences. In previous sign language recognition work, the issues of
user-independent recognition, movement epenthesis modeling, and automatic
or weakly supervised training have not been fully addressed in a single
recognition framework. This work presents three main contributions to
address these issues.
The first contribution is a technique for user-independent hand posture
recognition. We present a novel eigenspace Size Function feature, which is
implemented to perform user-independent recognition of sign language hand
postures.
The second contribution is a framework for the classification and spotting
of spatiotemporal gestures which appear in sign language. We propose a
Gesture Threshold Hidden Markov Model (GT-HMM) to classify gestures
and to identify movement epenthesis without the need for explicit epenthesis
training.
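A toy illustration of threshold-based gesture spotting in the spirit of a GT-HMM: accept a candidate segment as a gesture only if a dedicated gesture model scores it higher than a threshold model. The sketch below uses hmmlearn with random stand-in data; the thesis's actual GT-HMM builds its threshold model from the states of the trained gesture models, which this simplification does not do.

```python
# Toy sketch of threshold-model gesture spotting: accept a segment as a
# gesture only if a dedicated gesture HMM scores it higher than a generic
# "threshold" HMM. Data and model sizes are stand-in assumptions.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
train_gesture = rng.normal(loc=1.0, size=(200, 2))   # stand-in gesture frames
train_garbage = rng.normal(loc=0.0, size=(200, 2))   # stand-in non-gesture

gesture_model = hmm.GaussianHMM(n_components=3, random_state=0)
gesture_model.fit(train_gesture)

threshold_model = hmm.GaussianHMM(n_components=3, random_state=0)
threshold_model.fit(train_garbage)

segment = rng.normal(loc=1.0, size=(30, 2))          # candidate segment
is_gesture = gesture_model.score(segment) > threshold_model.score(segment)
print("spotted gesture:", is_gesture)
```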
The third contribution is a framework to train the hand posture and spatiotemporal
models using only the weak supervision of sign language videos
and their corresponding text translations. This is achieved through our proposed
Multiple Instance Learning Density Matrix algorithm, which automatically
extracts isolated signs from full sentences using the weak and noisy
supervision of text translations. The automatically extracted isolated samples
are then utilised to train our spatiotemporal gesture and hand posture
classifiers.
The work we present in this thesis is a significant contribution to the
area of natural sign language recognition, as we propose a robust
framework for training a recognition system without the need for
manual labeling.