Review of constraints on vision-based gesture recognition for human–computer interaction
The ability of computers to recognise hand gestures visually is essential for progress in human–computer interaction. Gesture recognition has applications ranging from sign language to medical assistance to virtual reality. However, it is extremely challenging, not only because of its diverse contexts, multiple interpretations, and spatio-temporal variations, but also because of the complex non-rigid properties of the hand. This study surveys the major constraints on vision-based gesture recognition arising in detection and pre-processing, representation and feature extraction, and recognition, and explores the current challenges in detail.
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
Recent progress in fine-grained gesture and action classification, and in
machine translation, points to the possibility of automated sign language
recognition becoming a reality. A key stumbling block in making progress
towards this goal is a lack of appropriate training data, stemming from the
high complexity of sign annotation and a limited supply of qualified
annotators. In this work, we introduce a new scalable approach to data
collection for sign recognition in continuous videos. We make use of
weakly-aligned subtitles for broadcast footage together with a keyword spotting
method to automatically localise sign-instances for a vocabulary of 1,000 signs
in 1,000 hours of video. We make the following contributions: (1) We show how
to use mouthing cues from signers to obtain high-quality annotations from video
data - the result is the BSL-1K dataset, a collection of British Sign Language
(BSL) signs of unprecedented scale; (2) We show that we can use BSL-1K to train
strong sign recognition models for co-articulated signs in BSL and that these
models additionally form excellent pretraining for other sign languages and
benchmarks - we exceed the state of the art on both the MSASL and WLASL
benchmarks. Finally, (3) we propose new large-scale evaluation sets for the
tasks of sign recognition and sign spotting and provide baselines which we hope
will serve to stimulate research in this area.
Comment: Appears in: European Conference on Computer Vision 2020 (ECCV 2020).
28 pages.
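A minimal sketch of the localisation idea described above, assuming per-word, per-frame spotting probabilities have already been produced by a mouthing-based keyword spotter; the `video_probs` input, the 0.5 threshold, and the function names are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch: subtitles give a weakly-aligned window; a mouthing
# keyword spotter's per-frame probabilities pick the precise sign location.
def localise_signs(subtitles, video_probs, vocab, threshold=0.5):
    """subtitles: list of (text, start_frame, end_frame) tuples;
    video_probs: dict mapping word -> list of per-frame probabilities."""
    annotations = []
    for text, start, end in subtitles:
        # Only search for vocabulary words that appear in the subtitle.
        for word in set(text.lower().split()) & vocab:
            window = video_probs[word][start:end]
            if not window:
                continue
            peak = max(range(len(window)), key=window.__getitem__)
            if window[peak] >= threshold:  # keep confident detections only
                annotations.append((word, start + peak, window[peak]))
    return annotations
```

Run over many hours of subtitled broadcast footage, confident peaks of this kind would become the automatic sign annotations that a dataset like BSL-1K is built from.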
K-RSL: a Corpus for Linguistic Understanding, Visual Evaluation, and Recognition of Sign Languages
The paper presents the first dataset that aims to serve interdisciplinary purposes, for the benefit of both the computer vision community and sign language linguistics. To date, the majority of Sign Language Recognition (SLR) approaches treat sign language as a manual gesture recognition problem. However, signers use other articulators: facial expressions and head and body position and movement also convey linguistic information. Given the important role of non-manual markers, this paper proposes a dataset and presents a use case to stress the importance of including non-manual features to improve the recognition accuracy of signs. To the best of our knowledge, no prior publicly available dataset explicitly focuses on the non-manual components responsible for the grammar of sign languages. To this end, the proposed dataset contains 28,250 high-resolution, high-quality videos of signs, with annotation of both manual and non-manual components. We conducted a series of evaluations to investigate whether non-manual components improve sign recognition accuracy. We release the dataset to encourage SLR researchers and to help advance current progress in this area toward real-time sign language interpretation. Our dataset will be made publicly available at https://krslproject.github.io/krsl-corpus
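As a sketch of the use case this paper motivates, a sign classifier can fuse manual (hand) and non-manual (face/head) features before classification; the feature dimensions, the simple concatenation fusion, and the class count below are illustrative assumptions, not the paper's actual model:

```python
import torch
import torch.nn as nn

class ManualNonManualFusion(nn.Module):
    """Assumed fusion baseline: concatenate hand and face keypoint
    features, then classify with a small MLP."""
    def __init__(self, hand_dim=126, face_dim=140, num_classes=600):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hand_dim + face_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, hands, face):
        # Dropping `face` here recovers a manual-only baseline, the
        # comparison the paper's evaluations are concerned with.
        return self.net(torch.cat([hands, face], dim=-1))

model = ManualNonManualFusion()
logits = model(torch.randn(8, 126), torch.randn(8, 140))  # (8, 600)
```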
Word level Bangla Sign Language Dataset for Continuous BSL Recognition
A robust sign language recognition system can greatly alleviate
communication barriers, particularly for people who struggle with verbal
communication. This is crucial for human growth and progress as it enables the
expression of thoughts, feelings, and ideas. However, sign recognition is a
complex task that faces numerous challenges such as same gesture patterns for
multiple signs, lighting, clothing, carrying conditions, and the presence of
large poses, as well as illumination discrepancies across different views.
Additionally, the absence of an extensive Bangla sign language video dataset
makes it even more challenging to operate recognition systems, particularly
when utilizing deep learning techniques. In order to address this issue,
firstly, we created a large-scale dataset called the MVBSL-W50, which comprises
50 isolated words across 13 categories. Secondly, we developed an
attention-based Bi-GRU model that captures the temporal dynamics of pose
information for individuals communicating through sign language. The proposed
model utilizes human pose information, which has been shown to be effective
in analyzing sign language patterns. By focusing solely on movement
information and disregarding body appearance and environmental factors, the
model is simplified and runs faster. Its reported accuracy is 85.64%.
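A minimal PyTorch sketch of such an attention-based Bi-GRU over pose sequences; the input dimensionality (e.g. 33 keypoints with x, y coordinates), hidden size, and soft-attention pooling are assumptions for illustration, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class AttnBiGRU(nn.Module):
    """Assumed architecture: bidirectional GRU over per-frame pose
    vectors, with learned attention weights pooling over time."""
    def __init__(self, pose_dim=66, hidden=128, num_classes=50):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # one score per frame
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                       # x: (B, T, pose_dim)
        h, _ = self.gru(x)                      # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # (B, T, 1) frame weights
        ctx = (w * h).sum(dim=1)                # weighted temporal pooling
        return self.head(ctx)                   # (B, num_classes) logits

model = AttnBiGRU()
logits = model(torch.randn(4, 60, 66))          # 4 clips, 60 frames each
```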
American Sign Language Recognition Using Machine Learning and Computer Vision
Speech impairment is a disability that affects an individual's ability to communicate using speech and hearing. People affected by it use other modes of communication, such as sign language. Although sign language is now widespread, it remains a challenge for non-signers to communicate with sign language speakers, or signers. With recent advances in deep learning and computer vision, there has been promising progress in motion and gesture recognition. The focus of this work is to create a vision-based application that translates sign language to text, thus aiding communication between signers and non-signers. The proposed model takes video sequences and extracts temporal and spatial features from them. We use Inception, a convolutional neural network (CNN), to recognize the spatial features, and a recurrent neural network (RNN) to train on the temporal features. The dataset used is the American Sign Language Dataset.
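A minimal sketch of the two-stage pipeline described here, using a pretrained Inception v3 from torchvision as the per-frame spatial feature extractor and an LSTM over the resulting frame features; the layer sizes, the 26-class head, and the last-step readout are assumptions, not the authors' exact setup:

```python
import torch
import torch.nn as nn
from torchvision.models import inception_v3, Inception_V3_Weights

# Pretrained Inception v3 as a frozen per-frame feature extractor
# (first run downloads the torchvision weights).
cnn = inception_v3(weights=Inception_V3_Weights.DEFAULT)
cnn.fc = nn.Identity()                 # expose the 2048-d pooled feature
cnn.eval()

lstm = nn.LSTM(input_size=2048, hidden_size=256, batch_first=True)
classifier = nn.Linear(256, 26)        # assumed head, e.g. one class per sign

frames = torch.randn(2, 16, 3, 299, 299)   # 2 clips x 16 frames of 299x299 RGB
with torch.no_grad():
    feats = cnn(frames.flatten(0, 1))      # (32, 2048) spatial features
feats = feats.view(2, 16, -1)              # regroup into (clip, frame, feature)
out, _ = lstm(feats)                       # temporal modelling
logits = classifier(out[:, -1])            # (2, 26) logits from the last step
```

Extracting CNN features once and training only the recurrent part on top is a common way to keep such a pipeline tractable, which is one plausible reading of the two-stage design described above.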