Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion
Gesture recognition is a hot topic in computer vision and pattern
recognition, and it plays a vital role in natural human-computer
interfaces. Although great progress has been made recently, fast and robust
hand gesture recognition remains an open problem, since existing methods
have not balanced accuracy and efficiency well. To bridge this gap, this
work combines image entropy and density clustering to extract key frames
from hand gesture videos for further feature extraction, which improves the
efficiency of recognition. Moreover, a feature fusion strategy is also
proposed to further improve the feature representation, which elevates the
performance of recognition. To validate our approach in a "wild"
environment, we also introduce two new datasets called HandGesture and
Action3D. Experiments consistently demonstrate that our strategy achieves
competitive results on the Northwestern University, Cambridge, HandGesture
and Action3D hand gesture datasets. Our code and datasets will be released
at https://github.com/Ha0Tang/HandGestureRecognition.
Comment: 11 pages, 3 figures, accepted to Neurocomputing.
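The abstract describes selecting key frames by combining image entropy with
density clustering. Below is a minimal sketch of that idea, assuming
grayscale histogram entropy per frame and DBSCAN over the entropy values;
the paper's exact features, clustering formulation, and parameters (eps,
min_samples) are not specified here and are illustrative assumptions.

```python
# Sketch: entropy-based key-frame selection with density clustering.
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def frame_entropy(frame):
    """Shannon entropy of the grayscale intensity histogram of one frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_key_frames(video_path, eps=0.15, min_samples=3):
    """Cluster frames by entropy; keep the highest-entropy frame per cluster."""
    cap = cv2.VideoCapture(video_path)
    frames, entropies = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        entropies.append(frame_entropy(frame))
    cap.release()
    e = np.array(entropies).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(e)
    key_frames = []
    for label in set(labels) - {-1}:  # -1 marks DBSCAN noise points
        idx = np.where(labels == label)[0]
        key_frames.append(frames[idx[np.argmax(e[idx])]])
    return key_frames
```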
Robust and customized methods for real-time hand gesture recognition under object-occlusion
Dynamic hand tracking and gesture recognition is a hard task, since the
fingers have many joints and each joint has several degrees of freedom.
In addition, object occlusion is a thorny issue in finger tracking and
posture recognition. We therefore propose a robust and customized system
for real-time hand tracking and gesture recognition in occluded
environments. First, we model the angles between hand keypoints and encode
their relative coordinate vectors, then introduce a GAN to generate a raw
discrete sequence dataset. Second, we propose a time-series forecasting
method to predict the locations of the defined hand keypoints. Finally, we
define a sliding-window matching method to complete gesture recognition.
We analyze 11 kinds of typical gestures and show how to perform gesture
recognition with the proposed method. Our work reaches state-of-the-art
results and contributes a framework for implementing customized gesture
recognition tasks.
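Two of the steps above, angle encoding of hand keypoints and sliding-window
matching, can be sketched briefly. The snippet below is a plausible
illustration only: the template set, window length, and distance threshold
are assumptions, not the paper's settings.

```python
# Sketch: joint-angle encoding and sliding-window template matching.
import numpy as np

def joint_angles(keypoints):
    """Angles (radians) at interior points of an (N, 2) keypoint chain."""
    v1 = keypoints[:-2] - keypoints[1:-1]
    v2 = keypoints[2:] - keypoints[1:-1]
    cos = np.sum(v1 * v2, axis=1) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def sliding_window_match(angle_stream, templates, window, threshold=1.0):
    """Return (start_index, gesture_name) pairs where a template matches.

    angle_stream: list of per-frame angle vectors; templates: dict mapping
    gesture name -> array of shape (window, num_angles)."""
    matches = []
    for start in range(len(angle_stream) - window + 1):
        segment = np.asarray(angle_stream[start:start + window])
        for name, template in templates.items():
            if np.linalg.norm(segment - template) / window < threshold:
                matches.append((start, name))
    return matches
```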
Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation
Hand pose estimation from a single depth image is an essential topic in
computer vision and human-computer interaction. Despite recent advances in
this area driven by convolutional neural networks, accurate hand pose
estimation is still a challenging problem. In this paper we propose a Pose
guided structured Region Ensemble Network (Pose-REN) to boost the
performance of hand pose estimation. The proposed method extracts regions
from the feature maps of a convolutional neural network under the guidance
of an initially estimated pose, generating more representative features for
hand pose estimation. The extracted feature regions are then integrated
hierarchically according to the topology of the hand joints by employing
tree-structured fully connected layers. A refined estimate of the hand pose
is directly regressed by the proposed network, and the final hand pose is
obtained with an iterative cascaded method. Comprehensive experiments on
public hand pose datasets demonstrate that our proposed method outperforms
state-of-the-art algorithms.
Comment: Accepted by Neurocomputing.
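The iterative, pose-guided cascade can be illustrated with a small PyTorch
sketch, shown below under simplifying assumptions: region extraction is a
plain crop around each projected joint, and the tree-structured fusion is
reduced to per-joint branches concatenated before a shared regressor.
Layer sizes, joint count, and iteration count are arbitrary placeholders,
not the Pose-REN configuration.

```python
# Sketch: pose-guided region extraction with iterative cascaded refinement.
import torch
import torch.nn as nn

class PoseGuidedCascade(nn.Module):
    def __init__(self, num_joints=21, feat_channels=32, crop=8):
        super().__init__()
        self.crop = crop
        self.backbone = nn.Sequential(
            nn.Conv2d(1, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU())
        # One small branch per joint region, fused by a shared regressor.
        self.branches = nn.ModuleList(
            [nn.Linear(feat_channels * crop * crop, 64)
             for _ in range(num_joints)])
        self.regressor = nn.Linear(64 * num_joints, num_joints * 3)

    def extract_region(self, feat, uv):
        """Crop a fixed-size window of the feature map around joint pixel uv."""
        _, _, h, w = feat.shape
        half = self.crop // 2
        u = int(uv[0].clamp(half, w - half - 1))
        v = int(uv[1].clamp(half, h - half - 1))
        return feat[:, :, v - half:v + half, u - half:u + half]

    def forward(self, depth, init_pose, iters=3):
        """depth: (1, 1, H, W); init_pose: (num_joints, 3) with (u, v, d)."""
        feat = self.backbone(depth)
        pose = init_pose
        for _ in range(iters):  # cascade: re-extract regions with the new pose
            regions = [branch(self.extract_region(feat, joint[:2]).flatten(1))
                       for branch, joint in zip(self.branches, pose)]
            pose = self.regressor(torch.cat(regions, dim=1)).view(-1, 3)
        return pose
```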
Fingertip Detection and Tracking for Recognition of Air-Writing in Videos
Air-writing is the process of writing characters or words in free space
using finger or hand movements, without the aid of any hand-held device. In
this work, we address the problem of mid-air finger writing using webcam
video as input. In spite of recent advances in object detection and
tracking, accurate and robust detection and tracking of the fingertip
remains challenging, primarily because of the fingertip's small size.
Moreover, the initialization and termination of mid-air finger writing are
also challenging due to the absence of any standard delimiting criterion.
To solve these problems, we propose a new writing-hand pose detection
algorithm for the initialization of air-writing, which uses the Faster
R-CNN framework for accurate hand detection, followed by hand segmentation
and, finally, counting the number of raised fingers based on geometrical
properties of the hand. Further, we propose a robust fingertip detection
and tracking approach using a new signature function called
distance-weighted curvature entropy. Finally, a fingertip velocity-based
termination criterion is used as a delimiter to mark the completion of the
air-writing gesture. Experiments show the superiority of the proposed
fingertip detection and tracking algorithm over state-of-the-art
approaches, giving a mean precision of 73.1% while achieving real-time
performance at 18.5 fps, a condition which is of vital importance to
air-writing. Character recognition experiments give a mean accuracy of
96.11% using the proposed air-writing system, a result comparable to that
of existing handwritten character recognition systems.
Comment: 32 pages, 10 figures, 2 tables. Submitted to Expert Systems with
Applications.
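The velocity-based termination criterion mentioned above lends itself to a
very short sketch: the air-writing stroke is considered finished once the
fingertip speed stays below a threshold for several consecutive frames. The
threshold and the frame-count window below are assumptions, not the paper's
tuned parameters.

```python
# Sketch: fingertip-velocity delimiter for air-writing termination.
import numpy as np

def stroke_complete(trajectory, fps=18.5, speed_thresh=40.0, hold_frames=10):
    """trajectory: list of (x, y) fingertip positions in pixels, one per frame.

    Returns True if the last `hold_frames` speeds (pixels/second) are all
    below `speed_thresh`, i.e. the fingertip has effectively stopped moving."""
    if len(trajectory) < hold_frames + 1:
        return False
    pts = np.asarray(trajectory[-(hold_frames + 1):], dtype=float)
    speeds = np.linalg.norm(np.diff(pts, axis=0), axis=1) * fps
    return bool(np.all(speeds < speed_thresh))
```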
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
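The survey's standard static-image pipeline (a detected and aligned face
crop fed to a CNN that predicts an expression class) can be sketched as
follows. The architecture and the seven-class label set are illustrative
choices, not any specific surveyed model.

```python
# Sketch: minimal static-image deep FER classifier over aligned face crops.
import torch
import torch.nn as nn

class SimpleFERNet(nn.Module):
    def __init__(self, num_classes=7):  # e.g. the common 7 basic expressions
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, face):  # face: (B, 1, H, W) aligned grayscale crop
        return self.classifier(self.features(face).flatten(1))

# Usage: logits = SimpleFERNet()(torch.randn(4, 1, 48, 48))
```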
Online Action Recognition based on Incremental Learning of Weighted Covariance Descriptors
Different from traditional action recognition based on video segments,
online action recognition aims to recognize actions from unsegmented data
streams in a continuous manner. One way to perform online recognition is to
accumulate evidence over time and make predictions from streaming video.
This paper presents a fast yet effective method to recognize actions from a
stream of noisy skeleton data, in which a novel weighted covariance
descriptor is adopted to accumulate evidence. In particular, a fast
incremental updating method for the weighted covariance descriptor is
developed for the accumulation of temporal information and online
prediction. The weighted covariance descriptor takes the following
principles into consideration: past frames contribute less to recognition,
while recent and informative frames, such as key frames, contribute more.
Online recognition is achieved using a simple nearest neighbor search
against a set of offline-trained action models. Experimental results on the
MSRC-12 Kinect Gesture dataset and our newly constructed online action
recognition dataset demonstrate the efficacy of the proposed method.
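An incrementally maintained weighted covariance descriptor can be sketched
with the standard weighted update (West's algorithm), as below. The recency
weighting and any special treatment of key frames used in the paper are not
reproduced; the weight passed per frame is left to the caller.

```python
# Sketch: incremental weighted covariance over streaming frame features.
import numpy as np

class WeightedCovarianceDescriptor:
    def __init__(self, dim):
        self.total_weight = 0.0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # weighted sum of outer products

    def update(self, x, weight):
        """Fold one frame feature vector x (shape (dim,)) in with `weight`."""
        x = np.asarray(x, dtype=float)
        self.total_weight += weight
        delta = x - self.mean
        self.mean += (weight / self.total_weight) * delta
        self.scatter += weight * np.outer(delta, x - self.mean)

    def covariance(self):
        return self.scatter / max(self.total_weight, 1e-12)

# Usage: weight recent frames more (e.g. an increasing w_t per frame), then
# classify by nearest neighbor against offline per-action descriptors.
```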
Evaluation of the Spatio-Temporal features and GAN for Micro-expression Recognition System
Owing to the development and advancement of artificial intelligence,
numerous works have been established on human facial expression recognition
systems. Meanwhile, the detection and classification of micro-expressions
have been attracting attention from various research communities in recent
years. In this paper, we first review the processes of a conventional
optical-flow-based recognition system, which comprises facial landmark
annotation, computation of optical-flow-guided images, feature extraction
and emotion classification. Secondly, a few approaches are proposed to
improve the feature extraction part, such as exploiting a GAN to generate
more image samples. In particular, several variants of optical flow are
computed in order to generate optimal images that lead to high recognition
accuracy. Next, a GAN, a combination of a generator and a discriminator, is
utilized to generate new "fake" images to increase the sample size.
Thirdly, a modified state-of-the-art convolutional neural network is
proposed. To verify the effectiveness of the proposed method, the results
are evaluated on spontaneous micro-expression databases, namely SMIC,
CASME II and SAMM. Both the F1-score and accuracy performance metrics are
reported in this paper.
Comment: 15 pages, 16 figures, 6 tables.
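The optical-flow-guided image computation step can be sketched as follows:
dense flow between an onset and an apex frame is converted into channels a
CNN can consume. Farneback flow with default-style parameters is used here
purely as a stand-in; the paper evaluates several optical-flow variants
that are not shown.

```python
# Sketch: build a flow-guided image from onset and apex micro-expression frames.
import cv2
import numpy as np

def flow_guided_image(onset_path, apex_path):
    onset = cv2.imread(onset_path, cv2.IMREAD_GRAYSCALE)
    apex = cv2.imread(apex_path, cv2.IMREAD_GRAYSCALE)
    # Farneback dense flow: prev, next, flow, pyr_scale, levels, winsize,
    # iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(onset, apex, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, _angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Stack horizontal flow, vertical flow, and magnitude as a 3-channel input.
    return np.dstack([flow[..., 0], flow[..., 1], magnitude]).astype(np.float32)
```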
American Sign Language fingerspelling recognition in the wild
We address the problem of American Sign Language fingerspelling recognition
in the wild, using videos collected from websites. We introduce the largest
data set available so far for the problem of fingerspelling recognition, and
the first using naturally occurring video data. Using this data set, we present
the first attempt to recognize fingerspelling sequences in this challenging
setting. Unlike prior work, our video data is extremely challenging due to low
frame rates and visual variability. To tackle the visual challenges, we train a
special-purpose signing hand detector using a small subset of our data. Given
the hand detector output, a sequence model decodes the hypothesized
fingerspelled letter sequence. For the sequence model, we explore
attention-based recurrent encoder-decoders and CTC-based approaches. As the
first attempt at fingerspelling recognition in the wild, this work is intended
to serve as a baseline for future work on sign language recognition in
realistic conditions. We find that, as expected, letter error rates are much
higher than in previous work on more controlled data, and we analyze the
sources of error and effects of model variants.
Comment: Accepted at SLT 2018.
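The CTC-based branch of the sequence model can be sketched briefly:
per-frame features from the signing-hand detector feed a recurrent encoder
whose per-frame letter posteriors are trained with CTC. The feature size,
hidden size, and 26-letter alphabet plus blank below are assumptions, not
the paper's configuration.

```python
# Sketch: CTC-trained recurrent model over per-frame hand-crop features.
import torch
import torch.nn as nn

class CTCFingerspellingModel(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_letters=26):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_letters + 1)  # blank = last index

    def forward(self, frames):  # frames: (B, T, feat_dim) hand-crop features
        out, _ = self.encoder(frames)
        return self.head(out).log_softmax(dim=-1)  # (B, T, num_letters + 1)

# Training sketch (blank assigned index 26, matching the head above):
# log_probs = model(frames).transpose(0, 1)        # (T, B, C) for nn.CTCLoss
# loss = nn.CTCLoss(blank=26)(log_probs, targets, input_lens, target_lens)
```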
UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition
Current UAV-recorded datasets are mostly limited to action recognition and
object tracking, whereas existing gesture-signal datasets were mostly
recorded in indoor spaces. There is currently no publicly available video
dataset of UAV commanding signals recorded outdoors. Gesture signals can be
used effectively with UAVs by leveraging a UAV's visual sensors and
operational simplicity. To fill this gap and enable research in wider
application areas, we present a UAV gesture signals dataset recorded in an
outdoor setting. We selected 13 gestures suitable for basic UAV navigation
and command from general aircraft handling and helicopter handling signals.
We provide 119 high-definition video clips consisting of 37,151 frames. The
overall baseline gesture recognition performance, computed using a
Pose-based Convolutional Neural Network (P-CNN), is 91.9%. All frames are
annotated with body joints and gesture classes in order to extend the
dataset's applicability to a wider research area, including gesture
recognition, action recognition, human pose recognition and situation
awareness.
Comment: 12 pages, 4 figures, UAVision workshop, ECCV 2018.
Audio to Body Dynamics
We present a method that takes as input audio of violin or piano playing
and outputs a video of skeleton predictions, which are further used to
animate an avatar. The key idea is to create an animation of an avatar that
moves its hands the way a pianist or violinist would, just from audio.
Fully detailed and correct arm and finger motion is the goal; however, it
is not clear whether body movement can be predicted from music at all. In
this paper, we present the first result showing that natural body dynamics
can indeed be predicted. We built an LSTM network trained on violin and
piano recital videos uploaded to the Internet. The predicted points are
applied to a rigged avatar to create the animation.
Comment: Link with videos: https://arviolin.github.io/AudioBodyDynamics
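The core mapping, a recurrent network from audio features to per-frame body
keypoints, can be sketched as follows. The audio feature dimension (e.g.
MFCC frames), hidden size, and keypoint count are illustrative assumptions,
not the paper's configuration.

```python
# Sketch: LSTM mapping per-frame audio features to 2D body keypoints.
import torch
import torch.nn as nn

class AudioToPose(nn.Module):
    def __init__(self, audio_dim=13, hidden=200, num_keypoints=50):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim, hidden, num_layers=1, batch_first=True)
        self.to_pose = nn.Linear(hidden, num_keypoints * 2)  # (x, y) per joint

    def forward(self, audio_feats):  # audio_feats: (B, T, audio_dim)
        out, _ = self.lstm(audio_feats)
        pose = self.to_pose(out)             # (B, T, num_keypoints * 2)
        return pose.view(*pose.shape[:-1], -1, 2)

# Usage: the predicted keypoints can drive a rigged avatar frame by frame.
```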