LE-HGR: A Lightweight and Efficient RGB-based Online Gesture Recognition Network for Embedded AR Devices
Online hand gesture recognition (HGR) techniques are essential in augmented
reality (AR) applications for enabling natural human-to-computer interaction
and communication. In recent years, the consumer market for low-cost AR devices
has been rapidly growing, while the technology maturity in this domain is still
limited. These devices typically have low prices, limited memory, and
resource-constrained computational units, which makes online HGR a challenging
problem. To tackle this problem, we propose a lightweight and computationally
efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition
on embedded devices with low computing power. We also show that the proposed
method is highly accurate and robust, reaching high-end
performance in a variety of complicated interaction environments. To achieve
our goal, we first propose a cascaded multi-task convolutional neural network
(CNN) to simultaneously predict probabilities of hand detection and regress
hand keypoint locations online. We show that, with the proposed cascaded
architecture design, false-positive estimates can be largely eliminated.
Additionally, an associated mapping approach is introduced to track the hand
trace via the predicted locations, which addresses the interference of
multi-handedness. Subsequently, we propose a trace sequence neural network
(TraceSeqNN) to recognize the hand gesture by exploiting the motion features of
the tracked trace. Finally, we provide a variety of experimental results to
show that the proposed framework is able to achieve state-of-the-art accuracy
with significantly reduced computational cost, which are the key properties for
enabling real-time applications in low-cost commercial devices such as mobile
devices and AR/VR headsets.
Comment: Published in: 2019 IEEE International Symposium on Mixed and
Augmented Reality Adjunct (ISMAR-Adjunct
RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it in providing a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used action recognition
related RGB-D video datasets, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis of
these datasets is a useful resource in guiding insightful selection of datasets
for future research. In addition, the issues with current algorithm evaluation
vis-à-vis limitations of the available datasets and evaluation protocols
are also highlighted, resulting in a number of recommendations for collection
of new datasets and use of evaluation protocols
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives like object detection,
activity recognition, user-machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, the most commonly used features,
methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interactio
A real-time human-robot interaction system based on gestures for assistive scenarios
Natural and intuitive human interaction with robotic systems is key to developing robots that assist people easily and effectively. In this paper, a Human Robot Interaction (HRI) system able to recognize gestures usually employed in human non-verbal communication is introduced, and an in-depth study of its usability is performed. The system deals with dynamic gestures such as waving or nodding, which are recognized using a Dynamic Time Warping approach based on gesture-specific features computed from depth maps. A static gesture consisting of pointing at an object is also recognized. The pointed location is then estimated in order to detect candidate objects the user may be referring to. When the pointed object is unclear to the robot, a disambiguation procedure by means of either a verbal or gestural dialogue is performed. This skill would allow the robot to pick up an object on behalf of a user who may have difficulty doing so themselves. The overall system, which is composed of NAO and Wifibot robots, a Kinect™ v2 sensor and two laptops, is first evaluated in a structured lab setup. Then, a broad set of user tests is completed, which allows performance to be assessed in terms of recognition rates, ease of use and response times.
Postprint (author's final draft
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.
Comment: 8 pages excluding references (CVPR style