Visual Rendering of Shapes on 2D Display Devices Guided by Hand Gestures
The design of touchless user interfaces is gaining popularity in various contexts. Such interfaces allow users to interact with electronic devices even when their hands are dirty or non-conductive. They also enable users with partial physical disabilities to interact with electronic devices. Research in this direction has received a major boost from the emergence of low-cost sensors such as the Leap Motion, Kinect, and RealSense devices. In this
paper, we propose a Leap Motion controller-based methodology to facilitate
rendering of 2D and 3D shapes on display devices. The proposed method tracks
finger movements while users perform natural gestures within the field of view
of the sensor. In the next phase, trajectories are analyzed to extract extended
Npen++ features in 3D. These features represent finger movements during the gestures and are fed to a unidirectional left-to-right Hidden Markov Model (HMM) for training. A one-to-one mapping between gestures and shapes is
proposed. Finally, shapes corresponding to these gestures are rendered on the display using the MuPad interface. We have created a dataset of 5400 samples
recorded by 10 volunteers. Our dataset contains 18 geometric and 18
non-geometric shapes such as "circle", "rectangle", "flower", "cone", "sphere"
etc. The proposed methodology achieves an accuracy of 92.87% when evaluated using 5-fold cross-validation. Our experiments reveal that the extended 3D features perform better than existing 3D features in the context of shape representation and classification. The method can be used for developing useful HCI applications for smart display devices.
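For illustration, here is a minimal sketch of the per-class scheme the abstract describes, one unidirectional left-to-right Gaussian HMM per gesture, using the third-party hmmlearn library. The extended Npen++ feature extraction is abstracted away; the state count, diagonal covariances, and helper names are illustrative assumptions.

```python
# Left-to-right HMM gesture classification: one model per gesture class,
# trained on sequences of per-frame feature vectors (stand-ins for the
# extended 3D Npen++ features).
import numpy as np
from hmmlearn import hmm  # third-party: pip install hmmlearn

def make_left_to_right_hmm(n_states=5):
    """Gaussian HMM whose transitions allow only self-loops and forward
    jumps to the next state (unidirectional left-to-right topology)."""
    model = hmm.GaussianHMM(
        n_components=n_states,
        covariance_type="diag",
        n_iter=50,
        init_params="mc",   # EM initialises means/covariances only
        params="stmc",
        random_state=0,
    )
    # Zero entries in transmat_ stay zero under Baum-Welch, so the
    # left-to-right structure set here is preserved during training.
    model.startprob_ = np.r_[1.0, np.zeros(n_states - 1)]
    trans = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        trans[i, i] = trans[i, i + 1] = 0.5
    trans[-1, -1] = 1.0
    model.transmat_ = trans
    return model

def train_per_class(train_data, n_states=5):
    """train_data: dict mapping gesture label -> list of (T, D) arrays."""
    models = {}
    for label, seqs in train_data.items():
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        models[label] = make_left_to_right_hmm(n_states).fit(X, lengths)
    return models

def classify(models, seq):
    """Return the label whose HMM scores the sequence highest."""
    return max(models, key=lambda lbl: models[lbl].score(seq))
```

The one-to-one gesture-to-shape mapping then reduces rendering to a lookup from the predicted label to the shape drawn on the display.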
Intelligent Approaches to interact with Machines using Hand Gesture Recognition in Natural way: A Survey
Hand gesture recognition (HGR) is one of the main areas of research for engineers, scientists, and bioinformaticians. HGR is a natural way of human-machine interaction, and today many researchers in academia and industry are working on different applications to make interaction easier, more natural, and more convenient without the need to wear any extra device. HGR can be applied in domains ranging from game control to vision-enabled robot control, and from virtual reality to smart home systems. In this paper we discuss work done in the area of hand gesture recognition, with a focus on intelligent approaches, including soft computing-based methods such as artificial neural networks, fuzzy logic, and genetic algorithms. Methods for image preprocessing, segmentation, and hand image construction are also examined. Most researchers have used fingertips for hand detection in appearance-based modeling. Finally, a comparison of the results reported by different researchers is presented.
Reasoning about Body-Parts Relations for Sign Language Recognition
Over the years, hand gesture recognition has been mostly addressed
considering hand trajectories in isolation. However, in most sign languages,
hand gestures are defined in a particular context (body region). We propose a
pipeline to perform sign language recognition which models hand movements in
the context of other parts of the body captured in the 3D space using the MS
Kinect sensor. In addition, we perform sign recognition based on the different
hand postures that occur during a sign. Our experiments show that considering different body parts improves performance compared to methods that consider only global hand trajectories. Finally, we demonstrate that the
combination of hand posture features with hand gesture features helps to improve the prediction of a given sign.
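The central idea, modelling hand movements relative to other body parts, can be sketched as follows. The joint names and the shoulder-width normalisation are illustrative assumptions rather than the authors' exact feature set.

```python
# Per-frame features: 3D offsets of one hand from several reference body
# joints (e.g. from the MS Kinect skeleton), normalised by body size.
import numpy as np

REFERENCE_JOINTS = ["head", "torso", "shoulder_left", "shoulder_right"]

def relative_hand_features(frames, hand="hand_right"):
    """frames: list of dicts mapping joint name -> np.array([x, y, z]).
    Returns a (T, 3 * len(REFERENCE_JOINTS)) array of scale-normalised
    hand offsets from each reference joint."""
    feats = []
    for joints in frames:
        # shoulder width as a per-frame body-size normaliser, so the
        # features transfer across signers of different builds
        scale = np.linalg.norm(joints["shoulder_left"] - joints["shoulder_right"])
        offsets = [(joints[hand] - joints[ref]) / scale for ref in REFERENCE_JOINTS]
        feats.append(np.concatenate(offsets))
    return np.asarray(feats)
```

Such context-aware features can then replace, or be concatenated with, the global hand trajectory before sequence classification.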
A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset
This paper aims to determine the best human action recognition method based on features extracted from RGB-D devices, such as the Microsoft Kinect. We have reviewed all the papers that refer to MSR Action3D, the most widely used dataset that includes depth information acquired from an RGB-D device. We found that the validation method used differs from work to work, so a direct comparison among works cannot be made. However, almost all the works compare their results without taking this issue into account. Therefore, we present different rankings according to the validation methodology used, in order to clarify the existing confusion.
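To make the comparability problem concrete, the sketch below contrasts two protocols often applied to MSR Action3D: a random 5-fold split and the common cross-subject split with odd-numbered subjects for training. The classifier is a placeholder; the point is that the two resulting numbers are not comparable.

```python
# Two validation protocols for the same features and labels; data loading
# and the classifier (here a stock SVM) are stand-ins.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

def random_kfold_accuracy(X, y, k=5):
    """Random k-fold CV: clips of one subject can land in both train and
    test folds, which tends to inflate accuracy."""
    cv = KFold(n_splits=k, shuffle=True, random_state=0)
    return cross_val_score(SVC(), X, y, cv=cv).mean()

def cross_subject_accuracy(X, y, subjects, train_subjects=(1, 3, 5, 7, 9)):
    """Cross-subject protocol: train and test subjects are disjoint."""
    train = np.isin(subjects, train_subjects)
    clf = SVC().fit(X[train], y[train])
    return clf.score(X[~train], y[~train])
```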
Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions
3D action recognition has broad applications in human-computer interaction
and intelligent surveillance. However, recognizing similar actions remains
challenging since previous literature fails to capture motion and shape cues
effectively from noisy depth data. In this paper, we propose a novel two-layer
Bag-of-Visual-Words (BoVW) model, which suppresses the noise disturbances and
jointly encodes both motion and shape cues. First, background clutter is
removed by a background modeling method that is designed for depth data. Then,
motion and shape cues are jointly used to generate robust and distinctive
spatial-temporal interest points (STIPs): motion-based STIPs and shape-based
STIPs. In the first layer of our model, a multi-scale 3D local steering kernel
(M3DLSK) descriptor is proposed to describe local appearances of cuboids around
motion-based STIPs. In the second layer, a spatial-temporal vector (STV)
descriptor is proposed to describe the spatial-temporal distributions of
shape-based STIPs. Using the Bag-of-Visual-Words (BoVW) model, motion and shape
cues are combined to form a fused action representation. Our model performs
favorably compared with common STIP detection and description methods. Thorough
experiments verify that our model is effective in distinguishing similar
actions and robust to background clutter, partial occlusions, and pepper noise.
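For context, the generic BoVW encoding step both layers rely on can be sketched as follows; the codebook size and hard assignment are illustrative assumptions, and the descriptor extraction itself (M3DLSK, STV) is out of scope here.

```python
# Bag-of-Visual-Words: quantise local descriptors against a k-means
# codebook and pool them into a normalised histogram per video.
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(all_descriptors, n_words=256):
    """Cluster descriptors from the training set into visual words."""
    return KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(all_descriptors)

def encode_video(descriptors, codebook):
    """Hard-assign each descriptor to its nearest word; return an
    L1-normalised histogram as the video-level representation."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```

The paper's fusion of motion and shape cues could then be approximated by concatenating the motion-layer and shape-layer histograms before classification.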
Dynamic Gesture Recognition by Using CNNs and Star RGB: a Temporal Information Condensation
Due to technological advances, machines are increasingly present in people's daily lives. Thus, there has been more and more effort to develop interfaces, such as dynamic gestures, that provide an intuitive way of interaction. Currently, the most common trend is to use multimodal data, such as depth and skeleton information, to enable dynamic gesture recognition. However, using only color information would be more interesting, since RGB cameras are available in almost every public place and could be used for gesture recognition without the need to install other equipment. The main problem with such an approach is the difficulty of representing spatio-temporal information using color alone. With this in mind, we propose a technique capable
of condensing a dynamic gesture, shown in a video, into just one RGB image. We call this technique star RGB. This image is then passed to a classifier formed by two ResNet CNNs, a soft-attention ensemble, and a fully connected layer,
which indicates the class of the gesture present in the input video.
Experiments were carried out using both the Montalbano and GRIT datasets. For the Montalbano dataset, the proposed approach achieved an accuracy of 94.58%, reaching the state of the art for this dataset when only color information is considered. Regarding the GRIT dataset, our proposal achieves more than 98% accuracy, recall, precision, and F1-score, outperforming the reference approach by more than 6%.
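One plausible reading of the condensation idea, not necessarily the authors' exact star RGB formulation, is to split the video into temporal segments and let accumulated frame differences from each segment fill one colour channel:

```python
# Condense a grayscale gesture video into a single RGB image: each colour
# channel holds the summed frame differences of one temporal third, so
# early, middle, and late motion appear in different colours.
import numpy as np

def condense_to_rgb(frames):
    """frames: (T, H, W) float array of grayscale frames in [0, 1].
    Returns an (H, W, 3) image summarising motion over time."""
    diffs = np.abs(np.diff(frames, axis=0))      # per-frame motion energy
    segments = np.array_split(diffs, 3, axis=0)  # early / middle / late
    img = np.stack([seg.sum(axis=0) for seg in segments], axis=-1)
    return img / max(img.max(), 1e-8)            # rescale to [0, 1]
```

The resulting image can then be fed to any image classifier, such as the ResNet ensemble described above.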
Hand Gesture Controlled Drones: An Open Source Library
Drones are conventionally controlled using joysticks, remote controllers,
mobile applications, and embedded computers. A few significant issues with
these approaches are that drone control is limited by the range of
electromagnetic radiation and susceptible to interference noise. In this study, we propose the use of hand gestures as a method to control drones. We
investigate the use of computer vision methods to develop an intuitive way of
agent-less communication between a drone and its operator. Computer
vision-based methods rely on the ability of a drone's camera to capture
surrounding images and use pattern recognition to translate images to
meaningful and/or actionable information. The proposed framework involves a few key steps toward the ultimate action to be taken: segregating images from the front camera's video stream, building robust and reliable image recognition on the segregated images, and finally converting classified gestures into actionable drone movements, such as takeoff, landing, and hovering. A set of five gestures is studied in this work. A Haar feature-based AdaBoost classifier is employed for gesture recognition. We also address the safety of the operator and the drone by estimating their distance using computer vision. A series of experiments is conducted to measure gesture recognition accuracy under the major scene variabilities: illumination, background, and distance. Classification accuracies show that gestures performed in well-lit conditions, against a clear background, and within 3 ft are recognized correctly over 90% of the time. Limitations of the current framework and feasible solutions for better gesture recognition are also discussed. The software library we developed and the hand gesture datasets are open-sourced at the project website.
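The inference side of such a detector can be sketched with OpenCV's CascadeClassifier. The cascade file name below is hypothetical; it would come from offline training (e.g. with opencv_traincascade), and one cascade per gesture is one simple way to cover the five-gesture set.

```python
# Haar cascade gesture detection on a single frame from the drone's
# front camera; detections would be mapped to drone commands downstream.
import cv2

detector = cv2.CascadeClassifier("palm_gesture.xml")  # hypothetical model file

def detect_gesture(frame_bgr):
    """Return bounding boxes of the target gesture in one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # soften illumination variability
    return detector.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5, minSize=(60, 60)
    )
```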
Hand Action Detection from Ego-centric Depth Sequences with Error-correcting Hough Transform
Detecting hand actions from ego-centric depth sequences is a practically
challenging problem, owing mostly to the complex and dexterous nature of hand
articulations as well as non-stationary camera motion. We address this problem
via a Hough transform based approach coupled with a discriminatively learned
error-correcting component to tackle the well known issue of incorrect votes
from the Hough transform. In this framework, local parts vote collectively for
the start and end positions of each action over time. We also construct an
in-house annotated dataset of 300 long videos, containing 3,177 single-action
subsequences over 16 action classes collected from 26 individuals. Our system
is empirically evaluated on this real-life dataset for both the action
recognition and detection tasks, and is shown to produce satisfactory results.
To facilitate reproduction, the new dataset and our implementation are also provided online.
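The temporal voting step can be sketched as follows. The part tuple layout and the Gaussian spreading of votes are illustrative assumptions, and the discriminatively learned error-correcting re-weighting is omitted.

```python
# Temporal Hough voting: local parts cast weighted votes for the action's
# start and end frames via learned offsets; accumulator peaks give the
# detected boundaries.
import numpy as np

def hough_detect(parts, n_frames, sigma=5.0):
    """parts: iterable of (frame, d_start, d_end, weight) tuples, where
    d_start/d_end are the part's learned offsets to the action start and
    end. Returns the highest-voted (start, end) frame pair."""
    t = np.arange(n_frames)
    start_acc = np.zeros(n_frames)
    end_acc = np.zeros(n_frames)
    for frame, d_start, d_end, w in parts:
        # spread each vote with a Gaussian to tolerate localisation noise
        start_acc += w * np.exp(-0.5 * ((t - (frame + d_start)) / sigma) ** 2)
        end_acc += w * np.exp(-0.5 * ((t - (frame + d_end)) / sigma) ** 2)
    return int(start_acc.argmax()), int(end_acc.argmax())
```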
Tracking of Fingertips and Centres of Palm using KINECT
Hand gestures are a popular way to interact with or control machines, and they have been implemented in many applications. The geometry of the hand makes it hard to model in a virtual environment and to control its joints, but the hand's functionality and degrees of freedom (DOF) encourage researchers to build hand-like instruments. This paper presents a novel method for detecting fingertips and palm centres distinctly for both hands in 3D from the input image using MS KINECT. KINECT facilitates this by providing depth information for foreground objects. The hands were segmented using the depth vector, and palm centres were detected by applying a distance transformation to the inverse image. The result could be used to feed inputs to robotic hands to emulate human hand operation.
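The palm-centre step can be sketched with OpenCV's distance transform. Here it is applied directly to a binary hand mask obtained by depth segmentation, which plays the role of the inverse image described above.

```python
# Palm centre as the interior point of the hand mask farthest from any
# background pixel, found via a Euclidean distance transform.
import cv2

def palm_centre(hand_mask):
    """hand_mask: uint8 binary image, 255 on the hand, 0 elsewhere.
    Returns ((x, y), radius) of the estimated palm centre."""
    dist = cv2.distanceTransform(hand_mask, cv2.DIST_L2, 5)
    _, radius, _, centre = cv2.minMaxLoc(dist)  # max distance and its location
    return centre, radius
```

Fingertips can then be found separately, e.g. as contour points farthest from the palm centre.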
Augmented reality meeting table: a novel multi-user interface for architectural design
Immersive virtual environments have received widespread attention as possible replacements for the media and systems that designers traditionally use, as well as, more generally, for providing support for collaborative work. Relatively little attention has been given to date, however, to the problem of how to merge immersive virtual environments into real-world work settings, and so to add to the media at the disposal of the designer and the design team rather than to replace them. In this paper we report on a research project in which optical see-through augmented reality displays have been developed together with prototype decision support software for architectural and urban design. We suggest that a critical characteristic of multi-user augmented reality is its ability to generate visualisations from a first-person perspective in which the scale of rendition of the design model follows many of the conventions that designers are used to. Different scales of model appear to allow designers to focus on different aspects of the design under consideration. Augmenting the scene with simulations of pedestrian movement appears to assist both in scale recognition and in moving from a first-person to a third-person understanding of the design. This research project is funded by the European Commission IST program (IST-2000-28559).