End-to-End Multiview Gesture Recognition for Autonomous Car Parking System
Hand gestures can be the most intuitive medium for human-machine interaction.
Early approaches to hand gesture recognition used device-based methods. These
methods rely on mechanical or optical sensors attached to a glove or on markers, which hinders
natural human-machine communication. Vision-based methods, on the other hand, are
not restrictive and allow more spontaneous communication without the need for an
intermediary between human and machine. Vision-based gesture recognition has therefore been
a popular area of research for the past thirty years.
Hand gesture recognition has applications in many areas, particularly the automotive
industry, where designers of advanced automotive human-machine interfaces (HMIs) are
using gesture recognition to improve driver and vehicle safety. However, technological advances
go beyond active/passive safety into convenience and comfort. In this context,
one of America’s Big Three automakers has partnered with the Centre of Pattern Analysis
and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding
its product segment through machine learning to provide increased driver convenience
and comfort, with the particular application of hand gesture recognition for autonomous
car parking.
In this thesis, we leverage state-of-the-art deep learning and optimization techniques
to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system.
We propose a 3DCNN gesture model architecture that we train on a publicly available
hand gesture database. We apply transfer learning to fine-tune the pre-trained
gesture model on custom-made data, which significantly improved the system’s
performance in real-world environments. We adapt the architecture of the end-to-end solution
to extend the state-of-the-art video classifier from a single image as input (fed by a
monocular camera) to a multiview 360° feed provided by a six-camera module. Finally, we
optimize the proposed solution to run on a resource-limited embedded platform (Nvidia
Jetson TX2) used by automakers for vehicle-based features, without sacrificing the
accuracy, robustness, or real-time functionality of the system.
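What separates a 3DCNN from a per-frame 2D CNN is that each kernel spans time as well as height and width, so a single filter can respond to motion. The pure-Python sketch below is an illustrative assumption, not the thesis architecture: `conv3d_valid` is a hypothetical helper computing one single-channel 3D cross-correlation (the operation deep learning frameworks call "convolution") with stride 1 and no padding. A real model stacks many learned, multi-channel kernels.

```python
from typing import List

Clip = List[List[List[float]]]  # indexed as clip[t][y][x]


def conv3d_valid(clip: Clip, kernel: Clip) -> Clip:
    """Single-channel 3-D cross-correlation, stride 1, 'valid' (no) padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Clip = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                # The kernel window covers kt frames, so the sum mixes time and space.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


# Toy 3-frame, 2x2 clip: a foreground pixel appears at (0, 0), then stays still.
frames = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
]
# Hypothetical hand-set kernel of shape (2, 1, 1): a temporal difference filter.
temporal_diff = [[[-1.0]], [[1.0]]]
response = conv3d_valid(frames, temporal_diff)
print(response)
```

The filter fires only on the frame pair where the pixel appears and is silent once it stops moving, which is the kind of spatiotemporal cue that 2D kernels applied frame by frame cannot capture on their own.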
RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work was reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it to provide a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used RGB-D video datasets
for action recognition, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis of
these datasets are a useful resource for guiding an informed selection of datasets
for future research. In addition, the issues with current algorithm evaluation
vis-à-vis the limitations of the available datasets and evaluation protocols
are highlighted, resulting in a number of recommendations for the collection
of new datasets and the use of evaluation protocols.
Robust density modelling using the Student's t-distribution for human action recognition
The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly reduce recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM that uses mixtures of t-distributions as observation probabilities and show that experiments on two well-known datasets (Weizmann and MuHAVi) report a remarkable improvement in classification accuracy. © 2011 IEEE
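The robustness claim can be seen in the standard EM fit of a Student's t-distribution: each point gets a weight (ν + 1) / (ν + δ²), where δ² is its squared standardized distance from the current location, so gross outliers are down-weighted instead of dragging the estimate the way they drag a Gaussian mean. The sketch below is a minimal, single-component, one-dimensional illustration under stated assumptions (fixed ν = 3, median/MAD initialization, hypothetical helper `fit_student_t`, toy data). It is not the paper's HMM; in a t-mixture observation model these weights would additionally be combined with state and component responsibilities, which this sketch omits.

```python
import statistics


def fit_student_t(xs, nu=3.0, iters=100):
    """EM estimate of location and squared scale of a 1-D Student's t with fixed nu."""
    mu = statistics.median(xs)
    mad = statistics.median([abs(x - mu) for x in xs])
    sigma2 = max((1.4826 * mad) ** 2, 1e-12)  # robust starting scale from the MAD
    w = [1.0] * len(xs)
    for _ in range(iters):
        # E-step: per-point weights; large standardized residuals give weights near 0.
        w = [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in xs]
        # M-step: weighted mean, then weighted variance around the new mean.
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        sigma2 = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / len(xs)
    return mu, sigma2, w


data = [-0.2, -0.1, 0.0, 0.1, 0.2, 50.0]  # toy feature values: five inliers, one outlier
gauss_mu = statistics.mean(data)           # Gaussian maximum-likelihood mean
t_mu, t_sigma2, weights = fit_student_t(data)
print(f"Gaussian mean: {gauss_mu:.3f}, Student's t location: {t_mu:.3f}")
print(f"weight of the outlier in the last E-step: {weights[-1]:.2e}")
```

The Gaussian mean is pulled far toward the single outlier, while the t-based location stays with the bulk of the data because the outlier's weight collapses toward zero.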
Fusion techniques for activity recognition using multi-camera networks
Real-time automatic activity recognition is an important area of research in the field of computer vision, with plenty of applications in surveillance, gaming, entertainment and automobile safety. Because of advances in wireless networks and camera technologies, distributed camera networks are becoming more prominent. Distributed camera networks offer complementary views of scenes and hence are better suited for real-time surveillance applications. They are robust to camera failures and incomplete fields of view.
In a camera network, fusing information from multiple cameras is an important problem, especially when the subjects' orientation with respect to the cameras is unknown and when the arrangement of cameras is not symmetric. The objective of this dissertation is to design information fusion techniques for camera networks and to apply them in the context of surveillance and safety applications (in coal mines). (Abstract shortened by ProQuest.)
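The abstract does not say which fusion technique the dissertation uses. As a generic illustration only, the sketch below shows simple score-level (late) fusion: each camera that produced a result contributes a class-probability dictionary, failed views are skipped, and the averaged scores decide the activity. The helpers `fuse_scores` and `predict` and the activity labels are hypothetical, and plain averaging does not address the orientation and asymmetric-layout issues the dissertation targets.

```python
from typing import Dict, Optional, Sequence


def fuse_scores(per_camera: Sequence[Optional[Dict[str, float]]]) -> Dict[str, float]:
    """Average class probabilities over the cameras that produced a prediction.

    An entry of None models a failed or occluded camera and is skipped.
    """
    available = [s for s in per_camera if s is not None]
    if not available:
        raise ValueError("no camera produced a prediction")
    labels = set().union(*available)  # union of the label keys of all dictionaries
    return {lab: sum(s.get(lab, 0.0) for s in available) / len(available) for lab in labels}


def predict(per_camera: Sequence[Optional[Dict[str, float]]]) -> str:
    fused = fuse_scores(per_camera)
    return max(fused, key=fused.get)


cams = [
    {"walking": 0.6, "falling": 0.4},
    None,  # camera 2 failed
    {"walking": 0.3, "falling": 0.7},
    {"walking": 0.2, "falling": 0.8},
]
print(predict(cams), fuse_scores(cams))
```

Skipping missing views rather than treating them as zero scores is what lets the network degrade gracefully when a camera fails; a weighted average (for example by per-view confidence) is a common refinement.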
Multi-view human action recognition using 2D motion templates based on MHIs and their HOG description
In this study, a new multi-view human action recognition approach is proposed by exploiting low-dimensional motion information of actions. Before feature extraction, pre-processing steps are performed to remove noise from silhouettes caused by imperfect but realistic segmentation. Two-dimensional motion templates based on the motion history image (MHI) are computed for each view/action video. Histograms of oriented gradients (HOGs) are used as an efficient description of the MHIs, which are classified using a nearest neighbor (NN) classifier. Compared with existing approaches, the proposed method has three advantages: (i) it does not require a fixed number of cameras during the training and testing stages, so missing camera views can be tolerated; (ii) it has lower memory and bandwidth requirements; and hence (iii) it is computationally efficient, which makes it suitable for real-time action recognition. As far as the authors know, this is the first report of results on the MuHAVi-uncut dataset, which has a large number of action categories and a large set of camera views with noisy silhouettes, and these results can be used by future workers as a baseline to improve on. Multi-view experiments on this dataset give a high accuracy rate of 95.4% using leave-one-sequence-out cross-validation, which compares well to similar state-of-the-art approaches.
Sergio A Velastin acknowledges the Chilean National Science and Technology Council (CONICYT) for its funding under grant CONICYT-Fondecyt Regular no. 1140209 (“OBSERVE”). He is currently funded by the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement nº 600371, the Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander.
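The MHI itself follows a simple update rule: a pixel is set to a duration τ where motion is detected and otherwise decays by one step, floored at zero, so recent motion appears brighter than older motion. The sketch below illustrates that rule together with a 1-nearest-neighbour classifier evaluated by leave-one-sequence-out, under loud assumptions: motion is taken to be a change in a binary silhouette between consecutive frames, the toy "silhouettes" are single moving pixels, and raw flattened MHI values stand in for the HOG descriptor used in the paper. All helper names and action labels are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary silhouette mask, 1 = foreground pixel


def motion_history_image(masks: List[Mask], tau: int) -> List[List[int]]:
    """MHI update: H = tau where motion is detected, otherwise H decays by 1 (floor 0)."""
    h, w = len(masks[0]), len(masks[0][0])
    mhi = [[0] * w for _ in range(h)]
    for prev, cur in zip(masks, masks[1:]):
        for y in range(h):
            for x in range(w):
                # Assumption: motion = silhouette pixel changed between consecutive frames.
                moved = prev[y][x] != cur[y][x]
                mhi[y][x] = tau if moved else max(0, mhi[y][x] - 1)
    return mhi


def dot_sequence(path: List[Tuple[int, int]], size: int = 4) -> List[Mask]:
    """Hypothetical toy silhouette: one foreground pixel following `path`."""
    masks = []
    for r, c in path:
        m = [[0] * size for _ in range(size)]
        m[r][c] = 1
        masks.append(m)
    return masks


def flatten(m: List[List[int]]) -> List[int]:
    return [v for row in m for v in row]


def nn_predict(train, query):
    """1-nearest-neighbour label under squared Euclidean distance."""
    best = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    return best[1]


def leave_one_out_accuracy(samples):
    """Hold out each sequence in turn and classify it against all the others."""
    correct = 0
    for i, (feat, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        correct += nn_predict(rest, feat) == label
    return correct / len(samples)


paths = {
    "move_right": [[(0, 0), (0, 1), (0, 2)], [(0, 1), (0, 2), (0, 3)]],
    "move_down": [[(0, 0), (1, 0), (2, 0)], [(1, 0), (2, 0), (3, 0)]],
}
samples = [(flatten(motion_history_image(dot_sequence(p), tau=3)), label)
           for label, ps in paths.items() for p in ps]
accuracy = leave_one_out_accuracy(samples)
print(f"leave-one-sequence-out accuracy: {accuracy:.2f}")
```

Replacing `flatten` with a HOG descriptor of each MHI would bring the sketch closer to the described pipeline, since gradient histograms are less sensitive to small shifts of the motion template than raw pixel values.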
- …