A simple technique for improving multi-class classification with neural networks
We present a novel method to perform multi-class pattern classification with
neural networks and test it on a challenging 3D hand gesture recognition
problem. Our method consists of a standard one-against-all (OAA)
classification, followed by another network layer classifying the resulting
class scores, possibly augmented by the original raw input vector. This allows
the network to disambiguate hard-to-separate classes as the distribution of
class scores carries considerable information as well, and is in fact often
used for assessing the confidence of a decision. We show that by this approach
we are able to significantly boost our results, overall as well as for
particular difficult cases, on the hard 10-class gesture classification task.
Comment: European Symposium on Artificial Neural Networks (ESANN), Jun 2015, Bruges, Belgium
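The two-stage idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names `second_stage_input` and `score_margin` are hypothetical, and using the gap between the two highest scores as a confidence proxy is an assumption made for the example.

```python
from typing import List, Sequence


def second_stage_input(scores: Sequence[float],
                       raw: Sequence[float],
                       use_raw: bool = True) -> List[float]:
    """Build the input vector of the second classifier.

    The first stage is assumed to emit one one-against-all score per class.
    The second stage sees those scores, optionally followed by the raw input.
    """
    features = list(scores)
    if use_raw:
        features.extend(raw)
    return features


def score_margin(scores: Sequence[float]) -> float:
    """Gap between the best and second-best score (assumed confidence proxy)."""
    if len(scores) < 2:
        raise ValueError("need at least two class scores")
    top, runner_up = sorted(scores, reverse=True)[:2]
    return top - runner_up


oaa_scores = [0.10, 0.70, 0.65]   # hypothetical scores for three classes
raw_input = [1.0, 2.0]            # hypothetical raw descriptor
stage2 = second_stage_input(oaa_scores, raw_input)
```

A small margin, as between the second and third class here, marks the kind of hard-to-separate case that the second stage is meant to resolve.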
A pragmatic approach to multi-class classification
We present a novel hierarchical approach to multi-class classification which
is generic in that it can be applied to different classification models (e.g.,
support vector machines, perceptrons), and makes no explicit assumptions about
the probabilistic structure of the problem, as is usually done in multi-class
classification. By adding a cascade of additional classifiers, each of which
receives the previous classifier's output in addition to regular input data,
the approach harnesses unused information that manifests itself in the form of,
e.g., correlations between predicted classes. Using multilayer perceptrons as a
classification model, we demonstrate the validity of this approach by testing
it on a complex ten-class 3D gesture recognition task.
Comment: European Symposium on Artificial Neural Networks (ESANN), Apr 2015, Bruges, Belgium. 201
A Deep Learning Approach for Hand Posture Recognition from Depth Data
Given the success of convolutional neural networks (CNNs) during recent years in numerous object recognition tasks, it seems logical to further extend their applicability to the treatment of three-dimensional data such as point clouds provided by depth sensors. To this end, we present an approach exploiting the CNN's ability of automated feature generation and combine it with a novel 3D feature computation technique that preserves local information contained in the data. Experiments are conducted on a large data set of 600,000 samples of hand postures obtained via time-of-flight (ToF) sensors from 20 different persons, after an extensive parameter search to optimize network structure. Generalization performance, measured by a leave-one-person-out scheme, exceeds that of any other method presented for this specific task, bringing the error for some persons down to 1.5%.
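The leave-one-person-out scheme mentioned above can be illustrated as follows. This is a minimal sketch, assuming each sample carries a person identifier; the function name `leave_one_person_out` and the toy data are hypothetical and not taken from the paper.

```python
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_person_out(
    person_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_person, train_indices, test_indices) for each person."""
    by_person: Dict[Hashable, List[int]] = {}
    for index, person in enumerate(person_ids):
        by_person.setdefault(person, []).append(index)
    for person, test_indices in by_person.items():
        # Train on every sample that does not belong to the held-out person.
        train_indices = [i for i, p in enumerate(person_ids) if p != person]
        yield person, train_indices, test_indices


# Hypothetical toy data: five samples recorded from three persons.
sample_person = ["p1", "p2", "p1", "p3", "p2"]
splits = list(leave_one_person_out(sample_person))
```

Each split tests on samples from a person the model never saw during training, which is why the scheme measures generalization to unknown users.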
Neural Learning Methods for Human-Computer Interaction
This thesis aims to improve hand gesture recognition using machine learning and signal processing techniques. Its main contributions concern machine learning theory and human-machine interaction. Because every method was to be implemented in real time, each method used in the thesis is a compromise between power and the computation time required.
Several directions were pursued. First, the fusion of information from several time-of-flight sensors was studied, with the aim of improving the rate of correct recognitions compared with the single-sensor case. In particular, the impact of the different features computed from a point cloud, and of their parameters, was evaluated. The performance of multilayer perceptrons (MLPs) was also compared with that of a support vector machine (SVM).
Building on these results, the system was installed in a car. First, we showed that the system is not at all disturbed by exposure to outdoor lighting conditions. Extending the training database and modifying the features computed from the point cloud increased the rate of correct recognitions very significantly, as did adding confidence measures to the classification.
To improve the performance of MLP-based classifiers, a new and fairly simple method was then developed. This method exploits information already present in the last layer of the network. Combining this new approach with a fusion technique improves the rate of correct recognitions, especially for "difficult" samples. These results were analyzed and compared in depth across different fusion options in this context. The fact that the processed data are sequences, and that successive samples are therefore temporally coherent, was also exploited using the same fusion techniques. An infotainment system running on a smartphone, which uses the techniques described here, was also built.
Finally, a simplified model of dynamic gesture recognition was proposed and validated in an application context. It was shown that a gesture can be defined fairly robustly by an initial pose and a final pose, both of which are classified by the system described above.
This thesis aims at improving the complex task of hand gesture recognition by utilizing machine learning techniques to learn from features calculated from 3D point cloud data. The main contributions of this work are embedded in the domains of machine learning and human-machine interaction. Since the goal is to demonstrate that a robust, real-time capable system can be set up which provides a supportive means of interaction, the methods researched have to be light-weight in the sense that descriptivity is balanced against the calculation overhead so that the system remains real-time capable. To this end, several approaches were tested.
Initially, the fusion of multiple ToF sensors to improve the overall recognition rate was researched. We examine how employing more than one sensor can significantly boost recognition results in especially difficult cases, and gain a first understanding of the influence of the descriptors on this task as well as the influence of the choice of parameters on the calculation of the descriptor.
The performance of MLPs with standard parameters is compared with the performance of SVMs for which the parameters have been obtained via grid search.
Building on these results, the integration of the system into the car interior is shown. It is demonstrated how such a system can easily be integrated into an outdoor environment subject to strongly varying lighting conditions without the need for tedious calibration procedures. Furthermore, the introduction of a modified light-weight version of the descriptor coupled with an extended database significantly boosts the frame rate for the whole recognition pipeline. Lastly, the introduction of confidence measures for the output of the MLPs allows for more stable classification results and gives an insight into the innate challenges of this multiclass problem in general.
In order to improve the classification performance of the MLPs without the need for sophisticated algorithm design or extensive parameter search, a simple method is proposed which makes use of the existing recognition routines by exploiting information already present in the output neurons of the MLPs. A simple fusion technique is proposed which combines descriptor features with neuron confidences coming from a previously trained net and shows that improved results can be achieved in nearly all cases, for problem classes and individuals alike.
These findings are analyzed in depth on a more theoretical scale by comparing the effectiveness of learning solely on neural activities in the output layer with the previously introduced fusion approach. In order to take temporal information into account, the thesis describes a possible approach to exploiting the fact that the data is processed sequentially, so that problem-specific information can be used.
This approach classifies a hand pose by fusing descriptor features with neural activities coming from previous time steps and lays the groundwork for the following section, which makes the transition towards dynamic hand gestures. Furthermore, an infotainment system realized on a mobile device is introduced and coupled with the preprocessing and recognition module, which in turn is integrated into an automotive setting, demonstrating a possible testing environment for a gesture recognition system.
In order to extend the developed system to allow for dynamic hand gesture interaction, a simplified approach is proposed. This approach demonstrates that recognition of dynamic hand gesture sequences can be achieved with the simple definition of a starting and an ending pose, based on a recognition module working with sufficient accuracy, and even allows for relaxed restrictions in terms of defining the parameters for such a sequence.
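The definition of a dynamic gesture by a starting and an ending pose can be sketched as below. This is an illustration under assumptions, not the thesis implementation: the pose names, the `GESTURES` table, the `detect_gestures` function, and the `max_gap` frame limit are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical gesture table: (starting pose, ending pose) -> gesture name.
GESTURES: Dict[Tuple[str, str], str] = {
    ("fist", "open_hand"): "release",
    ("open_hand", "fist"): "grab",
}


def detect_gestures(poses: Sequence[Optional[str]],
                    max_gap: int = 10) -> List[Tuple[int, int, str]]:
    """Return (start_frame, end_frame, gesture) for each recognized gesture.

    `poses` holds one static-pose label per frame, or None when the pose
    classifier is not confident. A gesture fires when a known ending pose
    appears at most `max_gap` frames after the last frame that showed the
    matching starting pose.
    """
    found: List[Tuple[int, int, str]] = []
    last_seen: Dict[str, int] = {}   # pose label -> most recent frame index
    previous: Optional[str] = None
    for frame, pose in enumerate(poses):
        if pose is None:
            continue
        if pose != previous:         # only react when the pose changes
            for (start, end), name in GESTURES.items():
                if (end == pose and start in last_seen
                        and frame - last_seen[start] <= max_gap):
                    found.append((last_seen[start], frame, name))
        last_seen[pose] = frame
        previous = pose
    return found


frames = ["fist", "fist", None, "open_hand", "open_hand", "fist"]
events = detect_gestures(frames)
```

Raising `max_gap` corresponds to relaxing the restrictions on the sequence, at the cost of pairing poses that are further apart in time.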
Hand Gesture Recognition in Automotive Human–Machine Interaction Using Depth Cameras
In this review, we describe current machine learning approaches to hand gesture recognition with depth data from time-of-flight sensors. In particular, we summarise the achievements of a line of research at the Computational Neuroscience laboratory at the Ruhr West University of Applied Sciences. Relating our results to the work of others in this field, we confirm that Convolutional Neural Networks and Long Short-Term Memory yield the most reliable results. We investigated several sensor data fusion techniques in a deep learning framework and performed user studies to evaluate our system in practice. During the course of our research, we gathered and published our data in a novel benchmark dataset (REHAP), containing over a million unique three-dimensional hand posture samples.
A generic and adaptive approach for workload distribution in multitier cluster systems with an application to distributed matrix multiplication
We present a novel approach to distributing matrix multiplications among GPU-equipped nodes in a cluster system. In this context we discuss the induced challenges and possible solutions. Additionally, we state an algorithm which outperforms optimized GPU BLAS libraries for small matrices. Furthermore, we provide a novel theoretical model for distributing algorithms within homogeneous computation systems with multiple hierarchies. In the context of this model we develop an algorithm which can find the optimal distribution parameters for each involved subalgorithm. We provide a detailed analysis of the algorithm's space and time complexities and justify its use with a structured evaluation within a small GPU-equipped Beowulf cluster.
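The abstract does not spell out the distribution algorithm, so the following only illustrates the general idea of splitting a matrix product across nodes. It uses a plain row-block partition, which is an assumption and not the adaptive scheme of the paper; the nodes are simulated sequentially, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def split_rows(n_rows: int, n_workers: int) -> List[range]:
    """Split row indices into contiguous, near-equal blocks, one per worker."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def multiply_block(a_rows: Matrix, b: Matrix) -> Matrix:
    """Work of one (hypothetical) node: its rows of A times all of B."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a_rows]


def distributed_matmul(a: Matrix, b: Matrix, n_workers: int) -> Matrix:
    """Simulate the distribution sequentially and stitch the row blocks."""
    result: Matrix = []
    for block in split_rows(len(a), n_workers):
        result.extend(multiply_block([a[i] for i in block], b))
    return result


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
product = distributed_matmul(a, b, n_workers=2)
```

In a real cluster each call to `multiply_block` would run on a different node, and the block sizes would be the distribution parameters to tune.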
Time-of-Flight based multi-sensor fusion strategies for hand gesture recognition
Building upon prior results, we present an alternative approach to efficiently classifying a complex set of 3D hand poses obtained from modern time-of-flight (ToF) sensors. We demonstrate that it is possible to achieve satisfactory results in spite of low resolution and high noise (inflicted by the sensors) and a demanding outdoor environment. We set up a large database of point clouds in order to train multilayer perceptrons as well as support vector machines to classify the various hand poses. Our goal is to fuse data from multiple ToF sensors, which observe the poses from multiple angles. The presented contribution illustrates that real-time capability can be maintained with such a setup, as the 3D descriptors used, the fusion strategy, and the online confidence measures are all computationally efficient.
A light-weight real-time applicable hand gesture recognition system for automotive applications
We present a novel approach for improved hand gesture recognition by a single time-of-flight (ToF) sensor in an automotive environment. As the sensor's lateral resolution is comparatively low, we employ a learning approach comprising multiple processing steps, including PCA-based cropping, the computation of robust point cloud descriptors, and training of a multilayer perceptron (MLP) on a large database of samples. A sophisticated temporal fusion technique boosts the overall robustness of recognition by taking into account data coming from previous classification steps. Overall results are very satisfactory when evaluated on a large benchmark set of ten different hand poses, especially when it comes to generalization on previously unknown persons.
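One way to take previous classification steps into account is to append the previous frame's class scores to the current descriptor. The sketch below assumes exactly that form of temporal fusion; the abstract does not confirm the details, and `classify_sequence` and `toy_classifier` are hypothetical names, with the toy classifier standing in for a trained MLP.

```python
from typing import Callable, List, Sequence

Classifier = Callable[[List[float]], List[float]]


def classify_sequence(descriptors: Sequence[Sequence[float]],
                      classifier: Classifier,
                      n_classes: int) -> List[List[float]]:
    """Classify frames in order, feeding each step the previous scores."""
    previous_scores = [0.0] * n_classes   # assumed neutral start
    outputs = []
    for descriptor in descriptors:
        fused = list(descriptor) + previous_scores
        previous_scores = list(classifier(fused))
        outputs.append(previous_scores)
    return outputs


def toy_classifier(fused: List[float]) -> List[float]:
    """Hypothetical stand-in: one descriptor value, two classes."""
    evidence, prev_a, prev_b = fused
    return [0.5 * evidence + 0.5 * prev_a,
            0.5 * (1.0 - evidence) + 0.5 * prev_b]


scores = classify_sequence([[1.0], [1.0], [0.0]], toy_classifier, n_classes=2)
```

In this toy run the third frame alone favors the second class, but the carried-over scores keep the first class from dropping to zero, which is the smoothing effect temporal fusion is after.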