14 research outputs found

    Unsupervised Video Understanding by Reconciliation of Posture Similarities

    Full text link
    Understanding human activity and being able to explain it in detail surpasses mere action classification by far in both complexity and value. The challenge is thus to describe an activity on the basis of its most fundamental constituents, the individual postures and their distinctive transitions. Supervised learning of such a fine-grained representation based on elementary poses is very tedious and does not scale. We therefore propose a completely unsupervised deep learning procedure based solely on video sequences, which starts from scratch without requiring pre-trained networks, predefined body models, or keypoints. A combinatorial sequence matching algorithm proposes relations between frames from subsets of the training data, while a CNN reconciles the transitivity conflicts across the different subsets to learn a single concerted pose embedding despite changes in appearance across sequences. Without any manual annotation, the model learns a structured representation of postures and their temporal development. The model not only enables retrieval of similar postures but also temporal super-resolution. Additionally, based on a recurrent formulation, next frames can be synthesized. Comment: Accepted by ICCV 2017
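    The abstract only outlines how the matcher-proposed frame relations supervise the network; as a rough illustration (not the authors' code), such pairwise proposals could drive a standard contrastive objective on CNN embeddings. In the PyTorch sketch below, PoseCNN, contrastive_step, and the margin value are illustrative assumptions standing in for the paper's combinatorial matching and reconciliation machinery.

        # Minimal sketch, assuming PyTorch: frame pairs labelled as related /
        # unrelated by an unsupervised sequence matcher train a pose embedding.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class PoseCNN(nn.Module):
            """Small convolutional encoder mapping a video frame to an embedding."""
            def __init__(self, dim=128):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
                    nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),
                )
                self.fc = nn.Linear(64, dim)

            def forward(self, x):
                h = self.features(x).flatten(1)
                return F.normalize(self.fc(h), dim=1)  # unit-length embeddings

        def contrastive_step(model, optimizer, frames_a, frames_b, same, margin=0.5):
            """One update: pull related frames together, push others apart.
            `same` is a 0/1 tensor from the (unsupervised) sequence matcher."""
            za, zb = model(frames_a), model(frames_b)
            d2 = (za - zb).pow(2).sum(1)
            loss = (same * d2 + (1 - same) * F.relu(margin - d2.sqrt()).pow(2)).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()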

    Accurate online alignment of human motor performances

    Get PDF
    HĂĽlsmann F, Richter A, Kopp S, Botsch M. Accurate online alignment of human motor performances. In: Proceedings of ACM Motion in Games. Barcelona: ACM; 2017: pp. 7:1-7:6

    A LITERATURE STUDY ON HUMAN MOTION ANALYSIS USING DEPTH IMAGERY

    Get PDF
    Analysis of human behavior through visual information is a highly active research topic in the computer vision community. In the literature this analysis has been achieved via images from conventional cameras; recently, however, depth sensors have been used to obtain a new type of image known as the depth image. Human motion analysis can be widely applied to various domains, such as security surveillance in public spaces, shopping centers, and airports. Home care for elderly people and children can use live video streaming from an integrated home monitoring system to prompt timely assistance. Moreover, automatic human motion analysis can be used in Human–Computer/Robot Interaction (HCI/HRI), video retrieval, virtual reality, computer gaming, and many other fields. Human motion analysis using a depth sensor is still a new research area. Most work is focused on motion capture of articulated body skeletons, but the research community is showing increasing interest in higher-level, action-related research. This report explains the advantages of depth imagery and then describes the new categories of depth sensors, such as the Microsoft Kinect, that are available to obtain depth images; high-resolution real-time depth images are now cheaply available because of such tools. The main published research on the use of depth imagery for analyzing human activity is reviewed. A growing research area is the recognition of human actions, and hence the existing work focuses mainly on body part detection and pose estimation. The publicly available datasets that include depth imagery are listed in this report, and the software libraries that are available for the depth sensors are described. With the development of depth sensors, an increasing number of algorithms have employed depth data in vision-based human action recognition, and the increasing availability of depth sensors is broadening the scope for future research. This report provides an overview of this emerging field, followed by various vision-based algorithms used for human motion analysis

    Optimized limited memory and warping LCSS for online gesture recognition or overlearning?

    Get PDF
    In this paper, we present and evaluate a new algorithm for online gesture recognition in noisy streams. The technique relies upon the proposed LM-WLCSS (Limited Memory and Warping LCSS) algorithm, which has demonstrated its efficiency on gesture recognition. The new method involves a quantization step (via the K-Means clustering algorithm) that transforms new data to a finite set. In this way, each new sample can be compared to several templates (one per class), and gestures are rejected based on a previously trained rejection threshold. Then, an algorithm called SearchMax finds a local maximum within a sliding window and outputs whether or not the gesture has been recognized. In order to resolve conflicts that may occur, another classifier can be chained. As the K-Means clustering algorithm needs to be initialized with the number of clusters to create, we also introduce a straightforward optimization process; this operation also optimizes the window size for the SearchMax algorithm. In order to demonstrate the robustness of our algorithm, an experiment has been performed over two different data sets. However, results on the tested data sets are only accurate when training data are used as test data, which may be due to the method being in an overlearning state
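    The decision stage described above (a trained rejection threshold plus the SearchMax scan for a local maximum in a sliding window) can be sketched as follows. This is a minimal illustration assuming the per-class matching scores are already computed; the WLCSS scoring itself is abstracted away, and search_max is an illustrative name, not the paper's code.

        # Minimal sketch of a SearchMax-style detector: report a gesture only
        # where the matching score is a windowed local maximum above threshold.
        from collections import deque

        def search_max(scores, threshold, window=10):
            """Yield (index, score) peaks of a streaming score sequence."""
            buf = deque(maxlen=window)
            for i, s in enumerate(scores):
                buf.append((i, s))
                if len(buf) < window:
                    continue
                idx, best = max(buf, key=lambda p: p[1])
                # the oldest sample being the window maximum means no later
                # sample within the window beat it, so it is a local peak
                if best >= threshold and idx == buf[0][0]:
                    yield idx, best

    For instance, list(search_max([0, 3, 9, 4, 2, 1], threshold=5, window=4)) reports the single peak (2, 9) once three further samples have failed to exceed it.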

    Data’s Hidden Data: Qualitative Revelations of Sports Efficiency Analysis brought by Neural Network Performance Metrics

    Get PDF
    In the study of the effectiveness and efficiency of an athlete's performance, intelligent systems can be applied to qualitative approaches, and their performance metrics provide useful information not just on the quality of the data, but also reveal issues about the observational criteria and the data collection context itself. A total of 2000 executions of two similar exercises, with different levels of complexity, were collected through a single inertial sensor worn on the fencer's weapon hand. After the signals were split into their key segments through Dynamic Time Warping, the extracted features and respective qualitative evaluations were fed into a Neural Network to learn the patterns that distinguish a good from a bad execution. The performance analysis of the resulting models returned a prediction accuracy of 76.6% and 72.7% for each exercise, but other metrics pointed to the data suffering from high bias. This points towards an imbalance in the qualitative criteria representation of the bad executions, which can be explained by: i) a reduced number of samples; ii) ambiguity in the definition of the observation criteria; iii) a single sensor being unable to fully capture the context without taking the actions of the other key body segments into account
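    Since the segmentation above hinges on Dynamic Time Warping, a plain DTW distance is sketched below for reference. The paper does not publish its exact variant; this is the textbook recurrence for 1-D signals, and dtw_distance is an illustrative name.

        # Minimal sketch: textbook DTW distance between two 1-D signals.
        import numpy as np

        def dtw_distance(a, b):
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    # best of match, insertion, deletion
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]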

    Gesture passwords: concepts, methods and challenges

    Full text link
    Biometrics are a convenient alternative to traditional forms of access control such as passwords and pass-cards since they rely solely on user-specific traits. Unlike alphanumeric passwords, biometrics cannot be given or told to another person, and unlike pass-cards, are always “on-hand.” Perhaps the most well-known biometrics with these properties are: face, speech, iris, and gait. This dissertation proposes a new biometric modality: gestures. A gesture is a short body motion that contains static anatomical information and changing behavioral (dynamic) information. This work considers both full-body gestures such as a large wave of the arms, and hand gestures such as a subtle curl of the fingers and palm. For access control, a specific gesture can be selected as a “password” and used for identification and authentication of a user. If this particular motion were somehow compromised, a user could readily select a new motion as a “password,” effectively changing and renewing the behavioral aspect of the biometric. This thesis describes a novel framework for acquiring, representing, and evaluating gesture passwords for the purpose of general access control. The framework uses depth sensors, such as the Kinect, to record gesture information from which depth maps or pose features are estimated. First, various distance measures, such as the log-Euclidean distance between feature covariance matrices and distances based on feature sequence alignment via dynamic time warping, are used to compare two gestures and to train a classifier to either authenticate or identify a user. In authentication, this framework yields an equal error rate on the order of 1-2% for body and hand gestures in non-adversarial scenarios. Next, through a novel decomposition of gestures into posture, build, and dynamic components, the relative importance of each component is studied. The dynamic portion of a gesture is shown to have the largest impact on biometric performance with its removal causing a significant increase in error. In addition, the effects of two types of threats are investigated: one due to self-induced degradations (personal effects and the passage of time) and the other due to spoof attacks. For body gestures, both spoof attacks (with only the dynamic component) and self-induced degradations increase the equal error rate as expected. Further, the benefits of adding additional sensor viewpoints to this modality are empirically evaluated. Finally, a novel framework that leverages deep convolutional neural networks for learning a user-specific “style” representation from a set of known gestures is proposed and compared to a similar representation for gesture recognition. This deep convolutional neural network yields significantly improved performance over prior methods. A byproduct of this work is the creation and release of multiple publicly available, user-centric (as opposed to gesture-centric) datasets based on both body and hand gestures
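    As an illustration of the first distance mentioned above, the log-Euclidean distance between feature covariance matrices can be computed via an eigendecomposition, which is valid because covariance matrices are symmetric positive (semi-)definite. This is a minimal sketch under that assumption; the small ridge term and the function names are ours, not the dissertation's code.

        # Minimal sketch: log-Euclidean distance between covariance matrices.
        import numpy as np

        def logm_spd(C, eps=1e-8):
            """Matrix logarithm of a symmetric positive-definite matrix."""
            w, V = np.linalg.eigh(C + eps * np.eye(C.shape[0]))  # ridge for stability
            return (V * np.log(w)) @ V.T

        def log_euclidean_distance(C1, C2):
            """Frobenius norm between the matrix logs of two covariances."""
            return np.linalg.norm(logm_spd(C1) - logm_spd(C2), ord="fro")

        # e.g. covariances of two gesture feature sequences (frames x features)
        X, Y = np.random.randn(100, 6), np.random.randn(120, 6)
        d = log_euclidean_distance(np.cov(X, rowvar=False), np.cov(Y, rowvar=False))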

    Touché: Data-Driven Interactive Sword Fighting in Virtual Reality

    Get PDF
    VR games offer new freedom for players to interact naturally using motion. This makes it harder to design games that react to player motions convincingly. We present a framework for VR sword fighting experiences against a virtual character that simplifies the necessary technical work to achieve a convincing simulation. The framework facilitates VR design by abstracting from difficult details on the lower “physical” level of interaction, using data-driven models to automate both the identification of user actions and the synthesis of character animations. Designers are able to specify the character's behaviour on a higher “semantic” level using parameterised building blocks, which allow for control over the experience while minimising manual development work. We conducted a technical evaluation, a questionnaire study and an interactive user study. Our results suggest that the framework produces more realistic and engaging interactions than simple hand-crafted interaction logic, while supporting a controllable and understandable behaviour design
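    To make the "semantic level" concrete, a parameterised building block might look like the sketch below: the designer declares a trigger, a response, and style parameters, while action identification and animation synthesis are assumed to be supplied by the framework's data-driven models. All class, field, and function names here are illustrative, not the framework's actual API.

        # Minimal sketch: a designer-facing behaviour block dispatched to an
        # assumed animation layer once the action recogniser fires a trigger.
        from dataclasses import dataclass
        from typing import Callable

        @dataclass
        class BehaviourBlock:
            trigger: str          # recognised user action, e.g. "overhead_strike"
            response: str         # synthesis goal for the character, e.g. "parry_high"
            aggression: float     # style parameter for the motion synthesiser
            delay_s: float = 0.2  # reaction time, tunable for difficulty

        def dispatch(blocks, detected: str, play: Callable[[str, float, float], None]):
            """Hand the first matching block to the animation layer."""
            for b in blocks:
                if b.trigger == detected:
                    play(b.response, b.aggression, b.delay_s)
                    return

        blocks = [BehaviourBlock("overhead_strike", "parry_high", aggression=0.7)]
        dispatch(blocks, "overhead_strike", lambda clip, a, d: print(clip, a, d))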

    Depth-Based User Interface

    Get PDF
    Conventional user interfaces are not always the most appropriate way to control applications. The objective of this work is to study the processing of data from the Kinect sensor and to analyze the possibilities of controlling applications through depth sensors, and subsequently, using the knowledge obtained, to design a user interface for working with multimedia content that uses the Kinect sensor for interaction with the user

    Towards streaming gesture recognition

    Get PDF
    The emergence of low-cost sensors allows more devices to be equipped with various types of sensors; mobile devices such as smartphones or smartwatches now may contain accelerometers, gyroscopes, etc. This offers new possibilities for interacting with the environment, and there are clear benefits to exploiting these sensors. As a consequence, the literature on gesture recognition systems that employ such sensors has grown considerably. The literature regarding online gesture recognition counts many methods based on Dynamic Time Warping (DTW). However, this method has been shown to be inefficient for time series from inertial measurement units, as a lot of noise is present, and new methods based on the LCSS (Longest Common SubSequence) were therefore introduced. Nevertheless, none of them focuses on a per-class optimization process. In this master's thesis, we present and evaluate a new algorithm for online gesture recognition in noisy streams. The technique relies upon the LM-WLCSS (Limited Memory and Warping LCSS) algorithm, which has demonstrated its efficiency on gesture recognition. The new method involves a quantization step (via the K-Means clustering algorithm) that transforms new data to a finite set. In this way, each new sample can be compared to several templates (one per class), and gestures are rejected based on a previously trained rejection threshold. Thereafter, an algorithm called SearchMax finds a local maximum within a sliding window and outputs whether or not the gesture has been recognized. In order to resolve conflicts that may occur, another classifier (i.e. C4.5) can be chained. As the K-Means clustering algorithm needs to be initialized with the number of clusters to create, we also introduce a straightforward optimization process; this operation also optimizes the window size for the SearchMax algorithm. In order to demonstrate the robustness of our algorithm, an experiment has been performed over two different data sets. However, results on the tested data sets are only accurate when training data are used as test data, which may be due to the method being in an overlearning state
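    The quantization front end described above can be sketched as follows: K-Means maps each incoming sample onto a finite symbol alphabet, and the symbol stream is then scored against one stored template per class. This is a minimal sketch assuming scikit-learn; the WLCSS scoring is replaced by a trivial symbol-match count purely for illustration, and the cluster count is an assumed hyperparameter (the thesis optimizes it).

        # Minimal sketch: quantize sensor samples with K-Means, then score the
        # symbol stream against one template per gesture class.
        import numpy as np
        from sklearn.cluster import KMeans

        def train_quantizer(train_samples, n_clusters=16):
            """Fit the codebook that turns raw sensor samples into symbols."""
            return KMeans(n_clusters=n_clusters, n_init=10).fit(train_samples)

        def quantize(quantizer, samples):
            return quantizer.predict(np.asarray(samples))

        def score(symbols, template):
            """Toy stand-in for WLCSS: count aligned symbol matches."""
            n = min(len(symbols), len(template))
            return int(np.sum(np.asarray(symbols[:n]) == np.asarray(template[:n])))

        def classify(symbols, templates, thresholds):
            """Pick the best-scoring class whose trained threshold is exceeded."""
            best = max(templates, key=lambda c: score(symbols, templates[c]))
            return best if score(symbols, templates[best]) >= thresholds[best] else None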

    Exploring Gesture Recognition in the Virtual Reality Space

    Get PDF
    This thesis presents two novel modifications to a gesture recognition system for virtual reality devices and applications. In doing so, it evaluates users' movements in VR when presented with gestures and uses this information to develop a continuous tracking system that can detect the start and end of gestures. It also expands on previous work with gestures in games with an implementation of an adaptive database system that has been seen to improve accuracy rates. The database allows users to immediately start using the system with no prior training and improves accuracy rates as they spend more time in the game. Furthermore, it evaluates both the explicit and continuous recognition systems through user-based studies. The results from these studies show promise for the usability of gesture-based interaction systems for VR devices in the future. They also provide findings that suggest that, for the use case of games, a continuous system could be too cumbersome for users
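    The adaptive database idea above (start from seed templates, then fold in confidently recognised user samples so accuracy improves with play time) can be sketched as follows. The nearest-template distance, acceptance threshold, and capacity policy are illustrative assumptions, not the thesis's exact design.

        # Minimal sketch: a per-class template store that adapts to the user.
        import numpy as np

        class AdaptiveGestureDB:
            def __init__(self, capacity_per_class=20):
                self.templates = {}  # class label -> list of feature vectors
                self.capacity = capacity_per_class

            def classify(self, sample):
                """Nearest-template classification; returns (label, distance)."""
                if not self.templates:
                    return None, float("inf")
                return min(
                    ((lbl, float(np.linalg.norm(sample - t)))
                     for lbl, ts in self.templates.items() for t in ts),
                    key=lambda p: p[1],
                )

            def adapt(self, sample, label, distance, accept_below=1.0):
                """Keep confident recognitions so the store tracks the user."""
                if distance < accept_below:
                    ts = self.templates.setdefault(label, [])
                    ts.append(sample)
                    if len(ts) > self.capacity:
                        ts.pop(0)  # forget the oldest template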