17 research outputs found

    A human motion feature based on semi-supervised learning of GMM

    Using motion capture to create natural-looking motion sequences for virtual character animation has become a standard procedure in the games and visual effects industry. With the fast growth of motion data, the task of automatically annotating new motions is gaining importance. In this paper, we present a novel statistical feature that represents each motion according to pre-labeled categories of key poses. A probabilistic model is trained by semi-supervised learning of a Gaussian mixture model (GMM). Each pose in a given motion can then be described by a feature vector of probabilities under the GMM components. A motion feature descriptor is proposed based on the statistics of all pose features. Experimental results and comparison with existing work show that our method performs more accurately and efficiently in motion retrieval and annotation.
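The pose-feature idea above can be sketched with an off-the-shelf Gaussian mixture model: each pose is described by its posterior probabilities over the mixture components, and a motion descriptor aggregates those per-pose features. The synthetic data, component count, and mean aggregation below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 300 poses, each a 10-D joint-angle vector,
# drawn from three synthetic "key-pose" clusters.
poses = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(100, 10))
    for c in (-2.0, 0.0, 2.0)
])

# Fit a GMM whose components play the role of key-pose categories.
gmm = GaussianMixture(n_components=3, random_state=0).fit(poses)

def pose_feature(pose):
    """Describe one pose by its posterior probability under each component."""
    return gmm.predict_proba(pose.reshape(1, -1))[0]

def motion_descriptor(motion):
    """Aggregate per-pose features over a whole motion (here: their mean)."""
    return gmm.predict_proba(motion).mean(axis=0)

desc = motion_descriptor(poses[:100])  # a motion made of cluster-0 poses
```

Because each per-pose feature is a probability distribution over components, the aggregated descriptor stays comparable across motions of different lengths.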

    Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion


    Movement Sonification: Effects on Motor Learning beyond Rhythmic Adjustments

    Motor learning is based on motor perception and emergent perceptual-motor representations. Much behavioral research has addressed single perceptual modalities, but over the last two decades the contribution of multimodal perception to motor behavior has been increasingly recognized. A growing number of studies indicate an enhanced impact of multimodal stimuli on motor perception, motor control and motor learning, in terms of better precision and higher reliability of the related actions. Behavioral research is supported by neurophysiological data revealing that multisensory integration supports motor control and learning. However, the overwhelming part of both research lines is dedicated to basic research. Apart from research in the domains of music, dance and motor rehabilitation, there is almost no evidence of enhanced effectiveness of multisensory information for learning gross motor skills. To reduce this gap, movement sonification is used here in applied research on motor learning in sports. Based on current knowledge of the multimodal organization of the perceptual system, we generate additional real-time movement information suitable for integration with the visual and proprioceptive perceptual feedback streams. With ongoing training, synchronously processed auditory information should be integrated into the emerging internal models, enhancing the efficacy of motor learning. This is achieved by a direct mapping of kinematic and dynamic motion parameters to electronic sounds, resulting in continuous auditory and convergent audiovisual or audio-proprioceptive stimulus arrays. In sharp contrast to approaches that use acoustic information as error feedback in motor learning settings, we try to generate additional movement information suitable for accelerating and enhancing adequate sensorimotor representations, processable below the level of consciousness.
In the experimental setting, participants were asked to learn a closed motor skill (technique acquisition in indoor rowing). One group was trained with visual information and two groups with audiovisual information (sonification vs. natural sounds). Learning became evident and remained stable for all three groups. Participants trained with additional movement sonification showed better performance than both other groups. The results indicate that movement sonification enhances motor learning of a complex gross motor skill, even exceeding the usually expected rhythmic acoustic effects on motor learning.
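A minimal sketch of the kind of direct kinematic-parameter-to-sound mapping described above, assuming a normalized 1-D velocity profile mapped linearly onto pitch; the sample rate, frequency range, and frame duration are arbitrary illustrative choices, not the study's actual sonification design.

```python
import numpy as np

SR = 8000  # audio sample rate in Hz (illustrative)

def sonify_velocity(vel, f_min=200.0, f_max=800.0, frame_dur=0.05):
    """Map a per-frame velocity profile (values in 0..1) onto pitch:
    faster movement -> higher tone, one tone segment per motion frame."""
    vel = np.clip(np.asarray(vel, dtype=float), 0.0, 1.0)
    freqs = f_min + vel * (f_max - f_min)   # one frequency per motion frame
    n = int(SR * frame_dur)                 # audio samples per motion frame
    inst_freq = np.repeat(freqs, n)
    # Accumulate phase so frequency changes do not produce clicks.
    phase = 2.0 * np.pi * np.cumsum(inst_freq) / SR
    return np.sin(phase)

# Example: a rowing stroke that speeds up and then slows down.
audio = sonify_velocity([0.1, 0.5, 1.0, 0.5, 0.1])
```

Phase accumulation (rather than generating each segment independently) is what keeps the continuous auditory stream free of discontinuities at frame boundaries.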

    Adaptive Gesture Recognition with Variation Estimation for Interactive Systems

    This paper presents a gesture recognition/adaptation system for Human Computer Interaction applications that goes beyond activity classification and that, complementary to gesture labeling, characterizes the movement execution. We describe a template-based recognition method that simultaneously aligns the input gesture to the templates using a Sequential Monte Carlo inference technique. Contrary to standard template-based methods based on dynamic programming, such as Dynamic Time Warping, the algorithm has an adaptation process that tracks gesture variation in real time. The method continuously updates the estimated parameters and recognition results during execution of the gesture, which offers key advantages for continuous human-machine interaction. The technique is evaluated in several different ways: recognition and early recognition are evaluated on 2D onscreen pen gestures; adaptation is assessed on synthetic data; and both early recognition and adaptation are evaluated in a user study involving 3D free-space gestures. The method is not only robust to noise and successfully adapts to parameter variation, but also performs recognition as well as or better than non-adapting offline template-based methods.
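The idea of tracking gesture variation with Sequential Monte Carlo inference can be illustrated with a toy 1-D particle filter that estimates, frame by frame, where the incoming gesture lies in the template and how fast it is being executed. The monotone template, the noise levels, and the Gaussian observation model are all illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monotone 1-D template gesture (e.g. a pen-stroke coordinate).
template = np.linspace(0.0, 1.0, 100) ** 2

def smc_follow(observed, n_particles=500):
    """Toy sequential-Monte-Carlo gesture follower: each particle carries a
    (position, speed) hypothesis inside the template; weights come from a
    Gaussian observation likelihood, with resampling at every frame."""
    pos = np.zeros(n_particles)
    speed = rng.uniform(0.5, 3.0, n_particles)  # unknown execution speed
    estimates = []
    for obs in observed:
        speed = speed + rng.normal(0.0, 0.05, n_particles)  # speed drift
        pos = np.clip(pos + speed, 0, len(template) - 1)
        pred = template[pos.astype(int)]
        w = np.exp(-0.5 * ((obs - pred) / 0.1) ** 2)        # likelihood
        w /= w.sum()
        estimates.append(float(np.sum(w * pos)))  # weighted mean position
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resample
        pos, speed = pos[idx], speed[idx]
    return estimates

# A gesture executed twice as fast as the template: the filter should
# advance through the template roughly two indices per observed frame.
path = smc_follow(template[::2])
```

Because the speed is part of the particle state, the estimate adapts online to execution-speed variation instead of requiring an offline DTW alignment.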

    Context-based motion retrieval using vector space model

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 87-89). Motion retrieval is the problem of retrieving highly relevant motions in a timely manner. The principal challenge is to characterize the similarity between two motions effectively, which is tightly related to the gap between the motion data's representation and its semantics. Our approach uses the vector space model to measure similarities among motions, which are made discrete using a vocabulary technique and transformation-invariant using a relational feature model. In our approach, relational features are first extracted from the motion data. Such features are then clustered into a motion vocabulary. Finally, motions are turned into bags of words and retrieved using the vector space model. We implemented this new system and tested it on two benchmark databases composed of real-world data. Two existing methods, the dynamic time warping method and the binary feature method, were implemented for comparison. The results show that our system is comparable in effectiveness with the dynamic time warping system, but runs 100 to 400 times faster. In comparison to retrieval with binary features, it is just as fast but more accurate and practical. The success of our system points to several additional improvements. Our experiments reveal that velocity features improve the relevance of retrieved results, but more effort should be dedicated to determining the best set of features for motion retrieval. The same experiments should be performed on larger databases, in particular to test how this performance generalizes to test motions outside the original database.
Alternative vocabulary organizations, such as the vocabulary tree and the random forest, should be investigated because they can improve our approach by providing more flexibility in the similarity scoring model and reducing the approximation error of the vocabulary. Because the bag-of-words model ignores the temporal ordering of key features, a wavelet model should also be explored as a mechanism to encode features across different time scales. by Zhunping 'Justin' Zhang. S.M.
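The vocabulary-plus-vector-space pipeline above (cluster frame features into words, turn motions into word histograms, compare histograms) can be sketched as follows. The synthetic stand-ins for relational features, the vocabulary size, and plain cosine similarity are illustrative assumptions, not the thesis's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical per-frame relational features for three motions (20-D frames);
# motions 0 and 1 are drawn from the same distribution, motion 2 is not.
motions = [rng.normal(m, 0.5, size=(80, 20)) for m in (0.0, 0.0, 3.0)]

# 1. Cluster all frames into a small "motion vocabulary".
all_frames = np.vstack(motions)
vocab = KMeans(n_clusters=8, n_init=10, random_state=0).fit(all_frames)

def bag_of_words(motion):
    """Turn a motion into a normalized word histogram over the vocabulary."""
    words = vocab.predict(motion)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

bows = [bag_of_words(m) for m in motions]
sim_same = cosine(bows[0], bows[1])  # same underlying motion class
sim_diff = cosine(bows[0], bows[2])  # different motion class
```

The histogram comparison is what makes retrieval fast: similarity is a single vector operation per database entry, with no per-pair time alignment as in dynamic time warping. It also illustrates the stated limitation: the histogram discards temporal ordering entirely.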

    Markerless Reconstruction of Human Motion

    Doctoral dissertation -- Graduate School of Seoul National University, Dept. of Electrical and Computer Engineering, Feb. 2017. Advisor: Jehee Lee. Markerless human pose recognition using a single depth camera plays an important role in interactive graphics applications and user interface design. Recent pose recognition algorithms have adopted machine learning techniques, utilizing large collections of motion capture data. The effectiveness of these algorithms is greatly influenced by the diversity and variability of the training data. Many applications have been developed that use the human body as a controller on top of such pose recognition systems, and in many cases handling physical props aids immersive control of the system. Nevertheless, combined human pose and prop recognition systems are not yet sufficiently powerful. Moreover, invisible body parts lower the quality of human pose estimation from a single depth camera because of the absence of observed data. In this thesis, we present techniques for manipulating human motion data to enable human pose estimation from a single depth camera. First, we developed a method that resamples a collection of human motion data to improve pose variability and achieve an arbitrary size and level of density in the space of human poses. The space of human poses is high-dimensional, so brute-force uniform sampling is intractable. We exploit dimensionality reduction and locally stratified sampling to generate either uniform or application-specifically biased distributions in the space of human poses. Our algorithm learns to recognize challenging poses such as sitting, kneeling, stretching and yoga using a remarkably small amount of training data. The recognition algorithm can also be steered to maximize its performance for a specific domain of human poses. We demonstrate that our algorithm performs much better than the Kinect SDK for recognizing challenging acrobatic poses, while performing comparably for easy upright standing poses.
Second, we identify environmental objects that interact with human beings. We propose a new prop recognition system that can be applied on top of an existing human pose estimation algorithm and enables robust prop estimation together with human poses at the same time. Our work is widely applicable to various types of controller systems that deal with the human pose and additional items simultaneously. Finally, we enhance the pose estimation result. Not all parts of the human body can always be estimated from a single depth image: some body parts are occluded by other body parts, and sometimes the estimation system simply fails. To solve this problem, we construct a neural network model called an autoencoder, trained on a large set of natural pose data, which reconstructs the missing human pose joints as new, corrected joints. It can be applied to many different human pose estimation systems to improve their performance. Contents: 1 Introduction; 2 Background (2.1 Research on Motion Data; 2.2 Human Pose Estimation; 2.3 Machine Learning on Human Pose Estimation; 2.4 Dimension Reduction and Uniform Sampling; 2.5 Neural Networks on Motion Data); 3 Markerless Human Pose Recognition System (3.1 System Overview; 3.2 Data Preprocessing; 3.3 Randomized Decision Tree; 3.4 Joint Estimation Process); 4 Controllable Sampling of Data in the Space of Human Poses (4.1 Overview; 4.2 Locally Stratified Sampling; 4.3 Experimental Results; 4.4 Discussion); 5 Human Pose Estimation with an Interacting Prop from a Single Depth Image (5.1 Introduction; 5.2 Prop Estimation; 5.3 Experimental Results; 5.4 Discussion); 6 Enhancing the Estimation of Human Pose from Incomplete Joints (6.1 Overview; 6.2 Method; 6.3 Experimental Results; 6.4 Discussion); 7 Conclusion; Bibliography; Abstract (in Korean).
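The missing-joint reconstruction idea can be sketched with a small denoising-autoencoder-style network: it is trained to map poses whose occluded joints are zeroed out back to the complete poses. The synthetic single-latent-phase pose data, the masked joint indices, and the network size are illustrative assumptions, not the thesis's model or data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical pose data: 10 joint angles driven by one latent phase, so the
# poses lie on a low-dimensional manifold the network can learn.
def make_poses(n):
    t = rng.uniform(0, 2 * np.pi, n)
    return np.stack([np.sin(t + 0.3 * j) for j in range(10)], axis=1)

train, test = make_poses(500), make_poses(100)
MISSING = [7, 8, 9]  # joints assumed hidden by self-occlusion

def corrupt(poses):
    x = poses.copy()
    x[:, MISSING] = 0.0  # unobserved joints zeroed out
    return x

# Denoising-autoencoder-style model: corrupted pose in, complete pose out.
ae = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
ae.fit(corrupt(train), train)

pred = ae.predict(corrupt(test))
err = np.mean(np.abs(pred[:, MISSING] - test[:, MISSING]))
baseline = np.mean(np.abs(test[:, MISSING]))  # leaving joints at zero
```

Because natural poses are strongly correlated across joints, the observed joints carry enough information for the network to fill in the occluded ones far better than the zero baseline.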

    Indexing and Retrieval of 3D Articulated Geometry Models

    In this PhD research study, we focus on building a content-based search engine for 3D articulated geometry models. 3D models are essential components in today's graphics applications, and are widely used in the game, animation and movie production industries. With the increasing number of these models, a search engine not only provides an entrance for exploring such a huge dataset, it also facilitates sharing and reuse among different users; in general, it reduces the cost and time of producing these 3D models. Though many retrieval systems have been proposed in recent years, search engines for 3D articulated geometry models are still in their infancy. Among all the works we have surveyed, reliability and efficiency are the two main issues that hinder the popularity of such systems, and this research focuses mainly on addressing them. We have found that most existing works design features and matching algorithms to reflect the intrinsic properties of these 3D models. For instance, to handle 3D articulated geometry models, it is common to extract skeletons and use graph matching algorithms to compute the similarity. However, since this kind of feature representation is complex, it leads to high complexity in the matching algorithms; for example, sub-graph isomorphism can be NP-hard for model graph matching. Our solution is based on the understanding that skeletal matching seeks correspondences between the two models being compared. If we can define descriptive features, the correspondence problem can be solved by bag-based matching, where fast algorithms are available. In the first part of the research, we propose a feature extraction algorithm to extract such descriptive features. We then convert skeletal matching problems into bag-based matching. We further define a metric similarity measure to support fast search. We demonstrate the advantages of this idea in our experiments.
The improvement in precision is 12% better at high recall. The indexed search of 3D models is 24 times faster than the state of the art if only the first relevant result is returned. However, improving the quality of descriptive features comes at the price of high dimensionality. The curse of dimensionality is a notorious problem for large multimedia databases: computation time scales exponentially as the dimension increases, and indexing techniques may not be useful in such situations. In the second part of the research, we focus on developing an embedding retrieval framework to solve the high-dimensionality problem. We first argue that our proposed matching method projects 3D models onto manifolds. We then use a manifold learning technique to reduce dimensionality and maximize inter-class distances. We further propose a numerical method to sub-sample and quickly search databases. To preserve retrieval accuracy using fewer landmark objects, we propose an alignment method which also benefits existing works on fast search. Our experiments demonstrate that the retrieval framework alleviates the curse of dimensionality, and improves the efficiency (3.4 times faster) and accuracy (30% more accurate) of the matching algorithm proposed above. In the third part of the research, we also study a closely related area: 3D motions. 3D motions are captured by attaching sensors to human performers. These captured data are real human motions used to animate 3D articulated geometry models. Creating realistic 3D motions is an expensive and tedious task. Although 3D motions are very different from 3D articulated geometry models, we observe that existing works also suffer from the problem of temporal structure matching, which likewise leads to low matching efficiency. We apply the same idea of bag-based matching to 3D motions.
From our experiments, the proposed method has a 13% improvement in precision at high recall and is 12 times faster than existing works. In summary, we have developed algorithms for 3D articulated geometry models and 3D motions, covering feature extraction, feature matching, indexing and fast search methods. Through various experiments, our idea of converting restricted matching to bag-based matching improves matching efficiency and reliability, as shown for both 3D articulated geometry models and 3D motions. We have also connected 3D matching to the area of manifold learning. The embedding retrieval framework not only improves efficiency and accuracy, but has also opened a new area of research.
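The manifold-learning step of the embedding retrieval framework can be illustrated with Isomap on a standard synthetic manifold: high-dimensional descriptors that really live on a low-dimensional surface are embedded into a few dimensions where indexing and search are cheap. The swiss-roll stand-in for model descriptors and the padding dimensions are illustrative assumptions, not the thesis's features.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)

# 3-D points lying on a 2-D manifold, padded with small-noise dimensions to
# mimic high-dimensional descriptors suffering from the curse of dimensionality.
X3, _ = make_swiss_roll(n_samples=800, noise=0.05, random_state=0)
X = np.hstack([X3, rng.normal(0, 0.01, (800, 47))])  # 50-D input

# Nonlinear embedding into 2-D, where fast indexing structures work well.
embedded = Isomap(n_neighbors=12, n_components=2).fit_transform(X)
```

Unlike a linear projection, Isomap approximates geodesic distances along the manifold, so points far apart on the roll stay far apart in the embedding.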

    Human gait identification and analysis

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Human gait identification has become an active area of research due to increased security requirements, and is a potential new tool for identifying individuals beyond traditional methods. The emergence of motion capture techniques offers a chance of high identification accuracy because, unlike with security cameras, complete gait information can be recorded. The aim of this research was to build a practical method of gait identification and investigate the individual characteristics of gait. For this purpose, a gait identification approach was proposed, identification results were compared across different methods, and several studies of the individual characteristics of gait were performed. This research included the following: (1) a novel, effective set of gait features was proposed; (2) gait signatures were extracted by three different methods: a statistical method, principal component analysis, and a Fourier expansion method; (3) gait identification results from these different methods were compared; (4) two indicators were proposed to evaluate gait features for identification; (5) novel and clear definitions of gait phases and the gait cycle were proposed; (6) gait features were investigated by gait phase; (7) principal component analysis and the fixing-root method were used to elucidate which features represent gait, and why; (8) gait similarity was investigated; (9) gait attractiveness was investigated. This research proposed an efficient framework for identifying individuals from gait via a novel feature set based on 3D motion capture data. A novel method of evaluating gait signatures for identification was proposed. Three different gait signature extraction methods were applied and compared. The average identification rate was over 93%, with the best result close to 100%.
This research also proposed a novel method of dividing gait phases, and the different appearances of gait features in the eight gait phases were investigated. Based on the proposed gait phase division, the research identified the similarities and asymmetries between left-side and right-side body movement in gait. It also initiated an analysis method for gait feature extraction by the fixing-root method. A prediction model of gait attractiveness was built with reasonable accuracy by principal component analysis and linear regression on the natural logarithm of the parameters. A systematic relationship was observed between the motions of individual markers and the attractiveness ratings. The lower legs and feet were extracted as features of attractiveness by the fixing-root method. As an extension of the gait research, human seated motion was also investigated. This study was funded by the Dorothy Hodgkin Postgraduate Awards and Beijing East Gallery Co. Ltd.
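A minimal sketch of one of the signature-extraction routes described above: PCA-based gait signatures followed by nearest-neighbour identification, on synthetic per-cycle feature vectors. The subject count, feature dimension, noise levels, and alternating train/test split are illustrative assumptions, not the thesis's data or protocol.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical gait features: 5 subjects x 20 gait cycles x 30 features,
# each subject offset by an individual "signature" plus per-cycle noise.
subject_means = rng.normal(0.0, 2.0, (5, 30))
X = np.vstack([m + rng.normal(0.0, 0.4, (20, 30)) for m in subject_means])
y = np.repeat(np.arange(5), 20)

# Extract compact gait signatures with PCA, then identify by nearest neighbour.
sig = PCA(n_components=4, random_state=0).fit_transform(X)
train = np.arange(len(y)) % 2 == 0  # alternate cycles: train vs. test
clf = KNeighborsClassifier(n_neighbors=1).fit(sig[train], y[train])
accuracy = clf.score(sig[~train], y[~train])
```

With well-separated individual signatures, a few principal components suffice; in practice the interesting question (which the thesis addresses with its two evaluation indicators) is which features keep subjects separable.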

    Exploring sparsity, self-similarity, and low rank approximation in action recognition, motion retrieval, and action spotting

    This thesis consists of four major parts. In the first part (Chapters 1-2), we introduce the overview, motivation, and contributions of our work, and extensively survey the current literature on six related topics. In the second part (Chapters 3-7), we explore the concept of self-similarity in two challenging scenarios, namely action recognition and motion retrieval. We build three-dimensional volume representations for both scenarios, and devise effective techniques that produce compact representations encoding the internal dynamics of the data. In the third part (Chapter 8), we explore the challenging action spotting problem, and propose a feature-independent unsupervised framework that is effective in spotting actions in various real situations, even under heavily perturbed conditions. The final part (Chapter 9) is dedicated to conclusions and future work. For action recognition, we introduce a generic method that does not depend on one particular type of input feature vector. We make three main contributions: (i) we introduce the concept of the Joint Self-Similarity Volume (Joint SSV) for modeling dynamical systems, and show that by using a new optimized rank-1 tensor approximation of the Joint SSV one can obtain compact low-dimensional descriptors that very accurately preserve the dynamics of the original system, e.g. an action video sequence; (ii) the descriptor vectors derived from the optimized rank-1 approximation make it possible to recognize actions without explicitly aligning action sequences of varying speed of execution or different frame rates; (iii) the method is generic and can be applied using different low-level features such as silhouettes, histograms of oriented gradients (HOG), etc., and hence does not necessarily require explicit tracking of features in the space-time volume. Our experimental results on five public datasets demonstrate that our method produces very good results and outperforms many baseline methods.
For action recognition on incomplete videos, we determine whether incomplete videos, which are often discarded, carry useful information for action recognition, and if so, how one can represent such a mixed collection of video data (complete versus incomplete, and labeled versus unlabeled) in a unified manner. We propose a novel framework for handling incomplete videos in action classification, and make three main contributions: (i) we cast the action classification problem for a mixture of complete and incomplete data as a semi-supervised learning problem over labeled and unlabeled data; (ii) we introduce a two-step approach to convert the mixed input data into a uniform compact representation; (iii) exhaustively scrutinizing 280 configurations, we show experimentally on two benchmarks we created that, even when the videos are extremely sparse and incomplete, it is still possible to recover useful information from them and classify unknown actions with a graph-based semi-supervised learning framework. For motion retrieval, we present a framework that allows flexible and efficient retrieval of motion capture data from huge databases. The method first converts an action sequence into a self-similarity matrix (SSM). This conversion of motion sequences into compact, low-rank subspace representations greatly reduces the spatiotemporal dimensionality of the sequences. The SSMs are then used to construct order-3 tensors, and we propose a low-rank decomposition scheme that converts the motion sequence volumes into compact lower-dimensional representations without losing the nonlinear dynamics of the motion manifold. Thus, unlike existing linear dimensionality reduction methods that distort the motion manifold and lose very critical and discriminative components, the proposed method performs well even when inter-class differences are small or intra-class differences are large.
In addition, the method allows efficient retrieval and does not require time-alignment of the motion sequences. We evaluate the performance of our retrieval framework on the CMU mocap dataset under two experimental settings, both demonstrating very good retrieval rates. For action spotting, our framework does not depend on any specific feature (e.g. HOG/HOF, STIP, silhouettes, bag-of-words, etc.), and requires no human localization, segmentation, or framewise tracking. This is achieved by treating the problem holistically as one of extracting the internal dynamics of video cuboids, modeling them in their natural form as multilinear tensors. To extract their internal dynamics, we devised a novel Two-Phase Decomposition (TP-Decomp) of a tensor that generates very compact and discriminative representations that are robust even to heavily perturbed data. Technically, a Rank-based Tensor Core Pyramid (Rank-TCP) descriptor is generated by combining multiple tensor cores under multiple ranks, allowing video cuboids to be represented in a hierarchical tensor pyramid. The problem then reduces to a template matching problem, which is solved efficiently using two boosting strategies: (i) to reduce the search space, we filter the dense trajectory cloud extracted from the target video; (ii) to boost the matching speed, we perform matching in an iterative coarse-to-fine manner. Experiments on five benchmarks show that our method outperforms the current state of the art under various challenging conditions. We also created a challenging dataset called Heavily Perturbed Video Arrays (HPVA) to validate the robustness of our framework in heavily perturbed situations.
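The self-similarity matrix at the heart of the retrieval and recognition pipelines above can be computed in a few lines: entry (i, j) is the distance between frames i and j of one sequence, so the matrix's pattern, not the raw frames, characterizes the motion. The toy periodic motion is an illustrative stand-in for mocap or video frames.

```python
import numpy as np

def self_similarity_matrix(seq):
    """SSM[i, j] = Euclidean distance between frames i and j of a sequence.
    The resulting pattern is invariant to rigid transforms of the frames."""
    seq = np.asarray(seq, dtype=float)
    diff = seq[:, None, :] - seq[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# A periodic toy motion: the SSM of a looping movement shows the repeating
# diagonal bands that the tensor decompositions later compress.
t = np.linspace(0, 4 * np.pi, 120)
motion = np.stack([np.sin(t), np.cos(t)], axis=1)
ssm = self_similarity_matrix(motion)
```

By construction the SSM is symmetric with a zero diagonal; stacking per-frame SSMs (or SSMs from multiple features) is what yields the order-3 tensors the decomposition schemes operate on.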