
    Proof of Concept For the Use of Motion Capture Technology In Athletic Pedagogy

    Visualization has long been an important method for conveying complex information. Whereas written and spoken communication might transfer 200-250 words per minute, visual media can often convey information at many times this rate. This makes visualization a potentially important tool for education. Athletic instruction in particular can involve communication about complex human movement that is not easily conveyed through written or spoken descriptions. Video-based instruction can be problematic because video data can contain too much information, making it more difficult for a student to absorb what is cognitively necessary. The lesson is to present the learner with what is needed and no more. We present a novel use of motion capture animation as an educational tool for teaching athletic movements. The advantage of motion capture is its ability to accurately represent real human motion in a minimalist context that removes the extraneous information normally found in video: motion capture animation displays only the motion itself, not additional information about the context of the motion. Producing an “automated coach” would be too large and difficult a problem to solve within the scope of a Master's thesis, but we can perform initial steps, including producing a useful software tool that performs data analysis on two motion datasets. We believe such a tool would benefit a human coach as an analysis aid, and that the work provides some useful understanding of the next important steps toward perhaps someday producing an automated coach.

    Context-based motion retrieval using vector space model

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 87-89). Motion retrieval is the problem of retrieving highly relevant motions in a timely manner. The principal challenge is to characterize the similarity between two motions effectively, which is tightly related to the gap between the motion data's representation and its semantics. Our approach uses the vector space model to measure the similarities among motions, which are made discrete using the vocabulary technique and transformation-invariant using the relational feature model. In our approach, relational features are first extracted from the motion data. These features are then clustered into a motion vocabulary. Finally, motions are turned into bags of words and retrieved using the vector space model. We implemented this new system and tested it on two benchmark databases composed of real-world data. Two existing methods, the dynamic time warping method and the binary feature method, were implemented for comparison. The results show that our system is comparable in effectiveness to the dynamic time warping system but runs 100 to 400 times faster. Compared with retrieval using binary features, it is just as fast but more accurate and practical. The success of our system points to several additional improvements. Our experiments reveal that velocity features improve the relevance of retrieved results, but more effort should be dedicated to determining the best set of features for motion retrieval. The same experiments should be performed on large databases, in particular to test how this performance generalizes to test motions outside the original database.
Alternative vocabulary organizations, such as vocabulary trees and random forests, should be investigated because they could improve our approach by giving the similarity scoring model more flexibility and by reducing the approximation error of the vocabulary. Because the bag-of-words model ignores the temporal ordering of key features, a wavelet model should also be explored as a mechanism for encoding features across different time scales. By Zhunping 'Justin' Zhang. S.M.
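The pipeline described above (features, then a vocabulary, then bags of words, then vector space scoring) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: per-frame relational features are assumed to be already extracted, the vocabulary is assumed to be a given list of cluster centroids (e.g. from k-means), and TF-IDF weighting with cosine similarity is assumed as the vector space scoring scheme. The function names `quantize`, `idf_weights`, `tfidf_vector`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def quantize(frames, centroids):
    """Map each per-frame feature vector to the index of its nearest centroid (a 'word')."""
    return [
        min(range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(f, centroids[k])))
        for f in frames
    ]


def idf_weights(word_sets, vocab_size):
    """Smoothed inverse document frequency of every vocabulary word."""
    n = len(word_sets)
    return [math.log((1 + n) / (1 + sum(w in s for s in word_sets))) + 1.0
            for w in range(vocab_size)]


def tfidf_vector(words, idf):
    """Turn a bag of words into a TF-IDF weighted vector over the whole vocabulary."""
    if not words:
        return [0.0] * len(idf)
    counts = Counter(words)
    return [counts[w] / len(words) * idf[w] for w in range(len(idf))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query_frames, database, centroids, top_k=3):
    """Rank database motions (hypothetical dict: name -> list of frames) by cosine similarity."""
    docs = {name: quantize(frames, centroids) for name, frames in database.items()}
    idf = idf_weights([set(d) for d in docs.values()], len(centroids))
    q = tfidf_vector(quantize(query_frames, centroids), idf)
    scores = [(name, cosine(q, tfidf_vector(d, idf))) for name, d in docs.items()]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Toy 2-D "features" and a three-word vocabulary, purely for illustration.
    centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    database = {
        "walk": [(0.1, 0.0), (0.9, 1.1), (0.0, 0.2)],
        "jump": [(5.1, 4.9), (4.8, 5.2), (1.0, 0.9)],
    }
    print(retrieve([(0.0, 0.1), (1.1, 1.0)], database, centroids, top_k=2))
```

Because each motion collapses to a fixed-length vector, comparing a query against the database is a dot product per motion rather than a dynamic programming alignment, which is the kind of speed difference the abstract reports relative to dynamic time warping. The same design also discards temporal order, the limitation the abstract proposes to address with a wavelet model.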

    Indexing and Retrieval of 3D Articulated Geometry Models

    In this PhD research study, we focus on building a content-based search engine for 3D articulated geometry models. 3D models are essential components of today's graphics applications and are widely used in the game, animation, and movie production industries. With the increasing number of these models, a search engine not only provides an entry point for exploring such a huge dataset but also facilitates sharing and reuse among different users. In general, it reduces the cost and time of developing these 3D models. Although many retrieval systems have been proposed in recent years, search engines for 3D articulated geometry models are still in their infancy. Among all the works we surveyed, reliability and efficiency are the two main issues that hinder the adoption of such systems, and this research focuses mainly on addressing these two issues. We found that most existing works design features and matching algorithms to reflect the intrinsic properties of these 3D models. For instance, to handle 3D articulated geometry models, it is common to extract skeletons and use graph matching algorithms to compute similarity. However, because this kind of feature representation is complex, it leads to matching algorithms of high complexity; sub-graph isomorphism, for example, can be NP-hard for model graph matching. Our solution is based on the observation that skeletal matching seeks correspondences between the two models being compared. If we can define descriptive features, the correspondence problem can be solved by bag-based matching, for which fast algorithms are available. In the first part of the research, we propose a feature extraction algorithm to extract such descriptive features. We then convert the skeletal matching problem into bag-based matching, and further define a metric similarity measure to support fast search. Our experiments demonstrate the advantages of this idea.
Precision at high recall is 12% better, and indexed search of 3D models is 24 times faster than the state of the art when only the first relevant result is returned. However, the improved descriptive features come at the price of high dimensionality. The curse of dimensionality is a notorious problem for large multimedia databases: computation time scales exponentially as the dimension increases, and indexing techniques may not be useful in such situations. In the second part of the research, we focus on developing an embedding retrieval framework to solve the high-dimensionality problem. We first argue that our proposed matching method projects 3D models onto manifolds. We then use a manifold learning technique to reduce dimensionality and maximize intra-class distances. We further propose a numerical method to sub-sample databases and search them quickly. To preserve retrieval accuracy with fewer landmark objects, we propose an alignment method, which also benefits existing fast search methods. Our experiments demonstrate that the retrieval framework alleviates the curse of dimensionality. It also improves the efficiency (3.4 times faster) and accuracy (30% more accurate) of the matching algorithm proposed above. In the third part of the research, we study a closely related area, 3D motions. 3D motions are captured by attaching sensors to human subjects; these captured data are real human motions used to animate 3D articulated geometry models. Creating realistic 3D motions is an expensive and tedious task. Although 3D motions are very different from 3D articulated geometry models, we observe that existing works on motions also suffer from the problem of temporal structure matching, which likewise leads to low efficiency in the matching algorithms. We apply the same idea of bag-based matching to 3D motions.
In our experiments, the proposed method improves precision at high recall by 13% and is 12 times faster than existing works. In summary, we have developed algorithms for 3D articulated geometry models and 3D motions, covering feature extraction, feature matching, indexing, and fast search. Various experiments show that our idea of converting restricted matching into bag-based matching improves matching efficiency and reliability, for both 3D articulated geometry models and 3D motions. We have also connected 3D matching to the area of manifold learning. The embedding retrieval framework not only improves efficiency and accuracy but has also opened a new area of research.
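The central idea of this abstract, replacing skeleton graph matching with matching between two bags of local descriptors, can be illustrated with a simple stand-in. The sketch below is an assumption, not the thesis's algorithm: each model is taken to be an equal-sized bag of descriptor vectors (how those descriptors are extracted is not shown), and the bag distance is the minimum total cost of a one-to-one assignment under Euclidean ground distance. For equal-sized bags this cost is a true metric, which is the property that makes metric indexing possible. The function `bag_distance` is hypothetical, and brute-force enumeration is used only to keep the example dependency-free.

```python
import math
from itertools import permutations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bag_distance(bag_a, bag_b):
    """Minimum total cost of a one-to-one matching between two equal-sized bags.

    Order inside a bag does not matter, so no skeleton topology or temporal
    structure has to be aligned. Brute force is O(n!) and only suits tiny
    bags; a polynomial-time assignment solver (e.g. the Hungarian algorithm)
    would replace it in practice.
    """
    if len(bag_a) != len(bag_b):
        raise ValueError("this sketch assumes equal-sized bags")
    best = math.inf
    for perm in permutations(range(len(bag_b))):
        cost = sum(euclidean(bag_a[i], bag_b[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    return best


if __name__ == "__main__":
    # The same descriptors in a different order are at distance 0.
    print(bag_distance([(0, 0), (1, 0)], [(1, 0), (0, 0)]))
    print(bag_distance([(0, 0), (1, 0)], [(0, 1), (1, 1)]))
```

The metric property (in particular the triangle inequality) is what lets an index prune candidates without computing every distance, which is the motivation the abstract gives for defining a metric similarity measure.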

    Human Motion Analysis Using Very Few Inertial Measurement Units

    Realistic character animation and human motion analysis have become major topics of research. In this doctoral research work, three different aspects of human motion analysis and synthesis are explored. First, to better manage the tens of gigabytes of publicly available human motion capture data sets, a relational database approach is proposed. We show that organizing motion capture data in a relational database provides several benefits, such as centralized access to the major freely available mocap data sets, fast search and retrieval of data, annotation-based retrieval of content, and the ability to accommodate data from non-mocap sensor modalities. The same idea is also proposed for managing quadruped motion capture data. Second, a new method for full-body human motion reconstruction from a very sparse configuration of sensors is proposed. In this setup, two sensors are attached to the upper extremities and one sensor is attached to the lower trunk. The lower-trunk sensor is used to estimate ground contacts, which are later used in the reconstruction process along with the low-dimensional inputs from the sensors on the upper extremities. Compared with existing approaches, the proposed method produces lower average reconstruction errors. Third, in the field of human motion analysis, a novel method is proposed for estimating human soft biometrics such as gender, height, and age from the inertial data of a simple walk. The method extracts several time-domain and frequency-domain features for each individual step, and a random forest classifier is fed these features to estimate the person's soft biometrics.
The classification results show that the gender, height, and age of a person can be estimated with high accuracy from the inertial data of a single step of his or her walk.
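The soft-biometrics method extracts time-domain and frequency-domain features from each step before classification. The abstract does not list the exact features, so the sketch below picks a few common ones as illustrative assumptions: mean, standard deviation, and root mean square in the time domain, plus the dominant frequency from a naive discrete Fourier transform. The function name `step_features` and its input (one step of a single-axis acceleration signal sampled at `fs` Hz) are hypothetical; the resulting vectors would then be passed to a random forest classifier, which is not shown.

```python
import cmath
import math


def step_features(samples, fs):
    """Return [mean, std, rms, dominant_freq_hz] for one step of a 1-D inertial signal."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    rms = math.sqrt(sum(x * x for x in samples) / n)

    # Naive O(n^2) DFT of the mean-removed signal. Bin 0 (DC) is skipped and
    # only bins up to the Nyquist frequency are examined.
    centered = [x - mean for x in samples]
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    dominant_freq = best_k * fs / n  # convert bin index to Hz
    return [mean, std, rms, dominant_freq]


if __name__ == "__main__":
    # A synthetic 2 Hz oscillation sampled at 50 Hz for one second.
    signal = [math.sin(2 * math.pi * 2 * t / 50) for t in range(50)]
    print(step_features(signal, fs=50))
```

One feature vector per step is what makes single-step estimation possible: each step becomes an independent sample for the classifier rather than requiring a full walking sequence.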

    Rhythmic analysis of motion signals for music retrieval

    viii, 108 leaves : ill. (chiefly col.) ; 29 cm. Includes abstract and appendix. Includes bibliographical references (leaves 100-108). This thesis presents a framework that queries a music database with rhythmic motion signals. Rather than extracting the motion signal's underlying rhythm by marking salient frames, as the existing method does, this thesis proposes a novel approach that converts the rhythmic motion signal to MIDI-format music and extracts its beat sequence as the rhythmic information of that motion. We extract "motion events" from the motion data based on characteristics such as changes in movement direction, the root's y coordinate, and angular velocity. These events are converted to music notes to generate an audio representation of the motion. Both this motion-generated music and the existing audio library are analyzed by a beat tracking algorithm, and music retrieval is performed on the extracted beat sequences. We tried three approaches to retrieving music with motion queries: a mutual-information-based approach, a two-sample Kolmogorov-Smirnov (KS) test, and a rhythmic comparison algorithm. The feasibility of the framework is evaluated with pre-recorded music and motion recordings.
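One of the three retrieval approaches is the two-sample Kolmogorov-Smirnov test. As a hedged illustration, assuming each beat sequence is summarized by its inter-beat intervals (the abstract does not say exactly which quantity is compared), the KS statistic is the largest gap between the two empirical cumulative distribution functions, and a smaller statistic means the two rhythms are more alike. The functions `inter_beat_intervals`, `ks_statistic`, and `rank_music` and the library format are hypothetical.

```python
from bisect import bisect_right


def inter_beat_intervals(beat_times):
    """Differences between consecutive beat times (needs at least two beats)."""
    return [b - a for a, b in zip(beat_times, beat_times[1:])]


def ks_statistic(xs, ys):
    """Two-sample KS statistic: max |F_x(v) - F_y(v)| over all observed values v.

    Both empirical CDFs are step functions that only change at observed
    values, so checking the pooled sample points is sufficient.
    """
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def rank_music(motion_beats, library):
    """Rank library tracks (name -> beat times) by KS statistic, most similar first."""
    query = inter_beat_intervals(motion_beats)
    scored = [(name, ks_statistic(query, inter_beat_intervals(beats)))
              for name, beats in library.items()]
    return sorted(scored, key=lambda s: s[1])


if __name__ == "__main__":
    motion = [0.0, 0.5, 1.0, 1.5, 2.0]  # beats extracted from motion-generated music
    library = {"slow": [0, 1, 2, 3], "match": [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]}
    print(rank_music(motion, library))
```

Because the KS test compares distributions rather than aligned sequences, it tolerates query and library tracks of different lengths, at the cost of ignoring the order in which intervals occur.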

    Exploring sparsity, self-similarity, and low rank approximation in action recognition, motion retrieval, and action spotting

    This thesis consists of four major parts. In the first part (Chapters 1-2), we introduce the overview, motivation, and contributions of our work, and extensively survey the current literature on six related topics. In the second part (Chapters 3-7), we explore the concept of self-similarity in two challenging scenarios, namely action recognition and motion retrieval. We build three-dimensional volume representations for both scenarios and devise effective techniques that produce compact representations encoding the internal dynamics of the data. In the third part (Chapter 8), we explore the challenging action spotting problem and propose a feature-independent unsupervised framework that is effective at spotting actions in various real situations, even under heavily perturbed conditions. The final part (Chapter 9) is dedicated to conclusions and future work. For action recognition, we introduce a generic method that does not depend on one particular type of input feature vector. We make three main contributions: (i) we introduce the concept of the Joint Self-Similarity Volume (Joint SSV) for modeling dynamical systems, and show that a new optimized rank-1 tensor approximation of the Joint SSV yields compact low-dimensional descriptors that very accurately preserve the dynamics of the original system, e.g. an action video sequence; (ii) the descriptor vectors derived from the optimized rank-1 approximation make it possible to recognize actions without explicitly aligning action sequences with different execution speeds or frame rates; (iii) the method is generic and can be applied using different low-level features, such as silhouettes or histograms of oriented gradients (HOG), so it does not necessarily require explicit tracking of features in the space-time volume. Our experimental results on five public datasets demonstrate that our method produces very good results and outperforms many baseline methods.
For action recognition from incomplete videos, we determine whether incomplete videos, which are often discarded, carry useful information for action recognition, and if so, how such a mixed collection of video data (complete versus incomplete, and labeled versus unlabeled) can be represented in a unified manner. We propose a novel framework for handling incomplete videos in action classification and make three main contributions: (i) we cast action classification for a mixture of complete and incomplete data as a semi-supervised learning problem over labeled and unlabeled data; (ii) we introduce a two-step approach to convert the mixed input data into a uniform compact representation; (iii) exhaustively scrutinizing 280 configurations, we show experimentally on two benchmarks we created that, even when the videos are extremely sparse and incomplete, it is still possible to recover useful information from them and to classify unknown actions with a graph-based semi-supervised learning framework. For motion retrieval, we present a framework that allows flexible and efficient retrieval of motion capture data in huge databases. The method first converts an action sequence into a self-similarity matrix (SSM), which is based on the notion of self-similarity. This conversion of motion sequences into compact, low-rank subspace representations greatly reduces the spatiotemporal dimensionality of the sequences. The SSMs are then used to construct order-3 tensors, and we propose a low-rank decomposition scheme that converts the motion sequence volumes into compact lower-dimensional representations without losing the nonlinear dynamics of the motion manifold. Thus, unlike existing linear dimensionality reduction methods, which distort the motion manifold and lose critical discriminative components, the proposed method performs well even when inter-class differences are small or intra-class differences are large.
In addition, the method allows efficient retrieval and does not require time alignment of the motion sequences. We evaluate our retrieval framework on the CMU mocap dataset under two experimental settings, both of which demonstrate very good retrieval rates. For action spotting, our framework does not depend on any specific feature (e.g. HOG/HOF, STIP, silhouettes, or bag-of-words) and requires no human localization, segmentation, or frame-wise tracking. This is achieved by treating the problem holistically as one of extracting the internal dynamics of video cuboids, modeling them in their natural form as multilinear tensors. To extract their internal dynamics, we devised a novel Two-Phase Decomposition (TP-Decomp) of a tensor that generates very compact and discriminative representations that are robust even to heavily perturbed data. Technically, a Rank-based Tensor Core Pyramid (Rank-TCP) descriptor is generated by combining multiple tensor cores under multiple ranks, which allows video cuboids to be represented in a hierarchical tensor pyramid. The problem then reduces to template matching, which is solved efficiently using two boosting strategies: (i) to reduce the search space, we filter the dense trajectory cloud extracted from the target video; (ii) to speed up matching, we match in an iterative coarse-to-fine manner. Experiments on five benchmarks show that our method outperforms the current state of the art under various challenging conditions. We also created a challenging dataset called Heavily Perturbed Video Arrays (HPVA) to validate the robustness of our framework under heavily perturbed conditions.
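The motion retrieval framework starts by converting an action sequence into a self-similarity matrix (SSM). A minimal sketch, assuming each frame is a pose vector (e.g. flattened joint positions) and that Euclidean distance between frames is the similarity measure (the thesis may use a different one): entry (i, j) is the distance between frames i and j. The function name `self_similarity_matrix` is hypothetical, and the order-3 tensor construction and low-rank decomposition that follow are not shown.

```python
import math


def self_similarity_matrix(frames):
    """Return the n x n matrix of Euclidean distances between every pair of frames.

    The matrix is symmetric with a zero diagonal, and its size depends only on
    the number of frames, not on the dimensionality of each pose vector.
    """
    n = len(frames)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(frames[i], frames[j])))
            ssm[i][j] = ssm[j][i] = d  # fill both halves of the symmetric matrix
    return ssm


if __name__ == "__main__":
    # A toy "motion" that leaves a pose and returns to it.
    for row in self_similarity_matrix([(0, 0), (3, 4), (0, 0)]):
        print(row)
```

In the toy example, the zero entries off the diagonal show the return to the starting pose, the kind of repetition structure an SSM exposes regardless of how many joints each pose vector contains.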