242 research outputs found

    Similarity, Retrieval, and Classification of Motion Capture Data

    Three-dimensional motion capture data is a digital representation of the complex spatio-temporal structure of human motion. Mocap data is widely used for the synthesis of realistic computer-generated characters in data-driven computer animation and also plays an important role in motion analysis tasks such as activity recognition. Both for efficiency and cost reasons, methods for the reuse of large collections of motion clips are gaining in importance in the field of computer animation. Here, an active field of research is the application of morphing and blending techniques for the creation of new, realistic motions from prerecorded motion clips. This requires the identification and extraction of logically related motions scattered within some data set. Such content-based retrieval of motion capture data, which is a central topic of this thesis, constitutes a difficult problem due to possible spatio-temporal deformations between logically related motions. Recent approaches to motion retrieval apply techniques such as dynamic time warping, which, however, are not applicable to large data sets due to their quadratic space and time complexity. In our approach, we introduce various kinds of relational features describing boolean geometric relations between specified body points and show how these features induce a temporal segmentation of motion capture data streams. By incorporating spatio-temporal invariance into the relational features and induced segments, we are able to adopt indexing methods allowing for flexible and efficient content-based retrieval in large motion capture databases. As a further application of relational motion features, a new method for fully automatic motion classification and retrieval is presented. We introduce the concept of motion templates (MTs), by which the spatio-temporal characteristics of an entire motion class can be learned from training data, yielding an explicit, compact matrix representation. 
The resulting class MT has a direct, semantic interpretation, and it can be manually edited, mixed, combined with other MTs, extended, and restricted. Furthermore, a class MT exhibits the characteristic as well as the variational aspects of the underlying motion class at a semantically high level. Classification is then performed by comparing a set of precomputed class MTs with unknown motion data and labeling matching portions with the respective motion class label. Here, the crucial point is that the variational (hence uncharacteristic) motion aspects encoded in the class MT are automatically masked out in the comparison, which can be thought of as locally adaptive feature selection.
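
The idea of boolean relational features inducing a temporal segmentation can be illustrated with a small sketch. It is not the thesis's implementation: the joint names, the two example relations, the coordinate conventions, and the helper `make_frame` are all assumptions chosen for illustration. Consecutive frames with the same boolean feature vector are merged into one segment, so a slow and a fast performance of the same movement yield the same segment sequence.

```python
from itertools import groupby

def foot_in_front(frame):
    """Hypothetical relation: right foot ahead of left foot along the x axis."""
    return frame["rfoot"][0] > frame["lfoot"][0]

def hand_above_head(frame):
    """Hypothetical relation: right hand higher than the head along the y axis."""
    return frame["rhand"][1] > frame["head"][1]

# The feature set is an assumption; any boolean geometric relations would do.
FEATURES = (foot_in_front, hand_above_head)

def feature_vector(frame):
    """Evaluate every boolean relation on one frame (joint name -> (x, y, z))."""
    return tuple(int(f(frame)) for f in FEATURES)

def segment(frames):
    """Merge runs of frames with identical feature vectors.

    Returns (feature_vector, first_frame, last_frame) triples, indices inclusive.
    """
    vectors = [feature_vector(f) for f in frames]
    segments = []
    start = 0
    for vec, run in groupby(vectors):
        length = len(list(run))
        segments.append((vec, start, start + length - 1))
        start += length
    return segments

def make_frame(rfoot_x, rhand_y):
    """Build a toy frame; only the coordinates used by the relations vary."""
    return {
        "rfoot": (rfoot_x, 0.0, 0.0),
        "lfoot": (0.0, 0.0, 0.0),
        "rhand": (0.0, rhand_y, 0.0),
        "head": (0.0, 1.7, 0.0),
    }
```

Dropping the frame indices from the result of `segment` leaves only the sequence of feature vectors, which no longer depends on how long each phase of the movement lasted. That duration-free sequence is the kind of key an index could be built on.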

    Vision-based 3D Pose Retrieval and Reconstruction

    The analysis of people and the understanding of their motions are key components in many applications such as sports science, biomechanics, medical rehabilitation, animated movie production, and the game industry. In this context, retrieval and reconstruction of articulated 3D human poses are significant sub-problems. In this dissertation, we address the problem of retrieving and reconstructing 3D poses from a monocular video or even from a single RGB image. We propose several data-driven pipelines that retrieve and reconstruct 3D poses by exploiting motion capture data as a prior. The main focus of our proposed approaches is to bridge the gap between two separate media: 3D marker-based recordings and motions or photographs captured with a simple RGB camera. In principle, we leverage both media together efficiently for 3D pose estimation. We show that our proposed methodologies do not need any synchronized 3D-2D pose-image pairs to retrieve and reconstruct the final 3D poses, and that they are flexible enough to capture motion in any studio-like indoor environment or natural outdoor environment. In the first part of the dissertation, we propose model-based approaches for full-body human motion reconstruction from video input that employ only the 2D joint positions of the four end effectors and the head. We resolve the 3D-2D pose-image cross-model correspondence by developing an intermediate container, the knowledge base, from motion capture data, which contains information about how people move. It includes the normalized 3D pose space and the corresponding synchronized normalized 2D pose space, created by utilizing a number of virtual cameras. We first detect and track the features of these five joints in the input motion sequences using SURF, MSER, and colorMSER feature detectors, which vote for the possible 2D locations of these joints in the video. 
The extraction of suitable feature sets from both the input control signals and the motion capture data enables us to retrieve the closest instances from the motion capture dataset using fast search and retrieval techniques. We develop a graph structure, the online lazy neighbourhood graph, to make the similarity search more accurate and robust by exploiting the temporal coherence of the input control signals. The retrieved prior poses are exploited further to stabilize the feature detection and tracking process. Finally, the 3D motion sequences are reconstructed by a non-linear optimizer that takes multiple energy terms into account. We evaluate our approaches with a series of experimental scenarios that vary the performing actors, the camera viewpoints, and the input noise. Our methods need only a little preprocessing, and the reconstruction runs close to real time. The second part of the dissertation is dedicated to 3D human pose estimation from a single monocular image. First, we propose an efficient 3D pose retrieval strategy that leads to a novel data-driven approach for reconstructing a 3D human pose from a monocular still image. We design multiple feature sets for global similarity search. At runtime, we search a motion capture dataset for similar poses in a definite feature space made up of specific joints. We introduce a two-fold method for camera estimation, in which we exploit the view directions used to sample the MoCap dataset as well as the MoCap priors to minimize the projection error. We also use the MoCap priors and the joint weights to learn a low-dimensional local 3D pose model, which is constrained further by multiple energies to infer the final 3D human pose. We thoroughly evaluate our approach on synthetically generated examples, real internet images, and hand-drawn sketches. 
We achieve state-of-the-art results when the test and MoCap data are from the same dataset and obtain competitive results when the motion capture data is taken from a different dataset. Second, we propose a dual-source approach for 3D pose estimation from a single RGB image. One major challenge for 3D pose estimation from a single RGB image is the acquisition of sufficient training data. In particular, collecting large amounts of training data that contain unconstrained images and are annotated with accurate 3D poses is infeasible. We therefore propose to use two independent training sources. The first source consists of images with annotated 2D poses and the second source consists of accurate 3D motion capture data. To integrate both sources, we propose a dual-source approach that combines 2D pose estimation with efficient and robust 3D pose retrieval. In our experiments, we show that our approach achieves state-of-the-art results and is even competitive when the skeleton structures of the two sources differ substantially. In the last part of the dissertation, we focus on how the different techniques developed for human motion capture, retrieval, and reconstruction can be adapted to handle quadruped motion capture data and which new applications may arise. We discuss some particularities that must be considered when capturing the motions of large animals. For retrieval, we derive suitable feature sets for fast searches of the MoCap dataset for similar motion segments. Finally, we present a data-driven approach to reconstruct quadruped motions from video input data.

    Low-latency compression of mocap data using learned spatial decorrelation transform

    Due to the growing need for human motion capture (mocap) in movies, video games, sports, etc., it is highly desirable to compress mocap data for efficient storage and transmission. This paper presents two efficient frameworks for compressing human mocap data with low latency. The first framework processes the data frame by frame, so it is ideal for mocap data streaming and time-critical applications. The second one is clip-based and provides a flexible tradeoff between latency and compression performance. Since mocap data exhibits some unique spatial characteristics, we propose a very effective transform, namely the learned orthogonal transform (LOT), for reducing the spatial redundancy. The LOT problem is formulated as minimizing the squared error regularized by orthogonality and sparsity and is solved via alternating iteration. We also adopt predictive coding and a temporal DCT for temporal decorrelation in the frame-based and clip-based frameworks, respectively. Experimental results show that the proposed frameworks achieve higher compression performance at lower computational cost and latency than the state-of-the-art methods. Comment: 15 pages, 9 figures
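
To make the frame-by-frame temporal decorrelation step concrete, here is a minimal sketch of closed-loop predictive coding. It is not the paper's codec: it assumes the simplest possible predictor (the previously reconstructed frame), a uniform quantizer with a hypothetical step size `step`, and it leaves out the learned orthogonal transform and any entropy coding. The names `encode` and `decode` are illustrative.

```python
def encode(frames, step):
    """Quantize the difference between each frame and the decoder's previous output.

    frames: non-empty list of equal-length lists of floats (one list per frame).
    Returns one list of integer residuals per frame.
    """
    prev = [0.0] * len(frames[0])  # both sides start from an all-zero frame
    residuals = []
    for frame in frames:
        q = [round((x - p) / step) for x, p in zip(frame, prev)]
        residuals.append(q)
        # Track what the decoder will reconstruct, so quantization error
        # does not accumulate from frame to frame (closed-loop prediction).
        prev = [p + r * step for p, r in zip(prev, q)]
    return residuals

def decode(residuals, step):
    """Rebuild the frames by adding each dequantized residual to the previous frame."""
    prev = [0.0] * len(residuals[0])
    frames = []
    for q in residuals:
        prev = [p + r * step for p, r in zip(prev, q)]
        frames.append(prev)
    return frames
```

Because each frame depends only on the one before it, the encoder can emit a frame as soon as it arrives, which is what makes this kind of scheme suitable for streaming. Smooth motion produces small residuals after the first frame, and small integers are cheap to store.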

    Human Motion Analysis Using Very Few Inertial Measurement Units

    Realistic character animation and human motion analysis have become major topics of research. In this doctoral research work, three different aspects of human motion analysis and synthesis have been explored. First, to better manage the tens of gigabytes of publicly available human motion capture data sets, a relational database approach is proposed. We show that organizing motion capture data in a relational database provides several benefits, such as centralized access to the major freely available mocap data sets, fast search and retrieval of data, annotation-based retrieval of contents, and accommodation of data from non-mocap sensor modalities. Moreover, the same idea is also proposed for managing quadruped motion capture data. Second, a new method for full-body human motion reconstruction using a very sparse configuration of sensors is proposed. In this setup, two sensors are attached to the upper extremities and one sensor is attached to the lower trunk. The lower trunk sensor is used to estimate ground contacts, which are later used in the reconstruction process along with the low-dimensional inputs from the sensors attached to the upper extremities. The reconstruction results of the proposed method have been compared with those of existing approaches, and the proposed method was observed to produce lower average reconstruction errors. Third, in the field of human motion analysis, a novel method for estimating human soft biometrics such as gender, height, and age from the inertial data of a simple human walk is proposed. The proposed method extracts several features from the time and frequency domains for each individual step. A random forest classifier is fed with the extracted features in order to estimate the soft biometrics of a human. 
The classification results show that the gender, height, and age of a person can be estimated with high accuracy from the inertial data of a single step of his/her walk.

    Adaptive multi-view feature selection for human motion retrieval

    Human motion retrieval plays an important role in many applications based on motion data. In the past, many researchers tended to use a single type of visual feature as the data representation. Because different visual features describe different aspects of motion data and have dissimilar discriminative power with respect to a particular class of human motion, this led to poor retrieval performance. Thus, it would be beneficial to combine multiple visual features for motion data representation. In this article, we present an Adaptive Multi-view Feature Selection (AMFS) method for human motion retrieval. Specifically, we first use a local linear regression model to automatically learn multiple view-based Laplacian graphs for preserving the local geometric structure of motion data. Then, these graphs are combined with a non-negative view-weight vector to exploit the complementary information between different features. Finally, in order to discard the redundant and irrelevant feature components from the original high-dimensional feature representation, we formulate the objective function of AMFS as a general trace ratio optimization problem and design an effective algorithm to solve it. Extensive experiments on two public human motion databases, i.e., HDM05 and MSR Action3D, demonstrate the effectiveness of the proposed AMFS over the state-of-the-art methods for motion data retrieval. Its scalability to large motion datasets and its insensitivity to the algorithm parameters make our method widely applicable in real-world applications.

    Human Gait Recognition from Motion Capture Data in Signature Poses

    Most contributions to the field of structure-based human gait recognition have been made through the design of extraordinary gait features. Many research groups that address this topic introduce a unique combination of gait features, select a couple of well-known object classifiers, and test some variations of their methods on their custom Kinect databases. For a practical system, it is not necessary to invent an ideal gait feature (many good geometric features have already been designed) but rather to smartly process the data that are at our disposal. This work proposes a gait recognition method without the design of novel gait features; instead, we suggest an effective and highly efficient way of processing known types of features. Our method extracts a couple of joint angles from two signature poses within a gait cycle to form a gait pattern descriptor, and classifies the query subject by the baseline 1-NN classifier. Not only are these poses distinctive enough, they also rarely accommodate motion irregularities that would result in confusion of identities. We experimentally demonstrate that our gait recognition method outperforms other relevant methods in terms of recognition rate and computational complexity. Evaluations were performed on an experimental database that precisely simulates a street-level video surveillance environment.
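
The baseline 1-NN classification step named in the abstract can be sketched in a few lines. The descriptor layout (a flat list of joint angles in degrees), the Euclidean distance, and the function name `nearest_neighbor` are assumptions made for illustration; the abstract does not specify which angles or which distance the authors use.

```python
import math

def nearest_neighbor(query, gallery):
    """Return the identity label of the gallery descriptor closest to the query.

    query:   sequence of joint angles forming one gait pattern descriptor.
    gallery: non-empty list of (label, descriptor) pairs with the same layout.
    Distance is plain Euclidean, which is an assumption of this sketch.
    """
    best_label, _ = min(gallery, key=lambda item: math.dist(query, item[1]))
    return best_label
```

With short descriptors, each query costs one distance computation per enrolled descriptor, which is consistent with the abstract's emphasis on low computational complexity.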