11 research outputs found

    3D Human Motion Tracking and Pose Estimation using Probabilistic Activity Models

    Get PDF
    This thesis presents work on generative approaches to human motion tracking and pose estimation, in which a geometric model of the human body is used for comparison with observations. The existing generative tracking literature can be quite clearly divided into two groups: first, approaches that attempt to solve a difficult high-dimensional inference problem in the body model's full or ambient pose space, recovering freeform or unknown activity; second, approaches that restrict inference to a low-dimensional latent embedding of the full pose space, recovering activity for which training data is available, or known activity. Significant advances have been made in each of these subgroups. Given sufficiently rich multiocular observations and plentiful computational resources, high-dimensional approaches have been proven to track fast and complex unknown activities robustly. Conversely, low-dimensional approaches have been able to support monocular tracking and to significantly reduce computational costs for the recovery of known activity. However, their advantages, although complementary, have remained disjoint. The central aim of this thesis is to combine low- and high-dimensional generative tracking techniques to benefit from the best of both approaches. First, a simple generative tracking approach is proposed for tracking known activities in a latent pose space using only monocular or binocular observations. A hidden Markov model (HMM) is used to provide dynamics and constrain a particle-based search for poses. The ability of the HMM to classify as well as synthesise poses means that the approach naturally extends to the modelling of a number of different known activities in a single joint-activity latent space. Second, an additional low-dimensional approach is introduced to permit transitions between segmented known-activity training data by allowing particles to move between activity manifolds. Both low-dimensional approaches are then fairly and efficiently combined with a simultaneous high-dimensional generative tracking task in the ambient pose space. This combination allows for the recovery of sequences containing multiple known and unknown human activities at an appropriate (dynamic) computational cost. Finally, a rich hierarchical embedding of the ambient pose space is investigated. This representation allows inference to progress from a single full-body or global non-linear latent pose space, through a number of gradually smaller part-based latent models, to the full ambient pose space. By preserving long-range correlations present in training data, the positions of occluded limbs can be inferred during tracking. Alternatively, by breaking the implied coordination between part-based models, novel activity combinations, or composite activity, may be recovered.
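
    As an illustration of the first contribution, the sketch below shows a minimal HMM-constrained particle filter of the general kind the abstract describes: each particle carries a discrete activity state, propagated by the HMM transition matrix, together with a continuous latent pose that is reweighted against observations. All dimensions, the toy Gaussian observation model, and the randomly generated HMM parameters are illustrative assumptions, not the thesis' actual components.

```python
# Minimal sketch of HMM-constrained particle filtering in a latent pose space.
# All models and dimensions here are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)

K, D, N = 8, 3, 200                        # HMM states, latent dims, particles
A = rng.dirichlet(np.ones(K), size=K)      # HMM transition matrix (rows sum to 1)
means = rng.normal(size=(K, D))            # per-state latent pose means
obs_noise = 0.1

def log_likelihood(z, obs):
    """Toy observation model: Gaussian around the observed latent point."""
    return -0.5 * np.sum((z - obs) ** 2) / obs_noise ** 2

states = rng.integers(0, K, size=N)        # discrete HMM state per particle
latents = means[states] + 0.05 * rng.normal(size=(N, D))

def step(obs):
    """One tracking step: HMM transition, latent diffusion, reweight, resample."""
    global states, latents
    for i in range(N):
        states[i] = rng.choice(K, p=A[states[i]])            # HMM dynamics
        latents[i] = means[states[i]] + 0.05 * rng.normal(size=D)
    logw = np.array([log_likelihood(z, obs) for z in latents])
    weights = np.exp(logw - logw.max())                      # stable weights
    weights /= weights.sum()
    idx = rng.choice(N, size=N, p=weights)                   # resample
    states, latents = states[idx], latents[idx].copy()
    return latents.mean(axis=0)                              # pose estimate

print(step(np.zeros(D)))
```

    The same resampling step is what lets particles jump between activity manifolds in the multi-activity extension, since the discrete state doubles as an activity label.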

    Discovery and recognition of motion primitives in human activities

    Get PDF
    We present a novel framework for the automatic discovery and recognition of motion primitives in videos of human activities. Given the 3D pose of a human in a video, human motion primitives are discovered by optimizing the 'motion flux', a quantity which captures the motion variation of a group of skeletal joints. A normalization of the primitives is proposed to make them invariant with respect to a subject's anatomical variations and the data sampling rate. The discovered primitives are unknown and unlabeled, and are collected into classes in an unsupervised manner via a hierarchical non-parametric Bayes mixture model. Once classes are determined and labeled, they are further analyzed to establish models for recognizing the discovered primitives. Each primitive model is defined by a set of learned parameters. Given new video data and the estimated pose of the subject appearing in the video, the motion is segmented into primitives, which are recognized with a probability determined by the parameters of the learned models. Using our framework we build a publicly available dataset of human motion primitives, using sequences taken from well-known motion capture datasets. We expect that our framework, by providing an objective way of discovering and categorizing human motion, will be a useful tool in numerous research fields including video analysis, human-inspired motion generation, learning by demonstration, intuitive human-robot interaction, and human behavior analysis.
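
    The abstract does not reproduce the definition of motion flux, so the sketch below illustrates only one plausible reading under stated assumptions: the aggregate speed of a group of skeletal joints over time, normalised by the sampling rate and a body-scale factor so that primitives are comparable across subjects and capture rates. The function name, normalisation, and signal shape are all hypothetical.

```python
# Hedged sketch: the paper's exact 'motion flux' is not reproduced here; this
# shows one plausible reading -- aggregate joint-group speed over time,
# normalised by sampling rate and a body-scale factor.
import numpy as np

def motion_flux(joints, fps, body_scale=1.0):
    """joints: (T, J, 3) array of 3D positions for a group of J joints.
    Returns a per-frame scalar flux signal of length T-1."""
    vel = np.diff(joints, axis=0) * fps          # finite-difference velocities
    speed = np.linalg.norm(vel, axis=2)          # (T-1, J) joint speeds
    return speed.sum(axis=1) / body_scale        # aggregate over the group

# Primitive boundaries could then be placed at local minima of the flux signal.
T, J = 120, 4
demo = np.cumsum(np.random.default_rng(1).normal(size=(T, J, 3)), axis=0) * 0.01
flux = motion_flux(demo, fps=30, body_scale=1.8)
print(flux.shape, flux[:5])
```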

    Modeling variation of human motion

    Get PDF
    The synthesis of realistic human motion with large variations and different styles is of growing interest in simulation applications such as the game industry, psychological experiments, and ergonomic analysis. Data-driven motion synthesis approaches are powerful tools for producing high-fidelity character animations. With the development of motion capture technologies, more and more motion data are publicly available. However, efficiently reusing a large amount of motion data to create new motions for arbitrary scenarios poses challenges, especially for unsupervised motion synthesis. This thesis presents a series of works that analyze and model the variations of human motion data. The goal is to learn statistical generative models to create any number of new human animations with rich variations and styles. In our motion synthesis framework, these statistical generative models are used by motion controllers to create new animations for different scenarios. The work of the thesis is presented in three main chapters. We first explore how variation is represented in motion data. Learning a compact latent space that can expressively contain motion variation is essential for modeling motion data. We propose a novel motion latent space learning approach that intrinsically tackles the spatio-temporal properties of motion data. Secondly, we present our Morphable Graph framework for human motion modeling and synthesis in assembly workshop scenarios. A series of studies has been conducted to apply statistical motion modeling and synthesis approaches to complex assembly workshop use cases. Learning the distribution of motion data can provide a compact representation of motion variations and convert motion synthesis tasks into optimization problems. Finally, we show how the style variations of human activities can be modeled with a limited number of examples. Natural human movements display a rich repertoire of styles and personalities, but it is difficult to obtain enough examples for data-driven approaches. We propose a conditional variational autoencoder (CVAE) to combine the large variations in a neutral motion database with style information from a limited number of examples. We show that our approach can generate an arbitrary number of natural-looking variations of human motion with a style similar to the target.
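
    The final contribution centres on a conditional variational autoencoder. The sketch below is a minimal CVAE of that general shape, conditioning both the encoder and decoder on a style code; the pose dimension, the one-hot style representation, the MLP architecture, and the KL weight are illustrative assumptions rather than the thesis' actual design.

```python
# Minimal CVAE sketch (assumptions: 60-D pose vectors, a 4-way one-hot style
# code, simple MLP encoder/decoder; not the thesis' actual architecture).
import torch
import torch.nn as nn

POSE_DIM, STYLE_DIM, LATENT_DIM = 60, 4, 16

class MotionCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(POSE_DIM + STYLE_DIM, 128), nn.ReLU())
        self.mu = nn.Linear(128, LATENT_DIM)
        self.logvar = nn.Linear(128, LATENT_DIM)
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + STYLE_DIM, 128), nn.ReLU(),
            nn.Linear(128, POSE_DIM))

    def forward(self, pose, style):
        h = self.enc(torch.cat([pose, style], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterise
        return self.dec(torch.cat([z, style], dim=-1)), mu, logvar

def loss_fn(recon, pose, mu, logvar):
    rec = ((recon - pose) ** 2).mean()                    # reconstruction term
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + 1e-3 * kld          # small KL weight, an assumed common choice

model = MotionCVAE()
pose = torch.randn(8, POSE_DIM)                           # batch of neutral poses
style = nn.functional.one_hot(torch.randint(0, STYLE_DIM, (8,)),
                              STYLE_DIM).float()          # target style codes
recon, mu, logvar = model(pose, style)
print(loss_fn(recon, pose, mu, logvar).item())
```

    At generation time, sampling z from the prior while fixing the style code would yield unlimited stylised variations of the neutral motion, which is the behaviour the abstract describes.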

    Expressive movement generation with machine learning

    Get PDF
    Movement is an essential aspect of our lives. Not only do we move to interact with our physical environment, but we also express ourselves and communicate with others through our movements. In an increasingly computerized world where various technologies and devices surround us, our movements are an essential part of our interaction with, and consumption of, computational devices and artifacts. In this context, incorporating an understanding of our movements within the design of the technologies surrounding us can significantly improve our daily experiences. This need has given rise to the field of movement computing: developing computational models of movement that can perceive, manipulate, and generate movements. In this thesis, we contribute to the field of movement computing by building machine-learning-based solutions for automatic movement generation. In particular, we focus on using machine learning techniques and motion capture data to create controllable, generative movement models. We also contribute to the field through the datasets, tools, and libraries that we have developed during our research. We start our research by reviewing the literature on building automatic movement generation systems using machine learning techniques and motion capture data. Our review covers background topics such as high-level movement characterization, training data, feature representation, machine learning models, and evaluation methods. Building on our literature review, we present WalkNet, an interactive agent walking-movement controller based on neural networks. The expressivity of virtual, animated agents plays an essential role in their believability. Therefore, WalkNet integrates control over the expressive qualities of movement with the goal-oriented behaviour of an animated virtual agent. It allows us to control the generation in real time based on the valence and arousal levels of affect, the movement's walking direction, and the mover's movement signature. Following WalkNet, we look at controlling movement generation using more complex stimuli such as music represented by audio signals (i.e., non-symbolic music). Music-driven dance generation involves a highly non-linear mapping between temporally dense stimuli (i.e., the audio signal) and movements, which makes the movement modelling problem more challenging. To this end, we present GrooveNet, a real-time machine learning model for music-driven dance generation.
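
    To make the control interface concrete, the sketch below shows an autoregressive controller step of the general kind WalkNet describes: a network maps the current pose plus a control vector (assumed here to be valence, arousal, and a 2D walking direction) to the next pose. The architecture, dimensions, and control encoding are illustrative assumptions, and the movement-signature input is omitted.

```python
# Hedged sketch of a WalkNet-style controller step; the actual WalkNet
# architecture and conditioning scheme are not reproduced here.
import torch
import torch.nn as nn

POSE_DIM, CTRL_DIM = 60, 4      # assumed pose size; [valence, arousal, dir_x, dir_z]

controller = nn.Sequential(
    nn.Linear(POSE_DIM + CTRL_DIM, 256), nn.ReLU(),
    nn.Linear(256, POSE_DIM))

def rollout(start_pose, control, steps=30):
    """Autoregressively generate a clip under a fixed affect/direction control;
    in an interactive system the control vector could change every frame."""
    poses, pose = [start_pose], start_pose
    for _ in range(steps):
        pose = controller(torch.cat([pose, control], dim=-1))
        poses.append(pose)
    return torch.stack(poses)

clip = rollout(torch.zeros(POSE_DIM),
               torch.tensor([0.8, 0.3, 1.0, 0.0]))   # high valence, walking +x
print(clip.shape)
```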

    Nonlinear Dimensionality Reduction for Motion Synthesis and Control

    Get PDF
    Synthesising the motion of human character animations or humanoid robots is vastly complicated by the large number of degrees of freedom in their kinematics. Control spaces become so large that automated methods designed to adaptively generate movements become computationally infeasible or fail to find acceptable solutions. In this thesis we investigate how demonstrations of previously successful movements can be used to inform the production of new movements that are adapted to new situations. In particular, we evaluate the use of nonlinear dimensionality reduction techniques to find compact representations of demonstrations, and investigate how these can simplify the synthesis of new movements. Our focus lies on the Gaussian Process Latent Variable Model (GPLVM), because it has proven to capture the nonlinearities present in the kinematics of robots and humans. We present an in-depth analysis of the underlying theory, which results in an alternative approach to initialising the GPLVM based on Multidimensional Scaling. We show that the new initialisation is better suited than PCA for nonlinear, synthetic data, although we note that its advantage shrinks on motion data. Subsequently we show that the incorporation of additional structure constraints leads to low-dimensional representations that are sufficiently regular that, once learned, dynamic movement primitives can be adapted to new situations without the need for relearning. Finally, in a number of experiments in which movements are generated for bimanual reaching, we demonstrate that, through the use of nonlinear dimensionality reduction, reinforcement learning can be scaled up to optimise humanoid movements.
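
    The initialisation idea can be illustrated briefly. In the hedged sketch below, scikit-learn's metric MDS stands in for the thesis' Multidimensional Scaling variant: latent coordinates are seeded from pairwise distances rather than from a linear PCA projection, which can better preserve nonlinear structure before the GPLVM jointly optimises the latents and kernel hyperparameters.

```python
# Sketch: MDS-based vs PCA-based initialisation of GPLVM latent coordinates
# on synthetic nonlinear data (sklearn's MDS is an assumed stand-in).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
t = rng.uniform(0, 4 * np.pi, 300)
Y = np.stack([np.cos(t), np.sin(t), t], axis=1)       # nonlinear 'spiral' data
Y += 0.05 * rng.normal(size=Y.shape)

X_pca = PCA(n_components=2).fit_transform(Y)          # linear initialisation
X_mds = MDS(n_components=2, random_state=0).fit_transform(Y)  # distance-preserving

# Either X_pca or X_mds would then seed the GPLVM's latent coordinates before
# they are optimised together with the kernel hyperparameters.
print(X_pca.shape, X_mds.shape)
```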

    Real-Time Robot Motion Planning Algorithms and Applications Under Uncertainty

    Get PDF
    Robot motion planning is an important problem for real-world robot applications. Recently, the separation of workspaces between humans and robots has been gradually fading, and there is strong interest in developing solutions where collaborative robots (cobots) can interact or work safely with humans in a shared space or in close proximity. When working with humans in real-world environments, robots need to plan safe motions under uncertainty stemming from many sources, such as the noise of visual sensors, the ambiguity of verbal instructions, and the variety of human motions. In this thesis, we propose novel optimization-based and learning-based robot motion planning algorithms to deal with the uncertainties of real-world environments. To handle the input noise of visual cameras and the uncertainty in the shape and pose estimation of surrounding objects, we present efficient probabilistic collision detection algorithms for Gaussian and non-Gaussian error distributions. By efficiently computing upper bounds on the collision probability between an object and a robot, we present novel trajectory planning algorithms that guarantee that the collision probability at any trajectory point is less than a user-specified threshold. To enable human-robot interaction using natural language instructions, we present a mapping function from grounded linguistic semantics to the coefficients of the motion planning optimization problem. The mapping function considers task descriptions and motion-related constraints. For collaborative robots working with a human in close proximity, we present human intention and motion prediction algorithms for efficient task ordering and safe motion planning. The robot observes human poses in real time and predicts future human motion based on the history of observed poses. We also present an occlusion-aware robot motion planning algorithm that accounts for occlusion in the visual sensor data and uses learning-based techniques for trajectory planning. We highlight the benefits of our collision detection and robot motion planning algorithms with a 7-DOF Fetch robot arm in simulated and real-world environments.
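
    A standard instance of such an upper bound, for the Gaussian case, encloses the spherical collision set in a half-space whose Gaussian mass has a closed form; the sketch below implements that bound. The covariance, radius, and positions are illustrative, and the thesis' tighter bounds and non-Gaussian treatment are not reproduced here.

```python
# Hedged sketch: half-space upper bound on collision probability under a
# Gaussian position error. The collision set {||x - p|| <= r} is enclosed in
# the half-space {a.x >= a.p - r}, whose Gaussian mass is exact.
import numpy as np
from scipy.stats import norm

def collision_prob_upper_bound(mu, Sigma, p, r):
    """Bound P(||x - p|| <= r) for x ~ N(mu, Sigma), with the separating
    direction a pointing from the obstacle mean mu toward the robot point p."""
    a = p - mu
    a = a / np.linalg.norm(a)
    mean_proj = a @ mu                       # Gaussian mean along direction a
    std_proj = np.sqrt(a @ Sigma @ a)        # Gaussian std along direction a
    return 1.0 - norm.cdf((a @ p - r - mean_proj) / std_proj)

mu = np.array([0.0, 0.0, 0.0])               # estimated obstacle position
Sigma = np.diag([0.02, 0.02, 0.05])          # sensing covariance (assumed)
p = np.array([0.3, 0.0, 0.1])                # robot body point
bound = collision_prob_upper_bound(mu, Sigma, p, r=0.15)
print(f"collision probability <= {bound:.4f}")
# A planner would reject trajectory points whose bound exceeds the threshold.
```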

    A Methodology for Extracting Human Bodies from Still Images

    Get PDF
    Monitoring and surveillance of humans is one of the most prominent applications of today, and it is expected to be part of many aspects of our future lives, for safety reasons, assisted living, and many others. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and remains open. In this PhD dissertation we examine the problem from many perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then, we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject, and propose a maturity metric to evaluate them. Image segmentation is one of the most popular image processing algorithms found in the field, and we propose a blind metric to evaluate segmentation results with respect to the activity at local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints, and it is facilitated by our research in the fields of face, skin, and hand detection. Experimental results and comparisons with state-of-the-art methodologies demonstrate the success of our approach.

    Discriminative sequence back-constrained GP-LVM for MOCAP based action recognition

    No full text
    In this paper we address the problem of human action recognition in motion capture sequences. We introduce a method based on Gaussian Process Latent Variable Models and alignment kernels. We build a new discriminative latent variable model with back-constraints induced by the similarity of the original sequences. We compare the proposed method with a standard sequence classification method based on Dynamic Time Warping and with the recently introduced V-GPDS model, which is able to model high-dimensional dynamical systems. The proposed methodology exhibits high performance even for datasets that have not been manually preprocessed, while it further allows fast inference by exploiting the back constraints.
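
    For concreteness, the sketch below implements plain Dynamic Time Warping and an exponentiated-DTW similarity of the kind commonly used to compare MOCAP sequences. Note that exp(-DTW) is not guaranteed positive definite (global alignment kernels address this), and the paper's exact alignment kernel and back-constraint construction are not reproduced here; the sequences and bandwidth are illustrative.

```python
# Hedged sketch: DTW distance plus an exponentiated-DTW similarity between
# variable-length motion sequences.
import numpy as np

def dtw(X, Y):
    """Dynamic Time Warping distance between sequences X (T1, D) and Y (T2, D)."""
    T1, T2 = len(X), len(Y)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])   # frame-to-frame cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2]

def alignment_similarity(X, Y, gamma=10.0):
    """Similarity in (0, 1]; such pairwise similarities could drive the
    back-constraints so that similar sequences map to nearby latent points."""
    return np.exp(-dtw(X, Y) / gamma)

rng = np.random.default_rng(3)
walk = np.cumsum(rng.normal(size=(50, 6)), axis=0)   # toy joint-angle sequences
wave = np.cumsum(rng.normal(size=(40, 6)), axis=0)
print(alignment_similarity(walk, walk[::2]), alignment_similarity(walk, wave))
```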