
    Vision-based 3D Pose Retrieval and Reconstruction

    The analysis of people and the understanding of their motions are key components of many applications, such as sports science, biomechanics, medical rehabilitation, animated movie production and the games industry. In this context, the retrieval and reconstruction of articulated 3D human poses are significant sub-problems. In this dissertation, we address the problem of retrieving and reconstructing 3D poses from a monocular video or even from a single RGB image. We propose several data-driven pipelines that retrieve and reconstruct 3D poses by exploiting motion capture data as a prior. The main focus of our proposed approaches is to bridge the gap between the separate media of 3D marker-based recordings and motions or photographs captured with a simple RGB camera. In principle, we leverage both media together for efficient 3D pose estimation. We show that our proposed methodologies do not require any synchronized 3D-2D pose-image pairs to retrieve and reconstruct the final 3D poses, and are flexible enough to capture motion in any studio-like indoor environment or natural outdoor environment.

    In the first part of the dissertation, we propose model-based approaches for full-body human motion reconstruction from video input, employing only the 2D joint positions of the four end effectors and the head. We resolve the 3D-2D pose-image cross-modal correspondence by developing an intermediate container, the knowledge base, built from the motion capture data, which contains information about how people move. It includes a 3D normalized pose space and the corresponding synchronized 2D normalized pose space, created using a number of virtual cameras. We first detect and track the features of these five joints in the input motion sequences using the SURF, MSER and colorMSER feature detectors, which vote for the possible 2D locations of these joints in the video. Extracting suitable feature sets from both the input control signals and the motion capture data enables us to retrieve the closest instances from the motion capture dataset with fast search and retrieval techniques. We develop a graphical structure, the online lazy neighbourhood graph, which makes the similarity search more accurate and robust by exploiting the temporal coherence of the input control signals. The retrieved prior poses are further exploited to stabilize the feature detection and tracking process. Finally, the 3D motion sequences are reconstructed by a non-linear optimizer that takes multiple energy terms into account. We evaluate our approaches on a series of experimental scenarios that vary the performing actors, the camera viewpoints and the noise in the inputs. Our methods require only a little preprocessing, and the reconstruction runs close to real time.
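    The temporally coherent similarity search at the heart of this pipeline can be illustrated with a short sketch. This is a minimal, simplified stand-in for the online lazy neighbourhood graph, assuming precomputed normalized 2D pose features and a hypothetical TemporalPoseRetriever class; the dissertation's feature design and graph construction are considerably richer.

```python
# Minimal sketch of temporally coherent nearest-neighbour pose retrieval.
# A simplified stand-in for the online lazy neighbourhood graph: candidates
# for the current frame are kept only if they continue a motion retrieved
# in the previous frame.
import numpy as np
from scipy.spatial import cKDTree

class TemporalPoseRetriever:
    def __init__(self, mocap_features, k=16, window=8):
        # mocap_features: (N, D) normalized 2D pose features, sampled from
        # the MoCap database under many virtual cameras, in motion order.
        self.tree = cKDTree(mocap_features)
        self.k = k
        self.window = window          # number of past frames to remember
        self.history = []             # candidate index sets per frame

    def query(self, frame_feature):
        # Plain k-NN lookup for the current frame's control signal.
        _, idx = self.tree.query(frame_feature, k=self.k)
        candidates = set(np.atleast_1d(idx).tolist())
        # Temporal coherence: a database pose is plausible if its
        # predecessor frame (index - 1) was retrieved for the last frame.
        if self.history:
            prev = self.history[-1]
            coherent = {i for i in candidates if (i - 1) in prev}
            if coherent:
                candidates = coherent
        self.history = (self.history + [candidates])[-self.window:]
        return sorted(candidates)
```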
    The second part of the dissertation is dedicated to 3D human pose estimation from a single monocular image. First, we propose an efficient 3D pose retrieval strategy that leads to a novel data-driven approach for reconstructing a 3D human pose from a monocular still image. We design multiple feature sets for global similarity search. At runtime, we search for similar poses in a motion capture dataset within a feature space built from specific joints. We introduce a two-fold method for camera estimation that exploits both the view directions at which the MoCap dataset was sampled and the MoCap priors to minimize the projection error. We also use the MoCap priors and the joints' weights to learn a low-dimensional local 3D pose model, which is further constrained by multiple energies to infer the final 3D human pose. We thoroughly evaluate our approach on synthetically generated examples, real internet images and hand-drawn sketches. We achieve state-of-the-art results when the test and MoCap data come from the same dataset, and competitive results when the motion capture data is taken from a different dataset. Second, we propose a dual-source approach for 3D pose estimation from a single RGB image. One major challenge for 3D pose estimation from a single RGB image is the acquisition of sufficient training data; in particular, collecting large amounts of training data that contain unconstrained images annotated with accurate 3D poses is infeasible. We therefore propose to use two independent training sources: the first consists of images with annotated 2D poses, and the second consists of accurate 3D motion capture data. To integrate both sources, our dual-source approach combines 2D pose estimation with efficient and robust 3D pose retrieval. In our experiments, we show that our approach achieves state-of-the-art results and remains competitive even when the skeleton structures of the two sources differ substantially.

    In the last part of the dissertation, we focus on how the different techniques developed for human motion capture, retrieval and reconstruction can be adapted to quadruped motion capture data, and which new applications this enables. We discuss particularities that must be considered when capturing the motions of large animals. For retrieval, we derive suitable feature sets for fast searches of the MoCap dataset for similar motion segments. Finally, we present a data-driven approach to reconstruct quadruped motions from video input data.
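    As a concrete illustration of the refinement stage of such camera estimation, the following hedged sketch fits a weak-perspective camera to detected 2D joints given a retrieved 3D prior pose, starting from one of the sampled view directions. The parameterization and optimizer are illustrative choices, and fit_camera and reprojection_residuals are hypothetical names, not the dissertation's exact formulation.

```python
# Hedged sketch: refine a weak-perspective camera (scale, rotation,
# 2D translation) by minimizing the 2D reprojection error between a
# retrieved 3D MoCap pose and the detected 2D joints.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, joints3d, joints2d):
    s = params[0]                            # isotropic scale
    R = Rotation.from_rotvec(params[1:4])    # camera rotation (axis-angle)
    t = params[4:6]                          # 2D image-plane translation
    proj = s * R.apply(joints3d)[:, :2] + t  # weak-perspective projection
    return (proj - joints2d).ravel()

def fit_camera(joints3d, joints2d, init_rotvec):
    # init_rotvec: the sampling view direction (as an axis-angle vector)
    # associated with the best retrieved pose; it seeds the refinement.
    x0 = np.concatenate([[1.0], init_rotvec, [0.0, 0.0]])
    res = least_squares(reprojection_residuals, x0,
                        args=(joints3d, joints2d))
    return res.x  # refined [scale, rotvec (3), translation (2)]
```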

    Posing 3D Models from Drawing

    Inferring the 3D pose of a character from a drawing is a complex and under-constrained problem. Solving it may help automate various parts of an animation production pipeline, such as pre-visualisation. In this paper, a novel way of inferring the 3D pose from a monocular 2D sketch is proposed. The proposed method does not make any external assumptions about the model, allowing it to be used on different types of characters. The inference of the 3D pose is formulated as an optimisation problem, and a parallel variation of the Particle Swarm Optimisation algorithm called PARAC-LOAPSO is utilised to search for the minimum. The presented method is evaluated, both in isolation and as part of a larger scene, by posing a lamp, a horse and a human character. The results show that the method is robust and highly scalable, and that it can be extended to various types of models.
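    The optimisation at the core of the method can be sketched with a plain, serial particle swarm optimiser. The paper's PARAC-LOAPSO is a parallel variant with additional mechanisms, so the loop below only illustrates the underlying search over pose parameters, with the problem-specific energy left abstract.

```python
# Generic particle swarm optimization over a pose parameter vector.
# energy() would measure how well the posed 3D model, projected to 2D,
# matches the input drawing; its definition is model-specific.
import numpy as np

def pso(energy, dim, n_particles=50, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-np.pi, np.pi)):
    lo, hi = bounds
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                          # particle velocities
    pbest = x.copy()                              # per-particle best
    pbest_e = np.array([energy(p) for p in x])
    gbest = pbest[np.argmin(pbest_e)].copy()      # swarm-wide best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia plus attraction towards personal and global bests.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        e = np.array([energy(p) for p in x])
        improved = e < pbest_e
        pbest[improved], pbest_e[improved] = x[improved], e[improved]
        gbest = pbest[np.argmin(pbest_e)].copy()
    return gbest
```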

    Automatic skeletonization and skin attachment for realistic character animation.

    The realism of character animation depends on a number of tasks, ranging from modelling, skin deformation and motion generation to rendering. In this research we are concerned with two of them: skeletonization and weight assignment for skin deformation. The former generates a skeleton, which is placed within the character model and links the motion data to the skin shape of the character. The latter assists in modelling a realistic skin shape when the character is in motion. In current animation production practice, the task of skeletonization is primarily undertaken by hand, i.e. the animator produces an appropriate skeleton and binds it to the skin model of a character. This is inevitably very time-consuming and labour-intensive. To address this issue, in this thesis we present an automatic skeletonization framework. It aims at producing high-quality animatable skeletons without heavy human involvement, while allowing the animator to maintain overall control of the process.

    In the literature, the term skeletonization can have different meanings. Most existing research on skeletonization lies in the remit of CAD (Computer-Aided Design). Although this research is of significant reference value to animation, its downside is that the generated skeleton is either not appropriate for the particular needs of animation, or the methods are computationally expensive. Some purpose-built animation skeleton generation techniques exist, but they unfortunately rely on complicated post-processing procedures, such as thinning and pruning, which again can be undesirable. The proposed skeletonization framework makes use of a new geometric entity known as the 3D silhouette, an ordinary silhouette with its depth information recorded. We extract a curve skeleton from two 3D silhouettes of a character, detected from two perpendicular projections. The skeletal joints are identified by downsampling the curve skeleton, leading to the final animation skeleton. Efficiency and quality are the major performance indicators in animation skeleton generation. Our framework achieves the former by providing a 2D solution to the 3D skeletonization problem; the reduction in dimensionality brings much faster performance. Experiments and comparisons are carried out to demonstrate the computational simplicity, and the accuracy of the framework is verified through them as well.

    To link the skeleton to the skin, we present a skin attachment framework aiming at automatic and reasonable weight distribution. It differs from conventional algorithms in taking topological information into account during weight computation. An effective range is defined for each joint; skin vertices located outside the effective range are not affected by that joint. By this means, we remove the influence of a topologically distant, and hence most likely irrelevant, joint on a vertex. The algorithm also provides a user-defined parameter, which allows different deformation effects to be obtained according to the user's needs. Experiments and comparisons show that the presented framework produces weight distributions of good quality, freeing animators from tedious manual weight editing.
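    A minimal sketch of the effective-range idea follows, assuming the mesh is given as an edge list and that topological distance is measured along the surface via graph shortest paths. The falloff exponent p stands in for the user-defined parameter; the exact weighting function in the thesis may differ.

```python
# Hedged sketch of topology-aware skin weighting with a per-joint
# effective range: vertices farther than the range (measured along the
# mesh surface, not straight-line) receive zero weight from that joint.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def skin_weights(edges, edge_len, n_verts, joint_verts, eff_range, p=2.0):
    # edges: (E, 2) vertex index pairs; edge_len: (E,) edge lengths.
    # joint_verts: for each joint, the index of its nearest mesh vertex.
    # eff_range: (J,) per-joint effective range.
    rows, cols = edges[:, 0], edges[:, 1]
    graph = csr_matrix((edge_len, (rows, cols)), shape=(n_verts, n_verts))
    # Surface (graph) distance from every joint vertex to all vertices;
    # topologically distant vertices get large or infinite distances.
    dist = dijkstra(graph, directed=False, indices=joint_verts)
    w = np.zeros_like(dist)
    inside = dist < eff_range[:, None]            # per-joint cutoff
    w[inside] = (1.0 - (dist / eff_range[:, None])[inside]) ** p
    total = w.sum(axis=0)
    total[total == 0.0] = 1.0                     # avoid divide-by-zero
    return w / total                              # per-vertex weights sum to 1
```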

    Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos

    We propose KeypointGAN, a new method for recognizing the pose of objects from a single image that learns from only unlabelled videos and a weak empirical prior on the object poses. Video frames differ primarily in the pose of the objects they contain, so our method distils the pose information by analyzing the differences between frames. The distillation uses a new dual representation of the geometry of objects: as a set of 2D keypoints, and as a pictorial representation, i.e. a skeleton image. This has three benefits: (1) it provides a tight `geometric bottleneck' which disentangles pose from appearance, (2) it can leverage powerful image-to-image translation networks to map between photometry and geometry, and (3) it allows empirical pose priors to be incorporated into the learning process. The pose priors are obtained from unpaired data, e.g. from a different dataset or a different modality such as mocap, so that no annotated image is ever used in learning the pose recognition network. On standard benchmarks for pose recognition for humans and faces, our method achieves state-of-the-art performance among methods that do not require any labelled images for training.
    Comment: CVPR 2020 (oral). Project page: http://www.robots.ox.ac.uk/~vgg/research/unsupervised_pose
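    The pictorial half of the dual representation can be sketched as follows: 2D keypoints are rendered into a skeleton image by drawing each bone as a soft line segment, giving the tight geometric bottleneck described above. This is a simplified, hypothetical renderer; the paper's differentiable formulation may differ.

```python
# Hedged sketch of rendering 2D keypoints as a skeleton image: each
# bone is drawn as a line segment with a Gaussian falloff, so the image
# encodes geometry (pose) but no appearance.
import numpy as np

def skeleton_image(keypoints, bones, size=64, sigma=1.0):
    # keypoints: (K, 2) in pixel coordinates; bones: list of (i, j) pairs.
    ys, xs = np.mgrid[0:size, 0:size]
    pix = np.stack([xs, ys], axis=-1).astype(float)   # (H, W, 2)
    img = np.zeros((size, size))
    for i, j in bones:
        a, b = keypoints[i], keypoints[j]
        ab = b - a
        # Project each pixel onto segment ab, clamped to the segment.
        t = np.clip(((pix - a) @ ab) / max(ab @ ab, 1e-8), 0.0, 1.0)
        closest = a + t[..., None] * ab
        d2 = ((pix - closest) ** 2).sum(-1)
        img = np.maximum(img, np.exp(-d2 / (2 * sigma ** 2)))
    return img
```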

    From Dense 2D to Sparse 3D Trajectories for Human Action Detection and Recognition


    Descriptor Based Analysis of Digital 3D Shapes
