46 research outputs found

    Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior

    Full text link
    Creating believable motions for various characters has long been a goal in computer graphics. Current learning-based motion synthesis methods depend on extensive motion datasets, which are often challenging, if not impossible, to obtain. On the other hand, pose data is more accessible, since static posed characters are easier to create and can even be extracted from images using recent advancements in computer vision. In this paper, we utilize this alternative data source and introduce a neural motion synthesis approach through retargeting. Our method generates plausible motions for characters that have only pose data by transferring motion from an existing motion capture dataset of another character, which can have drastically different skeletons. Our experiments show that our method effectively combines the motion features of the source character with the pose features of the target character, and performs robustly with small or noisy pose data sets, ranging from a few artist-created poses to noisy poses estimated directly from images. Additionally, a conducted user study indicated that a majority of participants found our retargeted motion to be more enjoyable to watch, more lifelike in appearance, and exhibiting fewer artifacts. Project page: https://cyanzhao42.github.io/pose2motionComment: Project page: https://cyanzhao42.github.io/pose2motio

    Multi-Character Motion Retargeting for Large Scale Changes

    Get PDF

    Interaction-based Human Activity Comparison

    Get PDF
    Traditional methods for motion comparison consider features from individual characters. However, the semantic meaning of many human activities is usually defined by the interaction between them, such as a high-five interaction of two characters. There is little success in adapting interaction-based features in activity comparison, as they either do not have a fixed topology or are in high dimensional. In this paper, we propose a unified framework for activity comparison from the interaction point of view. Our new metric evaluates the similarity of interaction by adapting the Earth Mover’s Distance onto a customized geometric mesh structure that represents spatial-temporal interactions. This allows us to compare different classes of interactions and discover their intrinsic semantic similarity. We created five interaction databases of different natures, covering both two characters (synthetic and real-people) and character-object interactions, which are open for public uses. We demonstrate how the proposed metric aligns well with the semantic meaning of the interaction. We also apply the metric in interaction retrieval and show how it outperforms existing ones. The proposed method can be used for unsupervised activity detection in monitoring systems and activity retrieval in smart animation systems

    A Generative Human-Robot Motion Retargeting Approach Using a Single RGBD Sensor

    Get PDF
    The goal of human-robot motion retargeting is to let a robot follow the movements performed by a human subject. Typically in previous approaches, the human poses are precomputed from a human pose tracking system, after which the explicit joint mapping strategies are specified to apply the estimated poses to a target robot. However, there is not any generic mapping strategy that we can use to map the human joint to robots with different kinds of configurations. In this paper, we present a novel motion retargeting approach that combines the human pose estimation and the motion retargeting procedure in a unified generative framework without relying on any explicit mapping. First, a 3D parametric human-robot (HUMROB) model is proposed which has the specific joint and stability configurations as the target robot while its shape conforms the source human subject. The robot configurations, including its skeleton proportions, joint limitations, and DoFs are enforced in the HUMROB model and get preserved during the tracking procedure. Using a single RGBD camera to monitor human pose, we use the raw RGB and depth sequence as input. The HUMROB model is deformed to fit the input point cloud, from which the joint angle of the model is calculated and applied to the target robots for retargeting. In this way, instead of fitted individually for each joint, we will get the joint angle of the robot fitted globally so that the surface of the deformed model is as consistent as possible to the input point cloud. In the end, no explicit or pre-defined joint mapping strategies are needed. To demonstrate its effectiveness for human-robot motion retargeting, the approach is tested under both simulations and on real robots which have a quite different skeleton configurations and joint degree of freedoms (DoFs) as compared with the source human subjects

    RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset

    Full text link
    Assisting people in efficiently producing visually plausible 3D characters has always been a fundamental research topic in computer vision and computer graphics. Recent learning-based approaches have achieved unprecedented accuracy and efficiency in the area of 3D real human digitization. However, none of the prior works focus on modeling 3D biped cartoon characters, which are also in great demand in gaming and filming. In this paper, we introduce 3DBiCar, the first large-scale dataset of 3D biped cartoon characters, and RaBit, the corresponding parametric model. Our dataset contains 1,500 topologically consistent high-quality 3D textured models which are manually crafted by professional artists. Built upon the data, RaBit is thus designed with a SMPL-like linear blend shape model and a StyleGAN-based neural UV-texture generator, simultaneously expressing the shape, pose, and texture. To demonstrate the practicality of 3DBiCar and RaBit, various applications are conducted, including single-view reconstruction, sketch-based modeling, and 3D cartoon animation. For the single-view reconstruction setting, we find a straightforward global mapping from input images to the output UV-based texture maps tends to lose detailed appearances of some local parts (e.g., nose, ears). Thus, a part-sensitive texture reasoner is adopted to make all important local areas perceived. Experiments further demonstrate the effectiveness of our method both qualitatively and quantitatively. 3DBiCar and RaBit are available at gaplab.cuhk.edu.cn/projects/RaBit.Comment: CVPR 2023, Project page: https://gaplab.cuhk.edu.cn/projects/RaBit

    Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms

    Full text link
    We propose a new model-based algorithm solving the inverse rig problem in facial animation retargeting, exhibiting higher accuracy of the fit and sparser, more interpretable weight vector compared to SOTA. The proposed method targets a specific subdomain of human face animation - highly-realistic blendshape models used in the production of movies and video games. In this paper, we formulate an optimization problem that takes into account all the requirements of targeted models. Our objective goes beyond a linear blendshape model and employs the quadratic corrective terms necessary for correctly fitting fine details of the mesh. We show that the solution to the proposed problem yields highly accurate mesh reconstruction even when general-purpose solvers, like SQP, are used. The results obtained using SQP are highly accurate in the mesh space but do not exhibit favorable qualities in terms of weight sparsity and smoothness, and for this reason, we further propose a novel algorithm relying on a MM technique. The algorithm is specifically suited for solving the proposed objective, yielding a high-accuracy mesh fit while respecting the constraints and producing a sparse and smooth set of weights easy to manipulate and interpret by artists. Our algorithm is benchmarked with SOTA approaches, and shows an overall superiority of the results, yielding a smooth animation reconstruction with a relative improvement up to 45 percent in root mean squared mesh error while keeping the cardinality comparable with benchmark methods. This paper gives a comprehensive set of evaluation metrics that cover different aspects of the solution, including mesh accuracy, sparsity of the weights, and smoothness of the animation curves, as well as the appearance of the produced animation, which human experts evaluated

    Semi-Supervised Facial Animation Retargeting

    Get PDF
    This paper presents a system for facial animation retargeting that al- lows learning a high-quality mapping between motion capture data and arbitrary target characters. We address one of the main chal- lenges of existing example-based retargeting methods, the need for a large number of accurate training examples to define the corre- spondence between source and target expression spaces. We show that this number can be significantly reduced by leveraging the in- formation contained in unlabeled data, i.e. facial expressions in the source or target space without corresponding poses. In contrast to labeled samples that require time-consuming and error-prone manual character posing, unlabeled samples are easily obtained as frames of motion capture recordings or existing animations of the target character. Our system exploits this information by learning a shared latent space between motion capture and character param- eters in a semi-supervised manner. We show that this approach is resilient to noisy input and missing data and significantly improves retargeting accuracy. To demonstrate its applicability, we integrate our algorithm in a performance-driven facial animation system

    UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons

    Full text link
    The automatic co-speech gesture generation draws much attention in computer animation. Previous works designed network structures on individual datasets, which resulted in a lack of data volume and generalizability across different motion capture standards. In addition, it is a challenging task due to the weak correlation between speech and gestures. To address these problems, we present UnifiedGesture, a novel diffusion model-based speech-driven gesture synthesis approach, trained on multiple gesture datasets with different skeletons. Specifically, we first present a retargeting network to learn latent homeomorphic graphs for different motion capture standards, unifying the representations of various gestures while extending the dataset. We then capture the correlation between speech and gestures based on a diffusion model architecture using cross-local attention and self-attention to generate better speech-matched and realistic gestures. To further align speech and gesture and increase diversity, we incorporate reinforcement learning on the discrete gesture units with a learned reward function. Extensive experiments show that UnifiedGesture outperforms recent approaches on speech-driven gesture generation in terms of CCA, FGD, and human-likeness. All code, pre-trained models, databases, and demos are available to the public at https://github.com/YoungSeng/UnifiedGesture.Comment: 16 pages, 11 figures, ACM MM 202

    Example-based retargeting of human motion to arbitrary mesh models

    Get PDF
    We present a novel method for retargeting human motion to arbitrary 3D mesh models with as little user interaction as possible. Traditional motion-retargeting systems try to preserve the original motion, while satisfying several motion constraints. Our method uses a few pose-to-pose examples provided by the user to extract the desired semantics behind the retargeting process while not limiting the transfer to being only literal. Thus, mesh models with different structures and/or motion semantics from humanoid skeletons become possible targets. Also considering the fact that most publicly available mesh models lack additional structure (e.g. skeleton), our method dispenses with the need for such a structure by means of a built-in surface-based deformation system. As deformation for animation purposes may require non-rigid behaviour, we augment existing rigid deformation approaches to provide volume-preserving and squash-and-stretch deformations. We demonstrate our approach on well-known mesh models along with several publicly available motion-capture sequences. We present a novel method for retargeting human motion to arbitrary 3D mesh models with as little user interaction as possible. Traditional motion-retargeting systems try to preserve the original motion, while satisfying several motion constraints. Our method uses a few pose-to-pose examples provided by the user to extract the desired semantics behind the retargeting process while not limiting the transfer to being only literal. Thus, mesh models with different structures and/or motion semantics from humanoid skeletons become possible targets. © 2014 The Eurographics Association and John Wiley & Sons Ltd
    corecore