412 research outputs found

    적은 수의 사용자 입력으로부터 인간 동작의 합성 및 편집

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2014. 8. 이제희.An ideal 3D character animation system can easily synthesize and edit human motion and also will provide an efficient user interface for an animator. However, despite advancements of animation systems, building effective systems for synthesizing and editing realistic human motion still remains a difficult problem. In the case of a single character, the human body is a significantly complex structure because it consists of as many as hundreds of degrees of freedom. An animator should manually adjust many joints of the human body from user inputs. In a crowd scene, many individuals in a human crowd have to respond to user inputs when an animator wants a given crowd to fit a new environment. The main goal of this thesis is to improve interactions between a user and an animation system. As 3D character animation systems are usually driven by low-dimensional inputs, there is no method for a user to directly generate a high-dimensional character animation. To address this problem, we propose a data-driven mapping model that is built by motion data obtained from a full-body motion capture system, crowd simulation, and data-driven motion synthesis algorithm. With the data-driven mapping model in hand, we can transform low-dimensional user inputs into character animation because motion data help to infer missing parts of system inputs. As motion capture data have many details and provide realism of the movement of a human, it is easier to generate a realistic character animation than without motion capture data. To demonstrate the generality and strengths of our approach, we developed two animation systems that allow the user to synthesize a single character animation in realtime and edit crowd animation via low-dimensional user inputs interactively. The first system entails controlling a virtual avatar using a small set of three-dimensional (3D) motion sensors. The second system manipulates large-scale crowd animation that consists of hundreds of characters with a small number of user constraints. Examples show that our system is much less laborious and time-consuming than previous animation systems, and thus is much more suitable for a user to generate desired character animation.Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . II Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IV List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . V 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2 Background 10 2.1 Performance Animation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.1.1 Performance-based Interfaces for Character Animation . . . . . . . 11 2.1.2 Statistical Models for Motion Synthesis . . . . . . . . . . . . . . . 12 2.1.3 Retrieval of Motion Capture Data . . . . . . . . . . . . . . . . . . 13 2.2 Crowd Animation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.2.1 Crowd Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.2.2 Motion Editing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.2.3 Geometry Deformation . . . . . . . . . . . . . . . . . . . . . . . . 15 3 Realtime Performance Animation Using Sparse 3D Motion Sensors 17 3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.2 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.3 Sensor Data and Calibration . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.4 Motion Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.4.1 Online Local Model . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.4.2 Kernel CCA-based Regression . . . . . . . . . . . . . . . . . . . . 25 3.4.3 Motion Post-processing . . . . . . . . . . . . . . . . . . . . . . . 27 3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4 Interactive Manipulation of Large-Scale Crowd Animation 40 4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.2 Crowd Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.3 Cage-based Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.3.1 Cage Construction . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.3.2 Cage Representation . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.4 Editing Crowd Animation . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.4.1 Spatial Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.4.2 Temporal Manipulation . . . . . . . . . . . . . . . . . . . . . . . . 57 4.5 Collision Avoidance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5 Conclusion 69 Bibliography I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XIIIDocto

    Robust correlated and individual component analysis

    Get PDF
    © 1979-2012 IEEE.Recovering correlated and individual components of two, possibly temporally misaligned, sets of data is a fundamental task in disciplines such as image, vision, and behavior computing, with application to problems such as multi-modal fusion (via correlated components), predictive analysis, and clustering (via the individual ones). Here, we study the extraction of correlated and individual components under real-world conditions, namely i) the presence of gross non-Gaussian noise and ii) temporally misaligned data. In this light, we propose a method for the Robust Correlated and Individual Component Analysis (RCICA) of two sets of data in the presence of gross, sparse errors. We furthermore extend RCICA in order to handle temporal incongruities arising in the data. To this end, two suitable optimization problems are solved. The generality of the proposed methods is demonstrated by applying them onto 4 applications, namely i) heterogeneous face recognition, ii) multi-modal feature fusion for human behavior analysis (i.e., audio-visual prediction of interest and conflict), iii) face clustering, and iv) thetemporal alignment of facial expressions. Experimental results on 2 synthetic and 7 real world datasets indicate the robustness and effectiveness of the proposed methodson these application domains, outperforming other state-of-the-art methods in the field

    Single View Reconstruction for Human Face and Motion with Priors

    Get PDF
    Single view reconstruction is fundamentally an under-constrained problem. We aim to develop new approaches to model human face and motion with model priors that restrict the space of possible solutions. First, we develop a novel approach to recover the 3D shape from a single view image under challenging conditions, such as large variations in illumination and pose. The problem is addressed by employing the techniques of non-linear manifold embedding and alignment. Specifically, the local image models for each patch of facial images and the local surface models for each patch of 3D shape are learned using a non-linear dimensionality reduction technique, and the correspondences between these local models are then learned by a manifold alignment method. Local models successfully remove the dependency of large training databases for human face modeling. By combining the local shapes, the global shape of a face can be reconstructed directly from a single linear system of equations via least square. Unfortunately, this learning-based approach cannot be successfully applied to the problem of human motion modeling due to the internal and external variations in single view video-based marker-less motion capture. Therefore, we introduce a new model-based approach for capturing human motion using a stream of depth images from a single depth sensor. While a depth sensor provides metric 3D information, using a single sensor, instead of a camera array, results in a view-dependent and incomplete measurement of object motion. We develop a novel two-stage template fitting algorithm that is invariant to subject size and view-point variations, and robust to occlusions. Starting from a known pose, our algorithm first estimates a body configuration through temporal registration, which is used to search the template motion database for a best match. The best match body configuration as well as its corresponding surface mesh model are deformed to fit the input depth map, filling in the part that is occluded from the input and compensating for differences in pose and body-size between the input image and the template. Our approach does not require any makers, user-interaction, or appearance-based tracking. Experiments show that our approaches can achieve good modeling results for human face and motion, and are capable of dealing with variety of challenges in single view reconstruction, e.g., occlusion

    Robust correlated and individual component analysis

    Get PDF
    Recovering correlated and individual components of two, possibly temporally misaligned, sets of data is a fundamental task in disciplines such as image, vision, and behavior computing, with application to problems such as multi-modal fusion (via correlated components), predictive analysis, and clustering (via the individual ones). Here, we study the extraction of correlated and individual components under real-world conditions, namely i) the presence of gross non-Gaussian noise and ii) temporally misaligned data. In this light, we propose a method for the Robust Correlated and Individual Component Analysis (RCICA) of two sets of data in the presence of gross, sparse errors. We furthermore extend RCICA in order to handle temporal incongruities arising in the data. To this end, two suitable optimization problems are solved. The generality of the proposed methods is demonstrated by applying them onto 4 applications, namely i) heterogeneous face recognition, ii) multi-modal feature fusion for human behavior analysis (i.e., audio-visual prediction of interest and conflict), iii) face clustering, and iv) the temporal alignment of facial expressions. Experimental results on 2 synthetic and 7 real world datasets indicate the robustness and effectiveness of the proposed methods on these application domains, outperforming other state-of-the-art methods in the field

    Human Motion Analysis for Efficient Action Recognition

    Get PDF
    Automatic understanding of human actions is at the core of several application domains, such as content-based indexing, human-computer interaction, surveillance, and sports video analysis. The recent advances in digital platforms and the exponential growth of video and image data have brought an urgent quest for intelligent frameworks to automatically analyze human motion and predict their corresponding action based on visual data and sensor signals. This thesis presents a collection of methods that targets human action recognition using different action modalities. The first method uses the appearance modality and classifies human actions based on heterogeneous global- and local-based features of scene and humanbody appearances. The second method harnesses 2D and 3D articulated human poses and analyizes the body motion using a discriminative combination of the parts’ velocities, locations, and correlations histograms for action recognition. The third method presents an optimal scheme for combining the probabilistic predictions from different action modalities by solving a constrained quadratic optimization problem. In addition to the action classification task, we present a study that compares the utility of different pose variants in motion analysis for human action recognition. In particular, we compare the recognition performance when 2D and 3D poses are used. Finally, we demonstrate the efficiency of our pose-based method for action recognition in spotting and segmenting motion gestures in real time from a continuous stream of an input video for the recognition of the Italian sign gesture language

    Human Pose Estimation from Monocular Images : a Comprehensive Survey

    Get PDF
    Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but they focus on a certain category; for example, model-based approaches or human motion analysis, etc. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out including milestone works and recent advancements. Based on one standard pipeline for the solution of computer vision problems, this survey splits the problema into several modules: feature extraction and description, human body models, and modelin methods. Problem modeling methods are approached based on two means of categorization in this survey. One way to categorize includes top-down and bottom-up methods, and another way includes generative and discriminative methods. Considering the fact that one direct application of human pose estimation is to provide initialization for automatic video surveillance, there are additional sections for motion-related methods in all modules: motion features, motion models, and motion-based methods. Finally, the paper also collects 26 publicly available data sets for validation and provides error measurement methods that are frequently used

    A System for 3D Shape Estimation and Texture Extraction via Structured Light

    Get PDF
    Shape estimation is a crucial problem in the fields of computer vision, robotics and engineering. This thesis explores a shape from structured light (SFSL) approach using a pyramidal laser projector, and the application of texture extraction. The specific SFSL system is chosen for its hardware simplicity, and efficient software. The shape estimation system is capable of estimating the 3D shape of both static and dynamic objects by relying on a fixed pattern. In order to eliminate the need for precision hardware alignment and to remove human error, novel calibration schemes were developed. In addition, selecting appropriate system geometry reduces the typical correspondence problem to that of a labeling problem. Simulations and experiments verify the effectiveness of the built system. Finally, we perform texture extraction by interpolating and resampling sparse range estimates, and subsequently flattening the 3D triangulated graph into a 2D triangulated graph via graph and manifold methods

    Robust arbitrary-view gait recognition based on 3D partial similarity matching

    Get PDF
    Existing view-invariant gait recognition methods encounter difficulties due to limited number of available gait views and varying conditions during training. This paper proposes gait partial similarity matching that assumes a 3-dimensional (3D) object shares common view surfaces in significantly different views. Detecting such surfaces aids the extraction of gait features from multiple views. 3D parametric body models are morphed by pose and shape deformation from a template model using 2-dimensional (2D) gait silhouette as observation. The gait pose is estimated by a level set energy cost function from silhouettes including incomplete ones. Body shape deformation is achieved via Laplacian deformation energy function associated with inpainting gait silhouettes. Partial gait silhouettes in different views are extracted by gait partial region of interest elements selection and re-projected onto 2D space to construct partial gait energy images. A synthetic database with destination views and multi-linear subspace classifier fused with majority voting are used to achieve arbitrary view gait recognition that is robust to varying conditions. Experimental results on CMU, CASIA B, TUM-IITKGP, AVAMVG and KY4D datasets show the efficacy of the propose method
    corecore