
    Intelligent visual media processing: when graphics meets vision

    The computer graphics and computer vision communities have been working closely together in recent years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media around us. There are three major driving forces behind this phenomenon: i) the availability of big data from the Internet has created a demand for handling the vast, ever-increasing amount of available resources; ii) powerful processing tools, such as deep neural networks, provide effective ways of learning how to deal with heterogeneous visual data; iii) new data capture devices, such as the Kinect, bridge algorithms for 2D image understanding and 3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics and computer vision communities are still at the beginning of their honeymoon phase. In this work we survey recent research on how computer vision techniques benefit computer graphics techniques and vice versa, covering research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest possible directions for further research.

    Physics-Based Modeling of Nonrigid Objects for Vision and Graphics (Dissertation)

    This thesis develops a physics-based framework for 3D shape and nonrigid motion modeling for computer vision and computer graphics. In computer vision it addresses the problems of complex 3D shape representation, shape reconstruction, quantitative model extraction from biomedical data for analysis and visualization, shape estimation, and motion tracking. In computer graphics it demonstrates the generative power of our framework to synthesize constrained shapes, nonrigid object motions and object interactions for the purposes of computer animation. Our framework is based on the use of a new class of dynamically deformable primitives which allow the combination of global and local deformations. It incorporates physical constraints to compose articulated models from deformable primitives and provides force-based techniques for fitting such models to sparse, noise-corrupted 2D and 3D visual data. The framework leads to shape and nonrigid motion estimators that exploit dynamically deformable models to track moving 3D objects from time-varying observations. We develop models with global deformation parameters which represent the salient shape features of natural parts, and local deformation parameters which capture shape details. In the context of computer graphics, these models represent the physics-based marriage of the parameterized and free-form modeling paradigms. An important benefit of their global/local descriptive power in the context of computer vision is that it can potentially satisfy the often conflicting requirements of shape reconstruction and shape recognition. The Lagrange equations of motion that govern our models, augmented by constraints, make them responsive to externally applied forces derived from input data or applied by the user. This system of differential equations is discretized using finite element methods and simulated through time using standard numerical techniques. 
We employ these equations to formulate a shape and nonrigid motion estimator. The estimator is a continuous extended Kalman filter that recursively transforms the discrepancy between the sensory data and the estimated model state into generalized forces. These forces adjust the translational, rotational, and deformational degrees of freedom so that the model evolves consistently with the noisy data. We demonstrate the interactive-time performance of our techniques in a series of experiments in computer vision, graphics, and visualization.
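    The abstract describes a continuous extended Kalman filter over many coupled degrees of freedom. As a much simpler sketch of the same principle, assuming a discrete-time linear filter on a single scalar parameter with a constant-state process model, the innovation (observation minus current estimate) is scaled by the Kalman gain and fed back as a correction to the state. The class name and the noise variances `q` and `r` are hypothetical illustration values, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Discrete-time linear Kalman filter for one degree of freedom.

    Simplifying assumptions (not from the thesis): constant-state process
    model x_k = x_{k-1} + w, and direct observation z_k = x_k + v.
    """
    x: float  # current state estimate (e.g. one deformation parameter)
    p: float  # variance of the estimate
    q: float  # process noise variance (hypothetical)
    r: float  # measurement noise variance (hypothetical)

    def step(self, z: float) -> float:
        # Predict: state is unchanged, uncertainty grows by the process noise.
        self.p += self.q
        # Innovation: discrepancy between the data and the predicted measurement.
        innovation = z - self.x
        # Gain balances trust in the model against trust in the data.
        k = self.p / (self.p + self.r)
        # Update: the scaled innovation acts as a corrective "force" on the state.
        self.x += k * innovation
        self.p *= 1.0 - k
        return self.x


if __name__ == "__main__":
    kf = ScalarKalman(x=0.0, p=1.0, q=1e-4, r=0.1)
    for z in [1.02, 0.97, 1.05, 0.99, 1.01]:
        print(round(kf.step(z), 3))
```

    The thesis filter differs in being continuous-time, nonlinear (extended), and driven through the Lagrangian dynamics as generalized forces; this sketch only shows how the data/model discrepancy is turned into a weighted correction.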

    Unsupervised Texture Transfer from Images to Model Collections

    Large 3D model repositories of common objects are now ubiquitous and are increasingly used in computer graphics and computer vision for both analysis and synthesis tasks. However, images of objects in the real world have a richness of appearance that these repositories do not capture, largely because most existing 3D models are untextured. In this work we develop an automated pipeline capable of transporting texture information from images of real objects to 3D models of similar objects. This is a challenging problem, as an object's texture as seen in a photograph is distorted by many factors, including pose, geometry, and illumination. These geometric and photometric distortions must be undone in order to transfer the pure underlying texture to a new object (the 3D model). Instead of relying on dense correspondences, which are problematic here, we factorize the problem into the reconstruction of a set of base textures (materials) and an illumination model for the object in the image. By exploiting the geometry of the similar 3D model, we reconstruct certain reliable texture regions and correct for the illumination, from which a full texture map can be recovered and applied to the model. Our method allows for large-scale unsupervised production of richly textured 3D models directly from image data, providing high-quality virtual objects for 3D scene design or photo editing applications, as well as a wealth of data for training machine learning algorithms for various inference tasks in graphics and vision.
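    The abstract does not specify its illumination model. As a minimal sketch of what "correcting for the illumination" using the geometry of a similar 3D model can mean, assume a Lambertian surface lit by a single directional light: each observed intensity is then roughly albedo times max(0, n·l), so dividing by the shading term computed from model normals gives an illumination-corrected texture estimate. The function name, the light direction, and the `min_shading` threshold (used here to flag grazing-angle pixels as unreliable) are all hypothetical.

```python
import numpy as np


def remove_lambertian_shading(intensity, normals, light_dir, min_shading=0.1):
    """Estimate per-pixel albedo under an assumed Lambertian, single-light model.

    intensity: (H, W) observed grayscale values
    normals:   (H, W, 3) unit surface normals rendered from the aligned 3D model
    light_dir: (3,) direction towards the light (normalized here)
    Returns (albedo, reliable); unreliable pixels are set to NaN in albedo.
    """
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    # Lambertian shading term n . l, clamped at zero for surfaces facing away.
    shading = np.clip(normals @ l, 0.0, None)
    # Only invert where enough light arrives for the division to be stable.
    reliable = shading >= min_shading
    albedo = np.full(intensity.shape, np.nan)
    albedo[reliable] = intensity[reliable] / shading[reliable]
    return albedo, reliable
```

    The reliability mask loosely mirrors the abstract's idea of reconstructing only "certain reliable texture regions" first; filling the remaining pixels (the full texture map recovery) is not attempted here.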

    Generating 3D faces using Convolutional Mesh Autoencoders

    Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as for graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they cannot capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation that captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured from 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We also show that replacing the expression space of an existing state-of-the-art face model with our autoencoder achieves a lower reconstruction error. Our data, model and code are available at http://github.com/anuragranj/com
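    The abstract does not state which spectral filter is used. One common way to realize spectral convolutions on a mesh graph is the truncated Chebyshev polynomial filter, which avoids an explicit eigendecomposition at filtering time by applying the recurrence T_k = 2 L T_{k-1} - T_{k-2} to the rescaled graph Laplacian L. The sketch below assumes a dense adjacency matrix and a weight tensor `theta` of shape (K, F_in, F_out); the function names are hypothetical, and a real model would learn `theta` and use sparse matrices.

```python
import numpy as np


def scaled_laplacian(adj):
    """Symmetric normalized Laplacian, rescaled so its eigenvalues lie in [-1, 1]."""
    deg = adj.sum(axis=1)
    # Guard isolated vertices (degree 0) against division by zero.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(len(adj))


def cheb_conv(x, adj, theta):
    """Apply a K-term Chebyshev spectral filter to per-vertex features.

    x:     (V, F_in) vertex features (e.g. xyz coordinates, F_in = 3)
    adj:   (V, V) symmetric mesh adjacency matrix
    theta: (K, F_in, F_out) filter weights
    """
    L = scaled_laplacian(adj)
    t_prev, t_curr = x, L @ x  # T_0(L) x and T_1(L) x
    out = t_prev @ theta[0]
    if len(theta) > 1:
        out = out + t_curr @ theta[1]
    for k in range(2, len(theta)):
        # Chebyshev recurrence: T_k = 2 L T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2.0 * L @ t_curr - t_prev
        out = out + t_curr @ theta[k]
    return out
```

    Because T_k(L) only mixes features of vertices within k edges of each other, a K-term filter is spatially localized on the mesh, which is what makes this family of spectral filters practical for meshes with thousands of vertices.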

    HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances

    Monocular 3D human performance capture is indispensable for many applications in computer graphics and vision, enabling immersive experiences. However, detailed capture of humans requires tracking multiple aspects: the skeletal pose, the dynamic surface (including clothing), hand gestures, and facial expressions. No existing monocular method allows joint tracking of all these components. To this end, we propose HiFECap, a new neural human performance capture approach that simultaneously captures human pose, clothing, facial expression, and hands from a single RGB video alone. We demonstrate that our proposed network architecture, the carefully designed training strategy, and the tight integration of parametric face and hand models into a template mesh enable the capture of all these individual aspects. Importantly, our method also captures high-frequency details, such as deforming wrinkles on the clothes, better than previous works. Furthermore, we show that HiFECap outperforms state-of-the-art human performance capture approaches qualitatively and quantitatively while, for the first time, capturing all aspects of the human.

    Motion capture based on RGBD data from multiple sensors for avatar animation

    With recent technological advances and the emergence of RGB-D sensors affordable to a wide range of users, markerless motion capture has become an active field of research in both computer vision and computer graphics. In this thesis, we designed a POC (Proof of Concept) for a new tool that performs motion capture using a variable number of commodity RGB-D sensors of different brands and technical specifications, in environments without layout constraints. The main goal of this work is to provide motion capture capabilities using a handful of RGB-D sensors, without imposing strong requirements in terms of lighting, background, or extent of the motion capture area. The number of RGB-D sensors needed is inversely proportional to their resolution and directly proportional to the size of the area to be tracked. Because the POC is built on top of the OpenNI 2 library, it is compatible with most of the non-high-end RGB-D sensors currently available on the market. Since a single computer lacks the resources to run more than a couple of sensors simultaneously, we need a setup composed of multiple computers. To keep data coherent and synchronized across sensors and computers, our tool uses a semi-automatic calibration method and a message-oriented network protocol. From the color and depth data given by a sensor, we can also obtain a 3D pointcloud representation of the environment. By combining pointclouds from multiple sensors, we can collect a complete, animated 3D pointcloud that can be visualized from any viewpoint. Given a 3D avatar model and its attached skeleton, we can use an iterative optimization method (e.g., Simplex) to find a fit between each pointcloud frame and a skeleton configuration, resulting in 3D avatar animation when these skeleton configurations are used as key frames.
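    The abstract does not detail how pointclouds are built or merged. A minimal sketch, assuming an ideal pinhole depth camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), depth in meters with 0 marking a missing reading, and per-sensor extrinsics (rotation `R`, translation `t`) produced by the calibration step: each valid depth pixel is back-projected into its sensor's frame and then transformed into a shared world frame, so clouds from different sensors can simply be concatenated. This is plain NumPy and does not use the OpenNI 2 API; all function names are hypothetical.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters, 0 = invalid) to camera-frame 3D points."""
    v, u = np.nonzero(depth > 0)  # row (v) and column (u) indices of valid pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # shape (N, 3)


def to_world(points, R, t):
    """Apply one sensor's extrinsic calibration: p_world = R @ p_cam + t."""
    return points @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)


def merge_clouds(frames):
    """frames: iterable of (depth, intrinsics dict, R, t), one entry per sensor."""
    clouds = [to_world(depth_to_points(d, **K), R, t) for d, K, R, t in frames]
    return np.concatenate(clouds, axis=0)
```

    If the color image is registered to the depth image, per-point colors can be gathered with the same `(v, u)` indices; the skeleton fitting step (e.g., Simplex) would then minimize a distance between this merged cloud and the posed avatar, which is outside this sketch.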