
    Learning to Synthesize a 4D RGBD Light Field from a Single Image

    We present a machine learning algorithm that takes as input a 2D RGB image and synthesizes a 4D RGBD light field (color and depth of the scene in each ray direction). For training, we introduce the largest public light field dataset, consisting of over 3300 plenoptic camera light fields of scenes containing flowers and plants. Our synthesis pipeline consists of a convolutional neural network (CNN) that estimates scene geometry, a stage that renders a Lambertian light field using that geometry, and a second CNN that predicts occluded rays and non-Lambertian effects. Our algorithm builds on recent view synthesis methods, but is unique in predicting RGBD for each light field ray and improving unsupervised single image depth estimation by enforcing consistency of ray depths that should intersect the same scene point. Please see our supplementary video at https://youtu.be/yLCvWoQLnms
    Comment: International Conference on Computer Vision (ICCV) 2017
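
    The Lambertian rendering stage and the ray-depth consistency cue described in this abstract can be illustrated in a few lines. The snippet below is a minimal NumPy sketch under assumed array shapes and a nearest-neighbour warp; names such as warp_to_view are invented for illustration and this is not the paper's implementation.

    ```python
    import numpy as np

    def warp_to_view(img, disp, du, dv):
        """Backward-warp the central view into the sub-aperture view at angular
        offset (du, dv): for a Lambertian scene each pixel shifts by its disparity
        times the angular offset (nearest-neighbour lookup keeps the sketch short)."""
        h, w = disp.shape
        ys, xs = np.mgrid[0:h, 0:w]
        sx = np.clip(np.round(xs + du * disp).astype(int), 0, w - 1)
        sy = np.clip(np.round(ys + dv * disp).astype(int), 0, h - 1)
        return img[sy, sx]

    def ray_depth_consistency(disp_views, disp_central, offsets):
        """Unsupervised cue from the abstract: rays intersecting the same scene
        point should report the same depth, so each view's predicted disparity is
        compared against the central-view disparity warped into that view."""
        loss = 0.0
        for d, (du, dv) in zip(disp_views, offsets):
            loss += np.mean(np.abs(d - warp_to_view(disp_central, disp_central, du, dv)))
        return loss / len(offsets)

    # Toy usage: render one Lambertian sub-aperture view and score two predicted views.
    rgb = np.random.rand(64, 64, 3)
    disp_c = np.random.rand(64, 64)
    lambertian_view = warp_to_view(rgb, disp_c, du=1, dv=0)
    print(ray_depth_consistency([disp_c.copy(), disp_c.copy()], disp_c, [(-1, 0), (1, 0)]))
    ```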

    Visualization in spatial modeling

    Get PDF
    This chapter deals with issues arising from a central theme in contemporary computer modeling - visualization. We first tie visualization to varieties of modeling along the continuum from iconic to symbolic and then focus on the notion that our models are so intrinsically complex that there are many different types of visualization that might be developed in their understanding and implementation. This focuses the debate on the very way of 'doing science' in that patterns and processes of any complexity can be better understood through visualizing the data, the simulations, and the outcomes that such models generate. As we have grown more sensitive to the problem of complexity in all systems, we are more aware that the twin goals of parsimony and verifiability which have dominated scientific theory since the 'Enlightenment' are up for grabs: good theories and models must 'look right' despite what our statistics and causal logics tell us. Visualization is the cutting edge of this new way of thinking about science but its styles vary enormously with context. Here we define three varieties: visualization of complicated systems to make things simple or at least explicable, which is the role of pedagogy; visualization to explore unanticipated outcomes and to refine processes that interact in unanticipated ways; and visualization to enable end users with no prior understanding of the science but a deep understanding of the problem to engage in using models for prediction, prescription, and control. We illustrate these themes with a model of an agricultural market which is the basis of modern urban economics - the von Thünen model of land rent and density; a model of urban development based on interacting spatial and temporal processes of land development - the DUEM model; and a pedestrian model of human movement at the fine scale where control of such movements to meet standards of public safety is intrinsically part of the model about which the controllers know intimately. © Springer-Verlag Berlin Heidelberg 2006
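
    Since the first illustration rests on the von Thünen model of land rent, a minimal worked form of that model may help readers unfamiliar with it. The snippet below uses the standard textbook bid-rent function with made-up parameter values; it is not code or data from the chapter.

    ```python
    # Standard textbook form of the von Thünen bid-rent curve (illustrative only):
    #   R(d) = Y * (p - c) - Y * F * d
    # where Y is yield per unit of land, p the market price, c the production cost
    # and F the freight rate per unit of distance d from the central market.
    def von_thunen_rent(d, yield_per_area=100.0, price=2.0, cost=1.0, freight=0.05):
        return yield_per_area * (price - cost) - yield_per_area * freight * d

    # Rent falls linearly with distance and reaches zero at the margin of
    # cultivation, here d = (price - cost) / freight = 20 distance units.
    for d in (0, 5, 10, 20):
        print(d, von_thunen_rent(d))
    ```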

    Temporal Interpolation of Dynamic Digital Humans using Convolutional Neural Networks

    In recent years, there has been an increased interest in point cloud representation for visualizing digital humans in cross reality. However, due to their voluminous size, point clouds require high bandwidth to be transmitted. In this paper, we propose a temporal interpolation architecture capable of increasing the temporal resolution of dynamic digital humans, represented using point clouds. With this technique, bandwidth savings can be achieved by transmitting dynamic point clouds at a lower temporal resolution and recreating a higher temporal resolution on the receiving side. Our interpolation architecture works by first downsampling the point clouds to a lower spatial resolution, then estimating scene flow using a newly designed neural network architecture, and finally upsampling the result back to the original spatial resolution. To improve the smoothness of the results, we additionally apply a novel technique called neighbour snapping. To be able to train and test our newly designed network, we created a synthetic point cloud data set of animated human bodies. Results from the evaluation of our architecture through a small-scale user study show the benefits of our method with respect to the state of the art in scene flow estimation for point clouds. Moreover, the correlation between our user study and existing objective quality metrics confirms the need for new metrics to accurately predict the visual quality of point cloud contents.
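
    As a rough sketch of the interpolation idea described above, the snippet below advects each point a fraction of the way along an estimated scene-flow vector to synthesize an intermediate frame. It is a toy illustration with hypothetical names and random data; the paper's CNN flow estimator and neighbour-snapping refinement are not reproduced.

    ```python
    import numpy as np

    def interpolate_frame(points_t0, scene_flow, alpha=0.5):
        """Synthesize an intermediate point cloud frame: advect each point of
        frame t0 (N x 3) a fraction `alpha` of the way along its estimated
        scene-flow vector towards frame t1 (flow is N x 3, one vector per point)."""
        return points_t0 + alpha * scene_flow

    # Toy usage: 1,000 random points drifting with a constant flow.
    pts = np.random.rand(1000, 3)
    flow = np.tile([0.0, 0.01, 0.0], (1000, 1))
    mid_frame = interpolate_frame(pts, flow, alpha=0.5)  # halfway between t0 and t1
    ```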

    Neural Voice Puppetry: Audio-driven Facial Reenactment

    We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis. Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with the audio of the source input. This audio-driven facial reenactment is driven by a deep neural network that employs a latent 3D face model space. Through the underlying 3D representation, the model inherently learns temporal stability while we leverage neural rendering to generate photo-realistic output frames. Our approach generalizes across different people, allowing us to synthesize videos of a target actor with the voice of any unknown source actor or even synthetic voices that can be generated utilizing standard text-to-speech approaches. Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head. We demonstrate the capabilities of our method in a series of audio- and text-based puppetry examples. Our method is not only more general than existing works, since it is agnostic to the input person, but it also shows superior visual and lip-sync quality compared to photo-realistic audio- and video-driven reenactment techniques.
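
    The audio-to-expression stage of such a pipeline can be caricatured in a few lines. The snippet below is a hypothetical stand-in with random weights and assumed feature sizes, not the Neural Voice Puppetry network; the neural renderer that turns the 3D face into photo-realistic frames is omitted entirely.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def audio_to_expression(audio_feat, w1, w2):
        """Map a per-frame audio feature vector to coefficients of a
        low-dimensional 3D face expression space with a tiny MLP; a neural
        renderer (not shown) would turn the resulting 3D face into video frames."""
        hidden = np.tanh(audio_feat @ w1)
        return hidden @ w2

    # Assumed sizes: 80-dim audio features per frame, 64 expression coefficients.
    w1 = rng.standard_normal((80, 128)) * 0.01
    w2 = rng.standard_normal((128, 64)) * 0.01
    frames = rng.standard_normal((25, 80))  # one second of audio features at 25 fps
    expr = np.array([audio_to_expression(f, w1, w2) for f in frames])
    print(expr.shape)  # (25, 64): per-frame expression coefficients
    ```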

    A framework for natural animation of digitized models

    We present a novel, versatile, fast and simple framework to generate high-quality animations of scanned human characters from input motion data. Our method is purely mesh-based and, in contrast to skeleton-based animation, requires only a minimum of manual interaction. The only manual step that is required to create moving virtual people is the placement of a sparse set of correspondences between triangles of an input mesh and triangles of the mesh to be animated. The proposed algorithm implicitly generates realistic body deformations, and can easily transfer motions between humans of different shape and proportions. It can handle different types of input data, e.g. other animated meshes and motion capture files, in just the same way. Finally, and most importantly, it creates animations at interactive frame rates. We feature two working prototype systems that demonstrate that our method can generate lifelike character animations from both marker-based and marker-less optical motion capture data.
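
    A building block of such mesh-based motion transfer is the per-triangle transform that gets copied across the sparse triangle correspondences. The sketch below computes only that local deformation gradient, following the common deformation-transfer formulation; the least-squares solve that stitches the per-triangle transforms into consistent vertex positions is not shown, and the example triangles are made up.

    ```python
    import numpy as np

    def triangle_frame(v0, v1, v2):
        """Local frame spanned by a triangle's two edges and a scaled normal."""
        e1, e2 = v1 - v0, v2 - v0
        n = np.cross(e1, e2)
        n /= np.sqrt(np.linalg.norm(n)) + 1e-12
        return np.column_stack([e1, e2, n])

    def deformation_gradient(rest_tri, posed_tri):
        """3x3 transform that maps the rest-pose triangle onto the posed one;
        copying these gradients across triangle correspondences is what drives
        the motion transfer, before a global solve recovers target vertices."""
        return triangle_frame(*posed_tri) @ np.linalg.inv(triangle_frame(*rest_tri))

    rest = [np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
    posed = [np.zeros(3), np.array([0.0, 1.0, 0.0]), np.array([-1.0, 0.0, 0.0])]
    print(np.round(deformation_gradient(rest, posed), 3))  # ~90 degree rotation about z
    ```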

    Quantitative assessment of intrinsic noise for visually guided behaviour in zebrafish

    Supported by the Royal Society of London (University Research Fellowship), the Medical Research Council (New Investigator Research Grant) and CNRS. Peer reviewed. Postprint.

    PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception

    The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions by including social concepts such as helping another agent. PHASE consists of 2D animations of pairs of agents moving in a continuous space generated procedurally using a physics engine and a hierarchical planner. Agents have a limited field of view, and can interact with multiple objects, in an environment that has multiple landmarks and obstacles. Using PHASE, we design a social recognition task and a social prediction task. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events, and that the simulated agents behave similarly to humans. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE (SIMulation, Planning and Local Estimation), which outperforms state-of-the-art feed-forward neural networks. We hope that PHASE can serve as a difficult new challenge for developing new models that can recognize complex social interactions.
    Comment: The first two authors contributed equally; AAAI 2021; 13 pages, 7 figures; Project page: https://www.tshu.io/PHASE
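
    Bayesian inverse planning of the kind used in SIMPLE can be sketched in a few lines: a posterior over candidate goals is obtained by scoring how rational the observed trajectory looks under each goal. The snippet below is a toy version that uses straight-line progress as a noisily-rational likelihood; the actual baseline plans with a physics engine and a hierarchical planner, and all names and numbers here are invented.

    ```python
    import numpy as np

    def goal_posterior(trajectory, goals, beta=2.0):
        """Toy Bayesian inverse planning: each observed step is scored by how much
        it reduces the distance to a candidate goal, giving a noisily-rational
        log-likelihood beta * progress; a uniform prior is assumed over goals."""
        log_post = np.zeros(len(goals))
        for i, g in enumerate(goals):
            for prev, curr in zip(trajectory[:-1], trajectory[1:]):
                progress = np.linalg.norm(prev - g) - np.linalg.norm(curr - g)
                log_post[i] += beta * progress
        post = np.exp(log_post - log_post.max())
        return post / post.sum()

    # An agent in 2D heading towards the second landmark rather than the first.
    goals = [np.array([0.0, 0.0]), np.array([10.0, 0.0])]
    traj = np.array([[5.0, 5.0], [6.0, 4.0], [7.0, 3.0], [8.0, 2.0]])
    print(goal_posterior(traj, goals))  # posterior mass concentrates on goals[1]
    ```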