Layered Neural Rendering for Retiming People in Video
We present a method for retiming people in an ordinary, natural
video---manipulating and editing the time in which different motions of
individuals in the video occur. We can temporally align different motions,
change the speed of certain actions (speeding up/slowing down, or entirely
"freezing" people), or "erase" selected people from the video altogether. We
achieve these effects computationally via a dedicated learning-based layered
video representation, where each frame in the video is decomposed into separate
RGBA layers, representing the appearance of different people in the video. A
key property of our model is that it not only disentangles the direct motions
of each person in the input video, but also correlates each person
automatically with the scene changes they generate---e.g., shadows,
reflections, and motion of loose clothing. The layers can be individually
retimed and recombined into a new video, allowing us to achieve realistic,
high-quality renderings of retiming effects for real-world videos depicting
complex actions and involving multiple individuals, including dancing,
trampoline jumping, or group running. Comment: To appear in SIGGRAPH Asia 2020. Project webpage:
https://retiming.github.io
Models of Visual Appearance for Analyzing and Editing Images and Videos
The visual appearance of an image is a complex function of factors such as scene geometry, material reflectances and textures, illumination, and the properties of the camera used to capture the image. Understanding how these factors interact to produce an image is a fundamental problem in computer vision and graphics. This dissertation examines two aspects of this problem: models of visual appearance that allow us to recover scene properties from images and videos, and tools that allow users to manipulate visual appearance in images and videos in intuitive ways. In particular, we look at these problems in three different applications. First, we propose techniques for compositing images that differ significantly in their appearance. Our framework transfers appearance between images by manipulating the different levels of a multi-scale decomposition of the image. This allows users to create realistic composites with minimal interaction in a number of different scenarios. We also discuss techniques for compositing and replacing facial performances in videos. Second, we look at the problem of creating high-quality still images from low-quality video clips. Traditional multi-image enhancement techniques accomplish this by inverting the camera's imaging process. Our system incorporates feature weights into these image models to create results that have better resolution, noise, and blur characteristics, and summarize the activity in the video. Finally, we analyze variations in scene appearance caused by changes in lighting. We develop a model for outdoor scene appearance that allows us to recover radiometric and geometric information about the scene from images. We apply this model to a variety of visual tasks, including color-constancy, background subtraction, shadow detection, scene reconstruction, and camera geo-location.
We also show that the appearance of a Lambertian scene can be modeled as a combination of distinct three-dimensional illumination subspaces, a result that leads to novel bounds on scene appearance, and a robust uncalibrated photometric stereo method. Engineering and Applied Science
LEARNING TO RIG CHARACTERS
With the emergence of 3D virtual worlds, 3D social media, and massive online games, the need for diverse, high-quality, animation-ready characters and avatars is greater than ever. To animate characters, artists hand-craft articulation structures, such as animation skeletons and part deformers, which require a significant amount of manual and laborious interaction with 2D/3D modeling interfaces. This thesis presents deep learning methods that are able to significantly automate the process of character rigging.
First, the thesis introduces RigNet, a method capable of predicting an animation skeleton for an input static 3D shape in the form of a polygon mesh. The predicted skeletons match animator expectations in joint placement and topology. RigNet also estimates surface skin weights which determine how the mesh is animated given the different skeletal poses. In contrast to prior work that fits pre-defined skeletal templates with hand-tuned objectives, RigNet is able to automatically rig diverse characters, such as humanoids, quadrupeds, toys, and birds, with varying articulation structure and geometry. RigNet is based on a deep neural architecture that directly operates on the mesh representation. The architecture is trained on a diverse dataset of rigged models that we mined online and curated. The dataset includes 2.7K polygon meshes, along with their associated skeletons and corresponding skin weights.
Second, the thesis introduces MoRig, a method that automatically rigs character meshes driven by single-view point cloud streams capturing the motion of performing characters. Compared to RigNet, MoRig's rigging is motion-aware: its neural network encodes motion cues from the point clouds into compact feature representations that are informative about the articulated parts of the performing character. These motion-aware features guide the inference of an appropriate skeletal rig for the input mesh. Furthermore, MoRig is able to animate the rig according to the captured point cloud motion. MoRig can handle diverse characters with different morphologies (e.g., humanoids, quadrupeds, toy characters). It also accounts for occluded regions in the point clouds and mismatches in the part proportions between the input mesh and captured character.
Third, the thesis introduces APES, a method that takes as input 2D raster images depicting a small set of poses of a character shown in a sprite sheet, and identifies articulated parts useful for rigging the character. APES uses a combination of neural network inference and integer linear programming to identify a compact set of articulated body parts, e.g. head, torso and limbs, that best reconstruct the input poses. Compared to MoRig and RigNet, which require a large collection of training models with associated skeletons and skinning weights, APES' neural architecture relies on less effortful supervision from (i) pixel correspondences readily available in existing large cartoon image datasets (e.g., Creative Flow), and (ii) a relatively small dataset of 57 cartoon characters segmented into moving parts.
Finally, the thesis discusses future research directions related to combining neural rigging with 3D and 4D reconstruction of characters from point cloud data and 2D video, as well as automating the process of motion synthesis for 3D characters.
Dementia in Parkinson's Disease
An estimated 50% to 80% of individuals with Parkinson's disease experience Parkinson's disease dementia (PDD). Based on the prevalence and clinical complexity of PDD, this book provides an in-depth update on topics including epidemiology, diagnosis, and treatment. Chapters discuss non-medical therapies and examine views on end-of-life issues as well. This book is a must-read for anyone interested in PDD whether they are a patient, caregiver, or doctor.
Water rights and related water supply issues
Presented during the USCID water management conference held on October 13-16, 2004 in Salt Lake City, Utah. The theme of the conference was "Water rights and related water supply issues." Includes bibliographical references. Proceedings sponsored by the U.S. Department of the Interior, Central Utah Project Completion Act Office and the U.S. Committee on Irrigation and Drainage. Consensus building as a primary tool to resolve water supply conflicts -- Administration to Colorado River allocations: the Law of the River and the Colorado River Water Delivery Agreement of 2003 -- Irrigation management in Afghanistan: the tradition of Mirabs -- Institutional reforms in irrigation sector of Pakistan: an approach towards integrated water resource management -- On-line and real-time water right allocation in Utah's Sevier River basin -- Improving equity of water distribution: the challenge for farmer organizations in Sindh, Pakistan -- Impacts from transboundary water rights violations in South Asia -- Impacts of water conservation and Endangered Species Act on large water project planning, Utah Lake Drainage Basin Water Delivery System, Bonneville Unit of the Central Utah Project -- Economic importance and environmental challenges of the Awash River basin to Ethiopia -- Accomplishing the impossible: overcoming obstacles of a combined irrigation project -- Estimating actual evapotranspiration without land use classification -- Improving water management in irrigated agriculture -- Beneficial uses of treated drainage water -- Comparative assessment of risk mitigation options for irrigated agriculture -- A multi-variable approach for the command of Canal de Provence Aix Nord Water Supply Subsystem -- Hierarchical Bayesian Analysis and Statistical Learning Theory II: water management application -- Soil moisture data collection and water supply forecasting -- Development and implementation of a farm water conservation program within the Coachella Valley Water District, California
-- Concepts of ground water recharge and well augmentation in northeastern Colorado -- Water banking in Colorado: an experiment in trouble? -- Estimating conservable water in the Klamath Irrigation Project -- Socio-economic impacts of land retirement in Westlands Water District -- EPDM rubber lining system chosen to save valuable irrigation water -- A user-centered approach to develop decision support systems for estimating pumping and augmentation needs in Colorado's South Platte basin -- Utah's Tri-County Automation Project -- Using HEC-RAS to model canal systems -- Potential water and energy conservation and improved flexibility for water users in the Oasis area of the Coachella Valley Water District, California
Analysing British sign language through the lens of systemic functional linguistics
Approaches to understanding language via Systemic Functional Linguistics (SFL) have resulted in a compendium of literature focussing on language as a "social semiotic." One such area of this literature comprises systemic functional grammars: descriptions of various languages and the way in which they create meaning. Despite the application of SFL to numerous languages and the creation of systemic functional grammars, a common thread is that of modality: SFL has been applied to numerous languages in the spoken and written modalities, but not in any detail to languages in the visual-spatial modality. My thesis presents an initial attempt at analysing British Sign Language (BSL) through the systemic functional lens. Calling on various theories and methods found in sign linguistics and SFL, I perform an analysis on a sample of BSL clauses (N = 1,375) from three perspectives: how BSL manages exchanges of communication (the interpersonal metafunction); how BSL encodes aspects of experience and reality (the experiential metafunction); and how BSL may be organised to produce a coherent text with variance in information prominence (the textual metafunction). As a result, I present three sets of system networks based on these three metafunctions, complete with realisation statements and examples. This thesis provides considerable impact. From an academic perspective, this is the first in-depth systemic functional description of a language in the visual-spatial modality, providing insight both into how such languages function, and how analyses of these languages may feed back into those of spoken and written languages. From a social perspective, the BSL system networks can assist language learners of any level as a point of reference in clause construction. Furthermore, intermediate and higher BSL qualifications stipulate knowledge of sign linguistics as a required component, yet these assessments are based on resources that have not been updated in nearly twenty years.
As such, the products of this thesis may go towards informing future BSL assessments.
Federal Register
Daily publication of the U.S. Office of the Federal Register contains rules and regulations, proposed legislation and rule changes, and other notices, including "Presidential proclamations and Executive Orders, Federal agency documents having general applicability and legal effect, documents required to be published by act of Congress, and other Federal agency documents of public interest" (p. ii). Table of Contents starts on page iii.
Renewable resources in the Pacific : proceedings of the 12th Pacific Trade and Development Conference, held in Vancouver, Canada, 7-11 Sept. 1981
Meeting: Pacific Trade and Development Conference, 12th, 7-11 Sept. 1981, Vancouver, B.C., C
1995 BRAC Commission
Data Call - Naval Command, Control and Ocean Surveillance Center, Research Development Test Evaluation Division - San Diego, CA. Data Call #5. Box 186, L-118
The grammar of immersion: a social semiotic study of nonfiction cinematic virtual reality
Cinematic virtual reality (CVR) is an audio-visual form viewed in a virtual reality headset. Its
novelty lies in the way it immerses its audience in highly realistic 360° visual representations.
Being camera-based, CVR facilitates many of the practices of conventional filmmaking but
fundamentally alters them through its lack of a rectangular frame. As such, CVR has garnered
scholarly attention as a "frameless" storytelling medium yet to develop its own language. The form
has gained traction with producers of nonfiction who recognize CVR's capacity to transport
audiences to remote social worlds, leading to claims that equate CVR's immersion with a social
and emotional response to its filmed subjects. A strand of CVR scholarship has emerged,
grounding nonfiction CVR theoretically and critiquing such deterministic claims. Broadly
speaking, these parallel strands of inquiry point to a common concern with CVR's semiotics; as
the meaning potential of the 360° format, and the social aspects of its use in documenting reality.
Currently, however, there appears to be a lack of systematic analyses that foreground CVR's
semiotics.
This study addresses this gap by using social semiotic methods to complement these threads of
inquiry, subsuming them into a holistic account of CVR's semantics. Utilizing systemic functional
methods, multimodal discourse analyses were performed on nonfiction CVR texts addressing
core research objectives. The first objective is the systematic description of CVR as a semiotic
technology, and the configuring of discourse through its novel 360° modality. The CVR spectator
is described for their role in the real-time construction of low-level meanings. Higher-level
concepts further characterize CVR texts as technologically enabled, virtual sites of social
discourse. The second research objective concerns clarifying the implications of CVR for
nonfiction practitioners. Nonfiction discourse is conceptualized as the negotiation of semiotic
autonomy, independence, and control, between viewing spectator, filmed subject, and CVR author
respectively. The third objective concerns the development of an analytical approach tailored
specifically for CVR. Extant systems from image, text, film, and action analyses are reflexively
applied, appraised, and adapted for use in the study of CVR and new frames are presented to cater
for the 360° modality.
The findings show CVR to be an inherently logical, contextualizing form, where the spectator has
a degree of sense-making autonomy in the construction of representational and social meanings.
This semantic autonomy is found to camouflage the deeper textual constructions in what appear
as "reality experiences". The repercussions for the CVR producer are the indeterminacy of
meanings which are "at risk" in particular ways when conventional framing methods cannot be
utilized, and when the spectator is given reflexive agency to make meaningful connections across
the 360° image. Systemic functional analytical methods prove flexible enough to be applied to the
texts, and open enough for the study to present additional systems and frames for a more fulsome
approach to the analysis of CVR.