2,418 research outputs found
Controllable Attention for Structured Layered Video Decomposition
The objective of this paper is to be able to separate a video into its
natural layers, and to control which of the separated layers to attend to. For
example, to be able to separate reflections, transparency, or object motion. We
make the following three contributions: (i) we introduce a new structured
neural network architecture that explicitly incorporates layers (as spatial
masks) into its design. This improves separation performance over previous
general purpose networks for this task; (ii) we demonstrate that we can augment
the architecture to leverage external cues such as audio for controllability
and to aid disambiguation; and (iii) we experimentally demonstrate the
effectiveness of our approach and training procedure with controlled
experiments while also showing that the proposed model can be successfully
applied to real-world applications such as reflection removal and action
recognition in cluttered scenes.
Comment: In ICCV 2019
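To make the layer idea concrete, here is a minimal sketch, not the paper's architecture: a head that predicts per-layer spatial masks and composites predicted layer images back into a frame. The module name, shapes, and the two-layer default are all assumptions.

```python
# Hedged sketch: layers incorporated explicitly as spatial masks.
import torch
import torch.nn as nn

class MaskedLayerCompositor(nn.Module):
    def __init__(self, in_channels: int, num_layers: int = 2):
        super().__init__()
        self.num_layers = num_layers
        # One RGB image and one mask logit per layer.
        self.to_layers = nn.Conv2d(in_channels, num_layers * 3, kernel_size=1)
        self.to_masks = nn.Conv2d(in_channels, num_layers, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        b, _, h, w = features.shape
        layers = self.to_layers(features).view(b, self.num_layers, 3, h, w)
        # Softmax across layers makes the masks a spatial partition of the frame.
        masks = self.to_masks(features).softmax(dim=1).unsqueeze(2)
        return (masks * layers).sum(dim=1)  # (B, 3, H, W) recomposed frame
```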
Layered Neural Rendering for Retiming People in Video
We present a method for retiming people in an ordinary, natural
video---manipulating and editing the time in which different motions of
individuals in the video occur. We can temporally align different motions,
change the speed of certain actions (speeding up/slowing down, or entirely
"freezing" people), or "erase" selected people from the video altogether. We
achieve these effects computationally via a dedicated learning-based layered
video representation, where each frame in the video is decomposed into separate
RGBA layers, representing the appearance of different people in the video. A
key property of our model is that it not only disentangles the direct motions
of each person in the input video, but also correlates each person
automatically with the scene changes they generate---e.g., shadows,
reflections, and motion of loose clothing. The layers can be individually
retimed and recombined into a new video, allowing us to achieve realistic,
high-quality renderings of retiming effects for real-world videos depicting
complex actions and involving multiple individuals, including dancing,
trampoline jumping, and group running.
Comment: To appear in SIGGRAPH Asia 2020. Project webpage: https://retiming.github.io
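The per-frame RGBA decomposition described above can be recombined with the standard back-to-front "over" operator; a minimal PyTorch sketch follows, with the layer ordering and tensor shapes assumed rather than taken from the paper.

```python
import torch

def composite_rgba_layers(layers: torch.Tensor) -> torch.Tensor:
    """Composite RGBA layers back-to-front with the standard 'over' operator.

    layers: (L, 4, H, W) with layer 0 as the opaque background;
    RGB and alpha in [0, 1]. Returns a (3, H, W) RGB frame.
    """
    rgb = layers[0, :3]
    for i in range(1, layers.shape[0]):
        alpha = layers[i, 3:4]
        rgb = alpha * layers[i, :3] + (1.0 - alpha) * rgb
    return rgb
```

Because each person occupies a separate layer, retiming amounts to indexing a layer at a different frame before compositing.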
Layered Controllable Video Generation
We introduce layered controllable video generation, where we, without any
supervision, decompose the initial frame of a video into foreground and
background layers, with which the user can control the video generation process
by simply manipulating the foreground mask. The key challenges are the
unsupervised foreground-background separation, which is ambiguous, and the
ability to anticipate user manipulations with access to only raw video sequences. We
address these challenges by proposing a two-stage learning procedure. In the
first stage, with a rich set of losses and a dynamic foreground-size prior, we
learn how to separate the frame into foreground and background layers and,
conditioned on these layers, how to generate the next frame using a VQ-VAE
generator. In the second stage, we fine-tune this network to anticipate edits
to the mask by fitting a (parameterized) control to the mask from a future frame.
We demonstrate the effectiveness of this learning scheme and of the more granular control
mechanism, while illustrating state-of-the-art performance on two benchmark
datasets. We provide a video abstract as well as some video results on
https://gabriel-huang.github.io/layered_controllable_video_generation
Comment: This paper has been accepted to ECCV 2022 as an Oral paper
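A minimal sketch of the control mechanism implied here: the frame is a mask-weighted blend of foreground and background layers, and the user edits the mask. The function name, shapes, and the roll-based edit are illustrative assumptions, not the paper's interface.

```python
import torch

def compose_frame(foreground: torch.Tensor,
                  background: torch.Tensor,
                  mask: torch.Tensor) -> torch.Tensor:
    """Blend foreground and background layers with a user-editable mask.

    foreground, background: (3, H, W) images; mask: (1, H, W) in [0, 1].
    Editing `mask` is the control signal for generating the next frame.
    """
    return mask * foreground + (1.0 - mask) * background

# Hypothetical user edit: shift the foreground mask 24 pixels to the right,
# then re-compose; the generator would synthesize the next frame
# conditioned on layers like these.
# edited_mask = torch.roll(mask, shifts=24, dims=2)
# next_frame = compose_frame(foreground, background, edited_mask)
```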
Learning Foreground-Background Segmentation from Improved Layered GANs
Deep learning approaches rely heavily on high-quality human supervision, which
is nonetheless expensive, time-consuming, and error-prone, especially for image
segmentation tasks. In this paper, we propose a method to automatically
synthesize paired photo-realistic images and segmentation masks for training
a foreground-background segmentation network. In particular, we learn
a generative adversarial network that decomposes an image into foreground and
background layers, and avoid trivial decompositions by maximizing mutual
information between generated images and latent variables. The improved layered
GANs can synthesize higher quality datasets from which segmentation networks of
higher performance can be learned. Moreover, the segmentation networks are in
turn employed to stabilize the training of the layered GANs, with the two
trained alternately. Experiments on a variety of
single-object datasets show that our method achieves competitive generation
quality and segmentation performance compared to related methods.
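A hedged sketch of the two ingredients named above: alpha compositing of generated layers, and an InfoGAN-style mutual-information term that discourages trivial decompositions. The Gaussian recognition head and the MSE surrogate are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def compose(fg_rgb: torch.Tensor, fg_alpha: torch.Tensor,
            bg_rgb: torch.Tensor) -> torch.Tensor:
    """Alpha-blend a generated foreground layer over a generated background."""
    return fg_alpha * fg_rgb + (1.0 - fg_alpha) * bg_rgb

def mutual_info_loss(q_net: nn.Module, image: torch.Tensor,
                     latent: torch.Tensor) -> torch.Tensor:
    """InfoGAN-style variational bound on mutual information: train a
    recognition network to recover the latent from the composed image,
    penalizing all-foreground or all-background solutions."""
    predicted = q_net(image)              # assumed Gaussian mean head
    return F.mse_loss(predicted, latent)  # Gaussian NLL up to constants
```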
Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time
We present a video decomposition method that facilitates layer-based editing
of videos with spatiotemporally varying lighting and motion effects. Our neural
model decomposes an input video into multiple layered representations, each
comprising a 2D texture map, a mask for the original video, and a
multiplicative residual characterizing the spatiotemporal variations in
lighting conditions. A single edit on the texture maps can be propagated to the
corresponding locations across all video frames while preserving the
consistency of other content. Our method efficiently learns the layer-based neural
representations of a 1080p video in 25s per frame via coordinate hashing and
allows real-time rendering of the edited result at 71 fps on a single GPU.
Qualitatively, we run our method on various videos to show its effectiveness in
generating high-quality editing effects. Quantitatively, we propose to adopt
feature-tracking evaluation metrics for objectively assessing the consistency
of video editing. Project page: https://lightbulb12294.github.io/hashing-nvd
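A minimal sketch of the reconstruction step this abstract describes: each layer's 2D texture is sampled at per-frame UVs, scaled by a multiplicative lighting residual, and blended by its mask. The shapes and the `grid_sample` convention are assumptions.

```python
import torch
import torch.nn.functional as F

def reconstruct_frame(textures, uvs, masks, residuals):
    """Rebuild one frame from L layers.

    textures: list of L tensors (1, 3, Ht, Wt); uvs: (L, H, W, 2) in [-1, 1];
    masks: (L, 1, H, W) summing to 1 over layers; residuals: (L, 3, H, W).
    """
    frame = torch.zeros_like(residuals[0])              # (3, H, W)
    for i, tex in enumerate(textures):
        sampled = F.grid_sample(tex, uvs[i:i + 1],
                                align_corners=True)[0]  # (3, H, W)
        frame = frame + masks[i] * (sampled * residuals[i])
    return frame
```

Since an edit lives on the shared texture map, re-running this reconstruction propagates it to every frame while the residual re-applies the per-frame lighting.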
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of
generative artificial intelligence (AI), which unlocks unprecedented
capabilities for the generation, editing, and reconstruction of images, videos,
and 3D scenes. In these domains, diffusion models are the generative AI
architecture of choice. Within the last year alone, the literature on
diffusion-based tools and applications has seen exponential growth and relevant
papers are published across the computer graphics, computer vision, and AI
communities with new works appearing daily on arXiv. This rapid growth of the
field makes it difficult to keep up with all recent developments. The goal of
this state-of-the-art report (STAR) is to introduce the basic mathematical
concepts of diffusion models, implementation details and design choices of the
popular Stable Diffusion model, and to overview important aspects of these
generative AI tools, including personalization, conditioning, and inversion, among
others. Moreover, we give a comprehensive overview of the rapidly growing
literature on diffusion-based generation and editing, categorized by the type
of generated medium, including 2D images, videos, 3D objects, locomotion, and
4D scenes. Finally, we discuss available datasets, metrics, open challenges,
and social implications. This STAR provides an intuitive starting point to
explore this exciting topic for researchers, artists, and practitioners alike.
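As one of the basic mathematical concepts such a report covers, the DDPM forward (noising) process is x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps. A minimal PyTorch sketch, with the noise schedule assumed precomputed:

```python
import torch

def q_sample(x0: torch.Tensor, t: torch.Tensor,
             alphas_cumprod: torch.Tensor):
    """DDPM forward process: x_t = sqrt(abar_t)*x0 + sqrt(1 - abar_t)*eps.

    x0: (B, C, H, W) clean images; t: (B,) integer timesteps;
    alphas_cumprod: (T,) cumulative products of alpha_t = 1 - beta_t.
    """
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps
    return x_t, eps  # the denoiser is trained to predict eps from (x_t, t)
```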
Dyn-E: Local Appearance Editing of Dynamic Neural Radiance Fields
Recently, the editing of neural radiance fields (NeRFs) has gained
considerable attention, but most prior works focus on static scenes while
research on the appearance editing of dynamic scenes is relatively lacking. In
this paper, we propose a novel framework to edit the local appearance of
dynamic NeRFs by manipulating pixels in a single frame of training video.
Specifically, to locally edit the appearance of dynamic NeRFs while preserving
unedited regions, we introduce a local surface representation of the edited
region, which can be inserted into and rendered along with the original NeRF
and warped to arbitrary other frames through a learned invertible motion
representation network. By employing our method, users without professional
expertise can easily add desired content to the appearance of a dynamic scene.
We extensively evaluate our approach on various scenes and show that our
approach achieves spatially and temporally consistent editing results. Notably,
our approach is versatile and applicable to different variants of dynamic NeRF
representations.
Comment: Project page: https://dyn-e.github.io
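A hedged sketch of the warping idea described above: points on the edited local surface are carried to another frame through a learned invertible motion representation. The `motion_net` object and its forward/inverse interface are hypothetical, loosely following the abstract.

```python
import torch

def warp_edit_to_frame(points: torch.Tensor, t_edit: float, t_target: float,
                       motion_net) -> torch.Tensor:
    """Carry 3D points on the edited surface from the edited frame's time to
    another frame: map forward into a shared canonical space, then apply the
    inverse map at the target time.

    points: (N, 3); motion_net exposes hypothetical forward/inverse maps.
    """
    canonical = motion_net.forward(points, t_edit)    # (N, 3)
    return motion_net.inverse(canonical, t_target)    # (N, 3)
```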