Controllable Attention for Structured Layered Video Decomposition
The objective of this paper is to be able to separate a video into its
natural layers, and to control which of the separated layers to attend to. For
example, to be able to separate reflections, transparency or object motion. We
make the following three contributions: (i) we introduce a new structured
neural network architecture that explicitly incorporates layers (as spatial
masks) into its design. This improves separation performance over previous
general purpose networks for this task; (ii) we demonstrate that we can augment
the architecture to leverage external cues such as audio for controllability
and to help disambiguation; and (iii) we experimentally demonstrate the
effectiveness of our approach and training procedure with controlled
experiments while also showing that the proposed model can be successfully
applied to real-world applications such as reflection removal and action
recognition in cluttered scenes. Comment: In ICCV 201
Layered Neural Rendering for Retiming People in Video
We present a method for retiming people in an ordinary, natural
video---manipulating and editing the time in which different motions of
individuals in the video occur. We can temporally align different motions,
change the speed of certain actions (speeding up/slowing down, or entirely
"freezing" people), or "erase" selected people from the video altogether. We
achieve these effects computationally via a dedicated learning-based layered
video representation, where each frame in the video is decomposed into separate
RGBA layers, representing the appearance of different people in the video. A
key property of our model is that it not only disentangles the direct motions
of each person in the input video, but also correlates each person
automatically with the scene changes they generate---e.g., shadows,
reflections, and motion of loose clothing. The layers can be individually
retimed and recombined into a new video, allowing us to achieve realistic,
high-quality renderings of retiming effects for real-world videos depicting
complex actions and involving multiple individuals, including dancing,
trampoline jumping, or group running. Comment: To appear in SIGGRAPH Asia 2020. Project webpage:
https://retiming.github.io
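The recombination step described in the abstract above (per-person RGBA layers that are retimed and then composited into a new frame) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `composite_retimed`, the `(T, H, W, 4)` array layout, the integer per-layer frame offsets, and the use of the standard back-to-front "over" operator are all assumptions made for illustration.

```python
import numpy as np

def composite_retimed(layers, offsets, t):
    """Back-to-front 'over' compositing of RGBA layers at output frame t.

    layers:  list of float arrays of shape (T, H, W, 4) with values in [0, 1],
             ordered back to front (hypothetical layout, not the paper's format).
    offsets: per-layer integer frame shifts; layer i is sampled at t + offsets[i],
             clamped to the valid frame range, so a clamped layer appears frozen.
    """
    height, width = layers[0].shape[1:3]
    out = np.zeros((height, width, 3))
    for layer, offset in zip(layers, offsets):
        # Retiming: pick which source frame of this layer to show at time t.
        src = int(np.clip(t + offset, 0, layer.shape[0] - 1))
        rgb, alpha = layer[src, ..., :3], layer[src, ..., 3:4]
        # Standard "over" operator: the new layer covers what is behind it.
        out = alpha * rgb + (1.0 - alpha) * out
    return out

# Toy example: an opaque red background and a half-transparent blue
# foreground that is only visible in its first frame.
bg = np.zeros((3, 2, 2, 4)); bg[..., 0] = 1.0; bg[..., 3] = 1.0
fg = np.zeros((3, 2, 2, 4)); fg[0, ..., 2] = 1.0; fg[0, ..., 3] = 0.5

original = composite_retimed([bg, fg], [0, 0], t=2)   # foreground absent
retimed = composite_retimed([bg, fg], [0, -2], t=2)   # foreground delayed by 2 frames
```

Under these assumptions, "erasing" a person corresponds to leaving that person's layer out of the list, and "freezing" corresponds to sampling the same source frame for every output time.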