585 research outputs found
The Visual Centrifuge: Model-Free Layered Video Representations
True video understanding requires making sense of non-Lambertian scenes where
the color of light arriving at the camera sensor encodes information about not
just the last object it collided with, but about multiple mediums -- colored
windows, dirty mirrors, smoke or rain. Layered video representations have the
potential to accurately model realistic scenes but have so far required
stringent assumptions on motion, lighting and shape. Here we propose a
learning-based approach for multi-layered video representation: we introduce
novel uncertainty-capturing 3D convolutional architectures and train them to
separate blended videos. We show that these models then generalize to single
videos, where they exhibit interesting abilities: color constancy, factoring
out shadows and separating reflections. We present quantitative and qualitative
results on real-world videos.
Comment: Appears in: 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2019). This arXiv version contains the CVPR camera-ready
version of the paper (although we have included larger figures) as well as an
appendix detailing the model architecture.
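The training recipe described in this abstract -- synthetically blend two clips, then train a 3D-convolutional network to separate them again -- can be illustrated with a minimal PyTorch sketch. The tiny network, the two-layer setting, the additive blend, and the permutation-invariant L1 loss below are illustrative assumptions, not the paper's architecture or objective; the permutation-invariant loss simply reflects that the ordering of the recovered layers is arbitrary.

# Illustrative sketch (not the paper's model): train a 3D-conv network to split
# a synthetically blended clip back into its two source clips, scoring with a
# permutation-invariant reconstruction loss since layer order is arbitrary.
import torch
import torch.nn as nn

class LayerSeparator(nn.Module):
    """Tiny 3D-conv network mapping a blended clip to n_layers output clips."""
    def __init__(self, n_layers=2):
        super().__init__()
        self.n_layers = n_layers
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 3 * n_layers, kernel_size=3, padding=1),
        )

    def forward(self, x):                          # x: (B, 3, T, H, W)
        y = self.net(x)                            # (B, 3*n_layers, T, H, W)
        return y.view(x.size(0), self.n_layers, 3, *x.shape[2:])

def permutation_invariant_l1(pred, targets):
    """L1 reconstruction error minimized over the two possible layer orderings."""
    direct = (pred - targets).abs().mean()
    swapped = (pred - targets.flip(1)).abs().mean()
    return torch.minimum(direct, swapped)

model = LayerSeparator()
clip_a = torch.rand(2, 3, 8, 64, 64)               # random clips as stand-ins
clip_b = torch.rand(2, 3, 8, 64, 64)
blended = 0.5 * (clip_a + clip_b)                  # simple additive blend
targets = torch.stack([clip_a, clip_b], dim=1)     # (B, 2, 3, T, H, W)
loss = permutation_invariant_l1(model(blended), targets)
loss.backward()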
An Exploration of Style Transfer Using Deep Neural Networks
Convolutional Neural Networks and Graphics Processing Units have been at the core of a paradigm shift in computer vision research that some researchers have called "the algorithmic perception revolution." This thesis presents the implementation and analysis of several techniques for performing artistic style transfer using a Convolutional Neural Network architecture trained for large-scale image recognition tasks. We present an implementation of an existing algorithm for artistic style transfer in images and video. The neural algorithm separates and recombines the style and content of arbitrary images. Additionally, we present an extension of the algorithm to perform weighted artistic style transfer.
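The "separate and recombine style and content" algorithm the thesis implements rests on two losses computed on feature maps of a fixed recognition CNN: a content loss on the raw features and a style loss on their Gram matrices. A minimal sketch follows; the layer count, loss weights, and the random tensors standing in for CNN features are illustrative assumptions, not the thesis's configuration. The weighted extension mentioned above amounts to giving each style image (or each layer) its own weight inside the style term.

# Minimal sketch of content/style losses for neural style transfer; in practice
# the features come from a fixed, pretrained recognition CNN (e.g. VGG), and
# the stylized image is optimized to reduce this loss.
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Channel-wise Gram matrix capturing the style statistics of a feature map."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_transfer_loss(gen_feats, content_feats, style_feats,
                        content_weight=1.0, style_weight=1e3):
    """Weighted sum of content (feature) and style (Gram) mismatches."""
    content_loss = sum(F.mse_loss(g, c) for g, c in zip(gen_feats, content_feats))
    style_loss = sum(F.mse_loss(gram_matrix(g), gram_matrix(s))
                     for g, s in zip(gen_feats, style_feats))
    return content_weight * content_loss + style_weight * style_loss

# Random tensors stand in for per-layer CNN features so the sketch runs alone.
fake_feats = lambda: [torch.rand(1, 64, 32, 32), torch.rand(1, 128, 16, 16)]
loss = style_transfer_loss(fake_feats(), fake_feats(), fake_feats())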
LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction
As demands for high-quality videos continue to rise, high-resolution and
high-dynamic range (HDR) imaging techniques are drawing attention. To generate
an HDR video from low dynamic range (LDR) images, one of the critical steps is
the motion compensation between LDR frames, for which most existing works
employ optical flow. However, these methods suffer from flow estimation
errors in the presence of saturation or complex motion. In this paper,
we propose an end-to-end HDR video composition framework, which aligns LDR
frames in the feature space and then merges aligned features into an HDR frame,
without relying on pixel-domain optical flow. Specifically, we propose a
luminance-based alignment network for HDR (LAN-HDR) consisting of an alignment
module and a hallucination module. The alignment module aligns a frame to the
adjacent reference by evaluating luminance-based attention, excluding color
information. The hallucination module generates sharp details, especially for
washed-out areas due to saturation. The aligned and hallucinated features are
then blended adaptively to complement each other. Finally, we merge the
features to generate a final HDR frame. In training, we adopt a temporal loss,
in addition to frame reconstruction losses, to enhance temporal consistency and
thus reduce flickering. Extensive experiments demonstrate that our method
performs better or comparable to state-of-the-art methods on several
benchmarks.
Comment: ICCV 2023.
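The luminance-based alignment idea -- attend from the reference frame to a neighboring frame using luminance only, so color casts and saturated channels do not drive the correspondence -- can be pictured with the sketch below. The per-pixel attention, feature sizes, and module names are assumptions made for illustration, not LAN-HDR's actual design.

# Illustrative sketch: align a neighboring LDR frame to the reference with
# attention whose queries/keys are computed from luminance alone.
import torch
import torch.nn as nn

def luminance(rgb):
    """Rec. 709 luma; drops color so it does not influence the attention map."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

class LuminanceAlignment(nn.Module):
    """Warp neighbor features toward the reference via luminance-based attention."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.q = nn.Conv2d(1, feat_dim, 1)   # queries from reference luminance
        self.k = nn.Conv2d(1, feat_dim, 1)   # keys from neighbor luminance
        self.v = nn.Conv2d(3, feat_dim, 1)   # values from the full neighbor frame

    def forward(self, ref_rgb, nbr_rgb):
        b, _, h, w = ref_rgb.shape
        q = self.q(luminance(ref_rgb)).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k(luminance(nbr_rgb)).flatten(2)                   # (B, C, HW)
        v = self.v(nbr_rgb).flatten(2).transpose(1, 2)              # (B, HW, C)
        attn = torch.softmax(q @ k / q.size(-1) ** 0.5, dim=-1)     # (B, HW, HW)
        aligned = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return aligned          # neighbor features aligned to the reference

ref = torch.rand(1, 3, 32, 32)   # reference LDR frame
nbr = torch.rand(1, 3, 32, 32)   # adjacent LDR frame to align
aligned_feats = LuminanceAlignment()(ref, nbr)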
Blind Video Deflickering by Neural Filtering with a Flawed Atlas
Many videos contain flickering artifacts. Common causes of flicker include
video processing algorithms, video generation algorithms, and capturing videos
under certain conditions. Prior work usually requires specific guidance such
as the flickering frequency, manual annotations, or extra consistent videos to
remove the flicker. In this work, we propose a general flicker removal
framework that only receives a single flickering video as input without
additional guidance. Since it is blind to a specific flickering type or
guidance, we name this "blind deflickering." The core of our approach is
utilizing the neural atlas in cooperation with a neural filtering strategy. The
neural atlas is a unified representation for all frames in a video that
provides temporal consistency guidance but is flawed in many cases. To address
this, a neural network is trained to mimic a filter, learning the consistent
features (e.g., color, brightness) while avoiding the artifacts present in the
atlas. To validate our method, we construct a dataset that contains diverse
real-world flickering videos. Extensive experiments show that our method
achieves satisfactory deflickering performance and even outperforms baselines
that use extra guidance on a public benchmark.
Comment: To appear in CVPR 2023. Code:
github.com/ChenyangLEI/All-In-One-Deflicker Website:
chenyanglei.github.io/deflicke
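The neural-filtering step can be pictured as a small network that receives the flickering frame together with the temporally consistent but artifact-prone frame rendered from the atlas, and predicts a deflickered output. The residual formulation and plain convolutional stack below are illustrative assumptions, not the paper's model.

# Illustrative sketch: fuse a flickering frame with its flawed atlas-based
# reconstruction; the atlas supplies temporal consistency, the network restores
# local detail and suppresses the atlas's artifacts.
import torch
import torch.nn as nn

class NeuralFilter(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, flicker_frame, atlas_frame):
        x = torch.cat([flicker_frame, atlas_frame], dim=1)   # (B, 6, H, W)
        # Predict a residual on top of the atlas rendering: keep its temporal
        # consistency, correct the regions where the atlas is flawed.
        return atlas_frame + self.net(x)

flicker = torch.rand(1, 3, 64, 64)      # observed flickering frame
from_atlas = torch.rand(1, 3, 64, 64)   # same frame re-rendered from the neural atlas
clean = NeuralFilter()(flicker, from_atlas)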
NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads
We focus on reconstructing high-fidelity radiance fields of human heads,
capturing their animations over time, and synthesizing re-renderings from novel
viewpoints at arbitrary time steps. To this end, we propose a new multi-view
capture setup composed of 16 calibrated machine vision cameras that record
time-synchronized images at 7.1 MP resolution and 73 frames per second. With
our setup, we collect a new dataset of over 4700 high-resolution,
high-framerate sequences of more than 220 human heads, from which we introduce
a new human head reconstruction benchmark. The recorded sequences cover a wide
range of facial dynamics, including head motions, natural expressions,
emotions, and spoken language. In order to reconstruct high-fidelity human
heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles
(NeRSemble). We represent scene dynamics by combining a deformation field and
an ensemble of 3D multi-resolution hash encodings. The deformation field allows
for precise modeling of simple scene movements, while the ensemble of hash
encodings helps to represent complex dynamics. As a result, we obtain radiance
field representations of human heads that capture motion over time and
facilitate re-rendering of arbitrary novel viewpoints. In a series of
experiments, we explore the design choices of our method and demonstrate that
our approach outperforms state-of-the-art dynamic radiance field approaches by
a significant margin.
Comment: SIGGRAPH 2023, Project Page:
https://tobias-kirschstein.github.io/nersemble/ , Video:
https://youtu.be/a-OAWqBzld
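The combination of a deformation field with a time-blended ensemble of spatial encodings can be sketched as follows. Small MLPs stand in for the 3D multi-resolution hash grids, and the sizes, names, and time-conditioned blending weights are assumptions for illustration only.

# Illustrative sketch: warp a sample point with a deformation field, query an
# ensemble of spatial encodings (MLP stand-ins for multi-resolution hash
# grids), blend their features with time-dependent weights, then decode.
import torch
import torch.nn as nn

class EnsembleDynamicField(nn.Module):
    def __init__(self, n_encodings=4, feat_dim=16):
        super().__init__()
        # Deformation field: models small, smooth scene motion.
        self.deform = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
        # Stand-ins for the ensemble of 3D multi-resolution hash encodings.
        self.encodings = nn.ModuleList([
            nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
            for _ in range(n_encodings)
        ])
        self.blend = nn.Linear(1, n_encodings)   # time -> per-encoding blend weights
        self.decode = nn.Linear(feat_dim, 4)     # blended features -> density + RGB

    def forward(self, xyz, t):                   # xyz: (N, 3), t: (N, 1)
        canonical = xyz + self.deform(torch.cat([xyz, t], dim=-1))
        feats = torch.stack([enc(canonical) for enc in self.encodings], dim=-2)
        w = torch.softmax(self.blend(t), dim=-1).unsqueeze(-1)    # (N, n_enc, 1)
        return self.decode((w * feats).sum(dim=-2))               # (N, 4)

field = EnsembleDynamicField()
xyz = torch.rand(1024, 3)    # sample points along camera rays
t = torch.rand(1024, 1)      # per-sample timestamps
density_rgb = field(xyz, t)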