Generative Single Image Reflection Separation
Single image reflection separation is an ill-posed problem, since two scenes,
a transmitted scene and a reflected scene, must be inferred from a single
observation. To make the problem tractable, in this work we assume that the
categories of the two scenes are known. This allows us to address the problem
by generating both scenes, each belonging to its known category, while their
contents are constrained to match the observed image. A novel network
architecture is
proposed to render realistic images of both scenes based on adversarial
learning. The network can be trained in a weakly supervised manner, i.e., it
learns to separate an observed image without the corresponding ground-truth
images of the transmission and reflection scenes, which are difficult to
collect in practice. Experimental results on real and synthetic datasets
demonstrate that the proposed algorithm performs favorably against existing
methods.
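As a rough illustration of the setup described in this abstract, the following PyTorch sketch combines per-category adversarial terms with a mixing-consistency term; the linear model I = T + R and the names g, d_T, d_R are assumptions for illustration, not the paper's actual formulation.

```python
# Minimal sketch, assuming a linear mixing model I = T + R and two
# class-specific discriminators; g, d_T, d_R are hypothetical modules.
import torch.nn.functional as F

def separation_loss(g, d_T, d_R, image, lam=1.0):
    T, R = g(image)                        # generated transmission / reflection
    adv = -d_T(T).mean() - d_R(R).mean()   # each scene should look real for its category
    recon = F.l1_loss(T + R, image)        # contents constrained to match the observation
    return adv + lam * recon
```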
Towards Unsupervised Single-Channel Blind Source Separation using Adversarial Pair Unmix-and-Remix
Blind single-channel source separation is a long-standing signal processing
challenge. Many methods have been proposed to solve this task using signal
priors such as low rank, sparsity, and temporal continuity. The recent
advance of generative adversarial models has presented new opportunities in
signal regression tasks. The power of adversarial training, however, has not
yet been realized for blind source separation tasks. In this work, we propose a novel
method for blind source separation (BSS) using adversarial methods. We rely on
the independence of sources for creating adversarial constraints on pairs of
approximately separated sources, which ensure good separation. Experiments are
carried out on image sources, validating the good performance of our approach
and presenting our method as a promising approach for solving BSS for general
signals.
Comment: ICASSP'19
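A minimal sketch of the pair unmix-and-remix idea, under an assumed additive mixing model: sources separated from one mixture are recombined with sources from another, and a discriminator checks that the remixes still look like genuine mixtures. The names sep and D are illustrative.

```python
# Sketch only: sep returns two estimated sources per mixture; D scores mixtures.
def unmix_remix_loss(sep, D, y1, y2):
    a1, b1 = sep(y1)          # approximate sources of mixture 1
    a2, b2 = sep(y2)          # approximate sources of mixture 2
    # Swap components across the pair; good separation makes the
    # remixes indistinguishable from real mixtures.
    adv = -(D(a1 + b2).mean() + D(a2 + b1).mean())
    # The estimated sources must still add up to their own mixtures.
    recon = ((a1 + b1 - y1) ** 2).mean() + ((a2 + b2 - y2) ** 2).mean()
    return adv + recon
```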
Face Image Reflection Removal
Face images captured through glass are usually contaminated by reflections.
The non-transmitted reflections make reflection removal more challenging than
in general scenes, because important facial features are
completely occluded. In this paper, we propose and solve the face image
reflection removal problem. We remove non-transmitted reflections by
incorporating inpainting ideas into a guided reflection removal framework and
recover facial features by considering various face-specific priors. We use a
newly collected face reflection image dataset to train our model and compare
with state-of-the-art methods. The proposed method shows advantages in
estimating reflection-free face images, which in turn improves face
recognition.
Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements
Removing undesirable reflections from a single image captured through a glass
window is of practical importance to visual computing systems. Although
state-of-the-art methods can obtain decent results in certain situations,
performance declines significantly when tackling more general real-world cases.
These failures stem from the intrinsic difficulty of single image reflection
removal -- the fundamental ill-posedness of the problem, and the insufficiency
of densely-labeled training data needed for resolving this ambiguity within
learning-based neural network pipelines. In this paper, we address these issues
by exploiting targeted network enhancements and the novel use of misaligned
data. For the former, we augment a baseline network architecture by embedding
context encoding modules that are capable of leveraging high-level contextual
clues to reduce indeterminacy within areas containing strong reflections. For
the latter, we introduce an alignment-invariant loss function that facilitates
exploiting misaligned real-world training data that is much easier to collect.
Experimental results collectively show that our method outperforms the
state-of-the-art with aligned data, and that significant improvements are
possible when using additional misaligned data.
Comment: Accepted to CVPR 2019; code is available at
https://github.com/Vandermode/ERRNet
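One standard way to build an alignment-invariant loss, shown here as a hedged sketch rather than the paper's exact formulation, is to compare spatially coarse deep features whose large receptive fields tolerate small shifts; the VGG-19 backbone and layer cut are assumptions.

```python
# Sketch of an alignment-invariant loss via coarse deep features (assumption:
# VGG-19 up to the last conv block; the paper's exact loss may differ).
import torch
import torch.nn.functional as F
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:30].eval()
for p in vgg.parameters():
    p.requires_grad_(False)   # frozen feature extractor

def alignment_invariant_loss(pred, target):
    # Deep features change little under a few pixels of misalignment,
    # unlike a per-pixel loss on the raw images.
    return F.l1_loss(vgg(pred), vgg(target))
```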
Attentive Generative Adversarial Network for Raindrop Removal from a Single Image
Raindrops adhered to a glass window or camera lens can severely hamper the
visibility of a background scene and degrade an image considerably. In this
paper, we address the problem by visually removing raindrops, and thus
transforming a raindrop-degraded image into a clean one. The problem is
intractable: first, the regions occluded by raindrops are not given; second,
the information about the background scene in the occluded regions is, for the
most part, completely lost. To resolve the problem, we apply an attentive
generative network using adversarial training. Our main idea is to inject
visual attention into both the generative and discriminative networks. During
the training, our visual attention learns about raindrop regions and their
surroundings. Hence, by injecting this information, the generative network will
pay more attention to the raindrop regions and the surrounding structures, and
the discriminative network will be able to assess the local consistency of the
restored regions. This injection of visual attention to both generative and
discriminative networks is the main contribution of this paper. Our experiments
show the effectiveness of our approach, which outperforms state-of-the-art
methods quantitatively and qualitatively.
Comment: CVPR 2018 Spotlight
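To make the attention-injection idea concrete, here is a heavily simplified sketch of a discriminator whose local features are weighted by a raindrop attention map; the module and its sizes are hypothetical, not the paper's architecture.

```python
# Simplified sketch: attention (B,1,H,W) weights local features so the
# discriminator focuses its realism judgment on restored raindrop regions.
import torch.nn as nn

class AttentiveDiscriminator(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.feat = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.score = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, image, attention):
        f = self.feat(image)                 # local features (B,C,H,W)
        return self.score(f * attention).mean()
```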
Composite Shape Modeling via Latent Space Factorization
We present a novel neural network architecture, termed Decomposer-Composer,
for semantic structure-aware 3D shape modeling. Our method utilizes an
auto-encoder-based pipeline, and produces a novel factorized shape embedding
space, where the semantic structure of the shape collection translates into a
data-dependent sub-space factorization, and where shape composition and
decomposition become simple linear operations on the embedding coordinates. We
further propose to model shape assembly using an explicit learned part
deformation module, which utilizes a 3D spatial transformer network to perform
an in-network volumetric grid deformation, and which allows us to train the
whole system end-to-end. The resulting network allows us to perform part-level
shape manipulation that is unattainable by existing approaches. Our extensive
ablation study, comparison to baseline methods, and qualitative analysis
demonstrate the improved performance of the proposed method.
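The statement that composition and decomposition reduce to linear operations on embedding coordinates can be illustrated with a toy sketch; the random projection matrices and plain summation below are stand-ins for the learned, data-dependent factorization.

```python
# Toy sketch: part sub-spaces as linear projections of a whole-shape code.
import torch

dim, n_parts = 256, 4
projections = [torch.randn(dim, dim) for _ in range(n_parts)]  # learned in practice

def decompose(z):
    # Project the whole-shape embedding onto each semantic part sub-space.
    return [P @ z for P in projections]

def compose(part_codes):
    # Composition is a simple linear operation: sum the part embeddings.
    return torch.stack(part_codes).sum(dim=0)

z = torch.randn(dim)
z_back = compose(decompose(z))  # close to z if the projections sum to identity
```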
The Visual Centrifuge: Model-Free Layered Video Representations
True video understanding requires making sense of non-Lambertian scenes, where
the color of light arriving at the camera sensor encodes information not just
about the last object it collided with, but about multiple media: colored
windows, dirty mirrors, smoke, or rain. Layered video representations have the
potential of accurately modelling realistic scenes but have so far required
stringent assumptions on motion, lighting and shape. Here we propose a
learning-based approach for multi-layered video representation: we introduce
novel uncertainty-capturing 3D convolutional architectures and train them to
separate blended videos. We show that these models then generalize to single
videos, where they exhibit interesting abilities: color constancy, factoring
out shadows and separating reflections. We present quantitative and qualitative
results on real-world videos.
Comment: Appears in: 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2019). This arXiv version contains the CVPR camera-ready
version of the paper (although we have included larger figures) as well as an
appendix detailing the model architecture.
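The training recipe implied above, learning to separate synthetically blended videos, can be sketched as follows; averaging as the blending operator and a two-layer output are assumptions, and the permutation-invariant loss reflects the arbitrary ordering of the recovered layers.

```python
# Sketch: blend two videos, recover both layers, score the best assignment.
import torch

def centrifuge_loss(model, video_a, video_b):
    blended = 0.5 * (video_a + video_b)    # synthetic two-layer training input
    out1, out2 = model(blended)            # two predicted layers
    l1 = lambda x, y: (x - y).abs().mean()
    # Permutation-invariant: either output may correspond to either layer.
    return torch.min(l1(out1, video_a) + l1(out2, video_b),
                     l1(out1, video_b) + l1(out2, video_a))
```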
Machine learning in acoustics: theory and applications
Acoustic data provide scientific and engineering insights in fields ranging
from biology and communications to ocean and Earth science. We survey the
recent advances and transformative potential of machine learning (ML),
including deep learning, in the field of acoustics. ML is a broad family of
techniques, which are often based in statistics, for automatically detecting
and utilizing patterns in data. Relative to conventional acoustics and signal
processing, ML is data-driven. Given sufficient training data, ML can discover
complex relationships between features and desired labels or actions, or
between features themselves. With large volumes of training data, ML can
discover models describing complex acoustic phenomena such as human speech and
reverberation. ML in acoustics is rapidly developing with compelling results
and significant future promise. We first introduce ML, then highlight ML
developments in four acoustics research areas: source localization in speech
processing, source localization in ocean acoustics, bioacoustics, and
environmental sounds in everyday scenes.
Comment: Published with free access in the Journal of the Acoustical Society of
America, 27 Nov. 2019
GazeGAN - Unpaired Adversarial Image Generation for Gaze Estimation
Recent research has demonstrated the ability to estimate gaze on mobile
devices by performing inference on the image from the phone's front-facing
camera, without requiring specialized hardware. While this offers wide
potential applications, such as human-computer interaction, medical diagnosis,
and accessibility (e.g., hands-free gaze as input for patients with motor
disorders), current methods are limited, as they rely on collecting data from
real users, which is a tedious and expensive process that is hard to scale
across devices. There have been some attempts to synthesize eye-region data
using 3D models that can simulate various head poses and camera settings;
however, these lack realism.
In this paper, we improve upon a recently suggested method and propose a
generative adversarial framework to generate a large dataset of high-resolution
color images with high diversity (e.g., in subjects, head pose, camera
settings) and realism, while simultaneously preserving the accuracy of gaze
labels. The proposed approach operates on extended regions of the eye, and even
completes missing parts of the image. Using this rich synthesized dataset, and
without using any additional training data from real users, we demonstrate
improvements over state-of-the-art for estimating 2D gaze position on mobile
devices. We further demonstrate cross-device generalization of model
performance, as well as improved robustness to diverse head pose, blur and
distance.
Comment: This project was done while the first author was at Google Research.
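One plausible way to "preserve the accuracy of gaze labels" during generation, offered here purely as an assumption rather than the paper's confirmed mechanism, is a consistency penalty through a frozen gaze estimator:

```python
# Hypothetical label-preservation term: a frozen gaze estimator penalizes any
# drift between the gaze of a real eye image and its generated counterpart.
import torch
import torch.nn.functional as F

def gaze_consistency_loss(gaze_net, source, generated):
    with torch.no_grad():
        target_gaze = gaze_net(source)     # 2D gaze label of the real image
    return F.mse_loss(gaze_net(generated), target_gaze)
```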
Adversarial Audio Synthesis
Audio signals are sampled at high temporal resolutions, and learning to
synthesize audio requires capturing structure across a range of timescales.
Generative adversarial networks (GANs) have seen wide success at generating
images that are both locally and globally coherent, but they have seen little
application to audio generation. In this paper we introduce WaveGAN, a first
attempt at applying GANs to unsupervised synthesis of raw-waveform audio.
WaveGAN is capable of synthesizing one-second slices of audio waveforms with
global coherence, suitable for sound effect generation. Our experiments
demonstrate that, without labels, WaveGAN learns to produce intelligible words
when trained on a small-vocabulary speech dataset, and can also synthesize
audio from other domains such as drums, bird vocalizations, and piano. We
compare WaveGAN to a method that applies GANs designed for image generation to
image-like audio feature representations, finding both approaches to be
promising.
Comment: Published as a conference paper at ICLR 2019
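WaveGAN's central architectural move, replacing the 2D transposed convolutions of an image GAN with long-stride 1D ones to reach audio sample rates, can be sketched as below; the layer sizes and output length are illustrative, not the published configuration.

```python
# Illustrative 1D GAN generator for raw audio (sizes are not WaveGAN's).
import torch
import torch.nn as nn

class Gen1D(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 256 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose1d(256, 128, 25, stride=4, padding=11, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(128, 64, 25, stride=4, padding=11, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(64, 1, 25, stride=4, padding=11, output_padding=1),
            nn.Tanh(),  # waveform samples in [-1, 1]
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 16))  # (batch, 1, 1024)

waves = Gen1D()(torch.randn(2, 100))  # two short raw-audio clips
```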