Deep Structured Energy-Based Image Inpainting
In this paper, we propose a structured image inpainting method employing an
energy-based model. To learn the structural relationship between patterns
observed in images and the missing regions of those images, we employ an
energy-based structured prediction method. The structural relationship is
learned by minimizing an energy function that is defined by a simple
convolutional neural network. The experimental results on various benchmark
datasets show that our proposed method significantly outperforms the
state-of-the-art methods that use Generative Adversarial Networks (GANs). We
obtained a mean squared error (MSE) of 497.35 on the Olivetti face dataset,
compared to an MSE of 833.0 for the state-of-the-art method. Moreover, we
obtained a peak signal-to-noise ratio (PSNR) of 28.4 dB on the SVHN dataset and
23.53 dB on the CelebA dataset, compared to 22.3 dB and 21.3 dB, respectively,
for the state-of-the-art methods. The code is publicly available. Comment:
Accepted to the 24th International Conference on Pattern Recognition
(ICPR 2018). 6 pages, 7 figures
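The abstract frames inpainting as minimizing an energy over the missing region. The sketch below illustrates only that mechanism and the two reported metrics, under loud assumptions: the energy is a hand-crafted quadratic smoothness term on a 1D signal, not the paper's learned convolutional energy, and `smoothness_energy`, `inpaint` and `psnr` are hypothetical helper names. Observed entries stay fixed while gradient descent updates only the missing ones; PSNR is computed as 10 * log10(MAX^2 / MSE).

```python
import math

def smoothness_energy(x):
    """Hypothetical hand-crafted energy: sum of squared neighbour differences."""
    return sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))

def energy_grad(x):
    """Gradient of smoothness_energy with respect to every entry of x."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        d = x[i + 1] - x[i]
        g[i] -= 2.0 * d
        g[i + 1] += 2.0 * d
    return g

def inpaint(signal, missing, steps=500, lr=0.1):
    """Minimise the energy over the missing entries only; observed entries are fixed."""
    x = list(signal)
    for _ in range(steps):
        g = energy_grad(x)
        for i in missing:
            x[i] -= lr * g[i]
    return x

def psnr(reference, estimate, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

corrupted = [0.0, 1.0, 2.0, 0.0, 0.0, 5.0, 6.0, 7.0]  # entries 3 and 4 are missing
restored = inpaint(corrupted, missing=[3, 4])
```

With this particular energy the minimizer is linear interpolation between the known neighbours, so the ramp is recovered; a learned energy would instead encode structure seen in training images.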
Deep Convolutional Framelets: A General Deep Learning Framework for Inverse Problems
Recently, deep learning approaches with various network architectures have
achieved significant performance improvement over existing iterative
reconstruction methods in various imaging problems. However, it is still
unclear why these deep learning architectures work for specific inverse
problems. To address these issues, here we show that the long-searched-for
missing link is convolution framelets, which represent a signal by convolving
local and non-local bases. Convolution framelets were originally developed to
generalize the theory of low-rank Hankel matrix approaches for inverse
problems, and this paper further extends the idea so that we can obtain a deep
neural network using multilayer convolution framelets with perfect
reconstruction (PR) under the rectified linear unit (ReLU) nonlinearity. Our
analysis also shows that popular deep network components such as residual
blocks, redundant filter channels, and concatenated ReLU (CReLU) do indeed help
to achieve PR, while the pooling and unpooling layers should be augmented with
high-pass branches to meet the PR condition. Moreover, by changing the number
of filter channels and bias, we can control the shrinkage behavior of the
neural network. This discovery leads us to propose a novel theory for deep
convolutional framelets neural networks. Using numerical experiments with
various inverse problems, we demonstrate that our deep convolutional framelets
network shows consistent improvement over existing deep architectures. This
discovery suggests that the success of deep learning does not come from the
magical power of a black box, but rather from the power of a novel signal
representation using a non-local basis combined with a data-driven local basis,
which is indeed a natural extension of classical signal processing theory.
Comment: This will appear in SIAM Journal on Imaging Sciences
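Two of the perfect-reconstruction claims can be checked on toy vectors. The sketch assumes the standard identity x = ReLU(x) - ReLU(-x) for CReLU and a Haar-style low-pass/high-pass split for stride-2 pooling; it illustrates why those components preserve information and is not the paper's network. All function names are hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def crelu(v):
    """Concatenated ReLU: [ReLU(x), ReLU(-x)]."""
    return relu(v) + relu([-a for a in v])

def crelu_inverse(c):
    """Perfect reconstruction from CReLU output via x = ReLU(x) - ReLU(-x)."""
    n = len(c) // 2
    return [p - q for p, q in zip(c[:n], c[n:])]

def haar_pool(v):
    """Stride-2 average pooling (low-pass) plus the matching high-pass branch."""
    low = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    high = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return low, high

def haar_unpool(low, high):
    """Inverse of haar_pool: each pair is (low + high, low - high)."""
    out = []
    for lo, hi in zip(low, high):
        out.extend([lo + hi, lo - hi])
    return out

x = [3.0, -1.0, 2.0, 5.0]
low, high = haar_pool(x)
print(crelu_inverse(crelu(x)))       # [3.0, -1.0, 2.0, 5.0]
print(haar_unpool(low, high))        # [3.0, -1.0, 2.0, 5.0]
print(haar_unpool(low, [0.0, 0.0]))  # [1.0, 1.0, 3.5, 3.5]
```

Unpooling from the low-pass branch alone loses the within-pair differences, which is the intuition behind augmenting pooling with a high-pass branch.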
Data recovery in computational fluid dynamics through deep image priors
One of the challenges encountered by computational simulations at exascale is
the reliability of simulations in the face of hardware and software faults.
These faults, expected to increase with the complexity of the computational
systems, will lead to the loss of simulation data and simulation failure and
are currently addressed through a checkpoint-restart paradigm. Focusing
specifically on computational fluid dynamics simulations, this work proposes a
method that uses a deep convolutional neural network to recover simulation
data. This data recovery method (i) is agnostic to the flow configuration and
geometry, (ii) does not require extensive training data, and (iii) is accurate
for very different physical flows. Results indicate that the use of deep image
priors for data recovery is more accurate than standard recovery techniques,
such as the Gaussian process regression, also known as Kriging. Data recovery
is performed for two canonical fluid flows: laminar flow around a cylinder and
homogeneous isotropic turbulence. For data recovery of the laminar flow around
a cylinder, results indicate similar performance between the proposed method
and Gaussian process regression across a wide range of mask sizes. For
homogeneous isotropic turbulence, data recovery through the deep convolutional
neural network exhibits an error in relevant turbulent quantities approximately
three times smaller than that of the Gaussian process regression. Forward
simulations using recovered data illustrate that the enstrophy decay is
captured within 10% using the deep convolutional neural network approach.
Although demonstrated specifically for data recovery of fluid flows, this
technique can be used in a wide range of applications, including particle image
velocimetry, visualization, and computational simulations of physical processes
beyond the Navier-Stokes equations.
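For reference, the Gaussian process regression (Kriging) baseline the abstract compares against can be sketched in a few lines. This is not the proposed deep-image-prior method. It assumes a 1D field, a zero-mean prior, a squared-exponential kernel with length scale 1.0 and a small noise term, all illustrative choices; the function names are hypothetical.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between positions a and b."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    """Lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def kriging_fill(xs_obs, ys_obs, xs_query, length=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP at the query (missing) positions."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs_obs)] for i, a in enumerate(xs_obs)]
    alpha = solve_cholesky(cholesky(K), ys_obs)
    return [sum(rbf(q, a, length) * w for a, w in zip(xs_obs, alpha))
            for q in xs_query]

# Toy "simulation field": sin(x) on a grid, with two grid values lost.
xs = [0.25 * i for i in range(25)]
missing = {5, 14}
obs_x = [x for i, x in enumerate(xs) if i not in missing]
obs_y = [math.sin(x) for x in obs_x]
query = [xs[i] for i in sorted(missing)]
pred = kriging_fill(obs_x, obs_y, query)
```

The small noise term keeps the kernel matrix well conditioned for the Cholesky factorization; a real comparison would fit the kernel hyperparameters to the flow data.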
Learning Energy Based Inpainting for Optical Flow
Modern optical flow methods are often composed of a cascade of many
independent steps or formulated as a black box neural network that is hard to
interpret and analyze. In this work, we seek a plain, interpretable, but
learnable solution. We propose a novel inpainting-based algorithm that
approaches the problem in three steps: feature selection and matching,
selection of supporting points, and energy-based inpainting. To facilitate
inference, we propose an optimization layer that allows backpropagation through
10K iterations of a first-order method without any numerical or memory
problems. Compared to recent state-of-the-art networks, our modular CNN is very
lightweight, and it is competitive with other, more involved, inpainting-based
methods.
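The energy-based inpainting step can be illustrated with a generic quadratic energy: a soft data term at the sparse supporting points plus a 4-neighbour smoothness term, minimized by plain gradient descent. This is a sketch under assumptions (one flow component, quadratic energy, fixed weight `lam`, hypothetical name `densify`); it does not reproduce the paper's energy, solver, or its backpropagation layer.

```python
def densify(h, w, supports, lam=1.0, steps=5000, lr=0.05):
    """Dense h x w field from sparse supports {(row, col): value} by gradient descent on
    E(u) = sum over 4-neighbour edges (u_p - u_q)^2 + lam * sum over supports (u_p - s_p)^2."""
    u = [[0.0] * w for _ in range(h)]
    for _ in range(steps):
        g = [[0.0] * w for _ in range(h)]
        # Smoothness term: each horizontal and vertical edge contributes once.
        for r in range(h):
            for c in range(w):
                if c + 1 < w:
                    d = u[r][c] - u[r][c + 1]
                    g[r][c] += 2.0 * d
                    g[r][c + 1] -= 2.0 * d
                if r + 1 < h:
                    d = u[r][c] - u[r + 1][c]
                    g[r][c] += 2.0 * d
                    g[r + 1][c] -= 2.0 * d
        # Data term: only at the supporting points.
        for (r, c), s in supports.items():
            g[r][c] += 2.0 * lam * (u[r][c] - s)
        # First-order update of every pixel.
        for r in range(h):
            for c in range(w):
                u[r][c] -= lr * g[r][c]
    return u

flow_u = densify(3, 3, {(0, 0): 0.0, (2, 2): 4.0})
```

The step size must stay below 2 divided by the largest eigenvalue of the energy's Hessian for the iteration to converge; many cheap iterations of this kind are what the abstract's optimization layer differentiates through.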
Texture Modeling with Convolutional Spike-and-Slab RBMs and Deep Extensions
We apply the spike-and-slab Restricted Boltzmann Machine (ssRBM) to texture
modeling. The ssRBM with tiled-convolution weight sharing (TssRBM) achieves or
surpasses the state of the art among parametric models for texture synthesis
and inpainting. We also develop a novel RBM model with a spike-and-slab visible
layer and binary variables in the hidden layer. This model is designed to be
stacked on top of the TssRBM. We show that the resulting deep belief network
(DBN) is a powerful generative model that improves on single-layer models and
is capable of modeling not only single high-resolution and challenging textures
but also multiple textures.
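A common way an RBM performs inpainting is Gibbs sampling with the observed visible units clamped. The sketch below swaps in a plain binary RBM with small, hand-set, hypothetical weights; it is not the spike-and-slab or tiled-convolution model of the paper, and `gibbs_inpaint` is a hypothetical name.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gibbs_inpaint(v, observed, W, b_vis, b_hid, steps=100, seed=0):
    """Alternate h ~ p(h | v) and v_missing ~ p(v | h); observed visibles stay clamped.
    W[i][j] couples visible unit i to hidden unit j (binary RBM)."""
    rng = random.Random(seed)
    v = list(v)
    n_vis, n_hid = len(b_vis), len(b_hid)
    for _ in range(steps):
        h = [1 if rng.random() < sigmoid(b_hid[j] + sum(W[i][j] * v[i] for i in range(n_vis)))
             else 0 for j in range(n_hid)]
        for i in range(n_vis):
            if i not in observed:
                p = sigmoid(b_vis[i] + sum(W[i][j] * h[j] for j in range(n_hid)))
                v[i] = 1 if rng.random() < p else 0
    return v

# Hypothetical toy weights: one hidden unit that strongly prefers "all visibles on".
W = [[40.0] for _ in range(4)]
b_vis, b_hid = [-20.0] * 4, [-60.0]
filled_on = gibbs_inpaint([1, 1, 1, 0], {0, 1, 2}, W, b_vis, b_hid)
filled_off = gibbs_inpaint([0, 0, 0, 1], {0, 1, 2}, W, b_vis, b_hid)
```

The missing unit takes whatever value the learned dependencies make likely given the clamped context, which is the sense in which a generative texture model "inpaints".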
A context encoder for audio inpainting
We study the ability of deep neural networks (DNNs) to restore missing audio
content based on its context, i.e., inpaint audio gaps. We focus on a condition
which has not received much attention yet: gaps in the range of tens of
milliseconds. We propose a DNN structure that is provided with the signal
surrounding the gap in the form of time-frequency (TF) coefficients. Two DNNs
with either complex-valued TF coefficient output or magnitude TF coefficient
output were studied by separately training them on inpainting two types of
audio signals (music and musical instruments) having 64-ms long gaps. The
magnitude DNN outperformed the complex-valued DNN in terms of signal-to-noise
ratios and objective difference grades. Although, for instruments, a reference
inpainting obtained through linear predictive coding performed better in both
metrics, it performed worse than the magnitude DNN for music. This demonstrates
the potential of the magnitude DNN, in particular for inpainting signals that
are more complex than single instrument sounds. Comment: Published in IEEE TASL
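Two quantities in the abstract are easy to make concrete: the number of samples in a 64-ms gap, and the signal-to-noise ratio of a reconstructed gap. The sketch assumes a 16 kHz sample rate purely for illustration (the abstract does not state one) and the common definition SNR = 10 * log10(||x||^2 / ||x - x_hat||^2); the helper names are hypothetical.

```python
import math

def gap_length_samples(gap_ms, sample_rate_hz):
    """Number of samples covered by a gap of gap_ms milliseconds."""
    return round(gap_ms * sample_rate_hz / 1000)

def snr_db(reference, estimate):
    """SNR in dB of an inpainted gap relative to the true gap content."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)

print(gap_length_samples(64, 16000))  # 1024
```

At an assumed 16 kHz, a 64-ms gap spans 1024 samples, which gives a sense of how much context the network must bridge.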
Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet
Video sequences contain rich dynamic patterns, such as dynamic texture
patterns that exhibit stationarity in the temporal domain, and action patterns
that are non-stationary in either the spatial or the temporal domain. We show that a
spatial-temporal generative ConvNet can be used to model and synthesize dynamic
patterns. The model defines a probability distribution on the video sequence,
and the log probability is defined by a spatial-temporal ConvNet that consists
of multiple layers of spatial-temporal filters to capture spatial-temporal
patterns of different scales. The model can be learned from the training video
sequences by an "analysis by synthesis" learning algorithm that iterates the
following two steps. Step 1 synthesizes video sequences from the currently
learned model. Step 2 then updates the model parameters based on the difference
between the synthesized video sequences and the observed training sequences. We
show that the learning algorithm can synthesize realistic dynamic patterns.
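The two-step "analysis by synthesis" loop can be illustrated on a one-dimensional toy model. Loud assumptions: the ConvNet score is replaced by a single linear feature f(x) = x on top of a standard Gaussian reference, so the model is N(w, 1); synthesis uses Langevin dynamics with persistent chains; the helper name and all hyperparameters are hypothetical. The update w += lr * (mean f(observed) - mean f(synthesized)) is the maximum-likelihood gradient for this exponential-family form.

```python
import random

def learn_by_synthesis(observed, iters=300, n_chains=200, langevin_steps=20,
                       delta=0.3, lr=0.1, seed=0):
    """Toy model p(x) proportional to exp(w * x) * N(x; 0, 1), i.e. N(w, 1)."""
    rng = random.Random(seed)
    w = 0.0
    chains = [0.0] * n_chains  # persistent synthesized samples
    obs_mean = sum(observed) / len(observed)
    for _ in range(iters):
        # Step 1: synthesize with Langevin dynamics on log p(x) = w*x - x^2/2 + const.
        for _ in range(langevin_steps):
            chains = [x + 0.5 * delta ** 2 * (w - x) + delta * rng.gauss(0.0, 1.0)
                      for x in chains]
        syn_mean = sum(chains) / n_chains
        # Step 2: move w by the observed-minus-synthesized feature statistics.
        w += lr * (obs_mean - syn_mean)
    return w

w_hat = learn_by_synthesis([1.0, 2.0, 3.0])
```

Learning stops when synthesized and observed statistics match, so w settles near the observed mean of 2.0; in the video model the same comparison is made on ConvNet filter responses.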
Detecting Anomalous Faces with 'No Peeking' Autoencoders
Detecting anomalous faces has important applications. For example, a system
might tell when a train driver is incapacitated by a medical event, and assist
in adopting a safe recovery strategy. These applications are demanding, because
they require accurate detection of rare anomalies that may be seen only at
runtime. Such a setting causes supervised methods to perform poorly. We
describe a method for detecting an anomalous face image that meets these
requirements. We construct a feature vector that reliably has large entries for
anomalous images, then use various simple unsupervised methods to score the
image based on the feature. Obvious constructions (autoencoder codes;
autoencoder residuals) are defeated by a 'peeking' behavior in autoencoders.
Our feature construction removes rectangular patches from the image, predicts
the likely content of the patch conditioned on the rest of the image using a
specially trained autoencoder, then compares the result to the image. High
scores suggest that the patch was difficult for an autoencoder to predict, and
so is likely anomalous. We demonstrate that our method can identify real
anomalous face images in pools of typical images taken from celeb-A that are
much larger than those used in state-of-the-art experiments. A control
experiment, in which our method is applied to another set of normal celebrity
images (a 'typical set' not drawn from celeb-A), finds that these images are
not identified as anomalous, confirming that the result is not due to special
properties of celeb-A.
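The patch-removal feature can be sketched as a sliding-window loop: remove a rectangle, predict it from the rest, and record the prediction error. As stated assumptions, the trained autoencoder is replaced by a hypothetical stand-in that predicts the mean of the pixels bordering the patch, and the per-image score is the maximum patch error; both `predict_patch` and `anomaly_score` are hypothetical names.

```python
def predict_patch(img, r0, c0, ph, pw):
    """Hypothetical stand-in for the autoencoder: predict the removed patch as the
    mean of the pixels bordering it (the patch contents are never read)."""
    h, w = len(img), len(img[0])
    border = []
    for r in range(max(0, r0 - 1), min(h, r0 + ph + 1)):
        for c in range(max(0, c0 - 1), min(w, c0 + pw + 1)):
            if not (r0 <= r < r0 + ph and c0 <= c < c0 + pw):
                border.append(img[r][c])
    m = sum(border) / len(border)
    return [[m] * pw for _ in range(ph)]

def anomaly_score(img, ph=2, pw=2):
    """Largest mean squared error between a removed patch and its prediction."""
    h, w = len(img), len(img[0])
    best = 0.0
    for r0 in range(h - ph + 1):
        for c0 in range(w - pw + 1):
            pred = predict_patch(img, r0, c0, ph, pw)
            err = sum((img[r0 + i][c0 + j] - pred[i][j]) ** 2
                      for i in range(ph) for j in range(pw)) / (ph * pw)
            best = max(best, err)
    return best

flat = [[1.0] * 6 for _ in range(6)]
blob = [row[:] for row in flat]
for r in (2, 3):
    for c in (2, 3):
        blob[r][c] = 9.0  # a region the predictor cannot infer from its surroundings
```

Because the predictor never sees the removed patch, it cannot "peek" at the anomaly and copy it back, which is what makes the residual informative.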
Incorporating long-range consistency in CNN-based texture generation
Gatys et al. (2015) showed that pair-wise products of features in a
convolutional network are a very effective representation of image textures. We
propose a simple modification to that representation which makes it possible to
incorporate long-range structure into image generation, and to render images
that satisfy various symmetry constraints. We show how this can greatly improve
rendering of regular textures and of images that contain other kinds of
symmetric structure. We also present applications to inpainting and season
transfer.
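The Gatys et al. representation sums pairwise feature products over all positions, which discards where features occur. The abstract does not spell out its modification; as an assumption, the sketch shows one way to inject spatial structure: correlating each feature map with a spatially shifted copy of the others. Features are 1D lists per channel for brevity, and both function names are hypothetical.

```python
def gram(features):
    """features: C channels, each a list of N activations.
    G[i][j] = sum over x of F_i(x) * F_j(x)."""
    C = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) for j in range(C)]
            for i in range(C)]

def shifted_gram(features, delta):
    """Assumed long-range variant: G_delta[i][j] = sum over x of F_i(x) * F_j(x + delta),
    over positions where x + delta stays inside the map (delta >= 0)."""
    C, N = len(features), len(features[0])
    return [[sum(features[i][x] * features[j][x + delta] for x in range(N - delta))
             for j in range(C)] for i in range(C)]

periodic = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]   # alternating pattern
shuffled = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]   # same values, rearranged
```

Both inputs have the same plain Gram matrix, but their shifted Gram matrices differ, so a loss on the shifted version can distinguish regular, periodic arrangements.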
Texture Modelling with Nested High-order Markov-Gibbs Random Fields
Currently, Markov-Gibbs random field (MGRF) image models which include
high-order interactions are almost always built by modelling responses of a
stack of local linear filters. Actual interaction structure is specified
implicitly by the filter coefficients. In contrast, we learn an explicit
high-order MGRF structure by considering the learning process in terms of
general exponential family distributions nested over base models, so that
potentials added later can build on previous ones. New features can be added
relatively rapidly by skipping the costly optimisation of parameters.
We introduce the use of local binary patterns as features in MGRF texture
models, and generalise them by learning offsets to the surrounding pixels.
These prove effective as high-order features, and are fast to compute. Several
schemes for selecting high-order features by composition or search of a small
subclass are compared. Additionally, we present a simple modification of the
maximum likelihood objective, specific to texture modelling, which aims to
improve generalisation by local windowing of statistics.
The proposed method was experimentally evaluated by learning high-order MGRF
models for a broad selection of complex textures and then performing texture
synthesis, and succeeded on much of the continuum from stochastic through
irregularly structured to near-regular textures. Learning interaction structure
is very beneficial for textures with large-scale structure, although those with
complex irregular structure still provide difficulties. The texture models were
also quantitatively evaluated on two tasks and found to be competitive with
other works: grading of synthesised textures by a panel of observers; and
comparison against several recent MGRF models by evaluation on a constrained
inpainting task. Comment: Submitted to Computer Vision and Image Understanding
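A local binary pattern compares each neighbour with the centre pixel and packs the results into an integer code; "learning offsets" then amounts to choosing which relative positions serve as the neighbours. The sketch uses an assumed clockwise 8-neighbour convention, lets callers pass other offsets, and summarizes an image by a normalised code histogram. The learning procedure itself is not shown, and the names are hypothetical.

```python
# Assumed convention: 8 neighbours, clockwise from the top-left.
DEFAULT_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c, offsets=DEFAULT_OFFSETS):
    """Bit k of the code is 1 if the pixel at offsets[k] is >= the centre pixel."""
    centre = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << k
    return code

def lbp_histogram(img, offsets=DEFAULT_OFFSETS):
    """Normalised histogram of codes over pixels whose neighbours are all in range."""
    reach = max(max(abs(dr), abs(dc)) for dr, dc in offsets)
    h, w = len(img), len(img[0])
    counts = [0] * (1 << len(offsets))
    for r in range(reach, h - reach):
        for c in range(reach, w - reach):
            counts[lbp_code(img, r, c, offsets)] += 1
    total = sum(counts)
    return [n / total for n in counts]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(lbp_code(img, 1, 1))  # 120
```

Each code needs only comparisons and bit operations, which is consistent with the abstract's point that these features are fast to compute.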
- …