Image Inpainting using Block-wise Procedural Training with Annealed Adversarial Counterpart
Recent advances in deep generative models have shown promising potential in
image inpainting, which refers to the task of predicting missing pixel values
of an incomplete image from the known context. However, existing methods can
be slow or generate unsatisfying results with easily detectable flaws. In
addition, the results often exhibit perceivable discontinuity near the holes
and require further post-processing to blend them. We present a new approach
to
address the difficulty of training a very deep generative model to synthesize
high-quality photo-realistic inpainting. Our model uses conditional generative
adversarial networks (conditional GANs) as the backbone, and we introduce a
novel block-wise procedural training scheme to stabilize the training while we
increase the network depth. We also propose a new strategy called adversarial
loss annealing to reduce the artifacts. We further describe several losses
specifically designed for inpainting and show their effectiveness. Extensive
experiments and a user study show that our approach outperforms existing
methods in several tasks such as inpainting, face completion, and image
harmonization. Finally, we show that our framework can easily be used as a
tool for interactive guided inpainting, demonstrating its practical value in
solving common real-world challenges.
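As a rough illustration of the adversarial loss annealing idea, the sketch
below decays the weight of the adversarial term over training so that the
reconstruction term dominates late in training. The linear schedule, the
weights, and the function names are assumptions of this sketch, not the
paper's actual implementation.

```python
def annealed_adversarial_weight(step, total_steps, w_start=1.0, w_end=0.01):
    """Hypothetical schedule: linearly decay the adversarial loss weight.

    The abstract only states that the adversarial loss is annealed to
    reduce artifacts; the exact schedule used is an assumption here.
    """
    t = min(step / float(total_steps), 1.0)
    return (1.0 - t) * w_start + t * w_end


def generator_loss(recon_loss, adv_loss, step, total_steps):
    # Late in training the reconstruction term dominates, one plausible
    # way annealing could suppress GAN artifacts.
    return recon_loss + annealed_adversarial_weight(step, total_steps) * adv_loss
```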
Graphical Representation for Heterogeneous Face Recognition
Heterogeneous face recognition (HFR) refers to matching face images acquired
from different sources (i.e., different sensors or different wavelengths) for
identification. HFR plays an important role in both biometrics research and
industry. In spite of the promising progress achieved in recent years, HFR
remains a challenging problem due to the difficulty of representing two
heterogeneous images in a homogeneous manner. Existing HFR methods either
represent an image ignoring the spatial information, or rely on a
transformation procedure which complicates the recognition task. Considering
these problems, we propose a novel graphical representation based HFR method
(G-HFR) in this paper. Markov networks are employed to represent heterogeneous
image patches separately, taking the spatial compatibility between neighboring
image patches into consideration. A coupled representation
similarity metric (CRSM) is designed to measure the similarity between obtained
graphical representations. Extensive experiments conducted on multiple HFR
scenarios (viewed sketch, forensic sketch, near infrared image, and thermal
infrared image) show that the proposed method outperforms state-of-the-art
methods.
Comment: 13 pages, 10 figures, TPAMI 2016 accepted
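To make the similarity-metric step concrete, here is a minimal sketch of one
way a coupled representation similarity could be computed from per-patch
representation vectors: average the cosine similarity over corresponding
patches. The function name and the cosine form are assumptions; the paper's
CRSM is defined differently in detail.

```python
import numpy as np

def crsm_similarity(rep_a, rep_b):
    """Illustrative similarity between two graphical representations.

    rep_a, rep_b: arrays of shape (num_patches, dim), one representation
    weight vector per image patch. This sketch simply averages per-patch
    cosine similarities; the actual CRSM is specified in the paper.
    """
    a = rep_a / (np.linalg.norm(rep_a, axis=1, keepdims=True) + 1e-12)
    b = rep_b / (np.linalg.norm(rep_b, axis=1, keepdims=True) + 1e-12)
    return float(np.mean(np.sum(a * b, axis=1)))
```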
Class-specific Poisson denoising by patch-based importance sampling
In this paper, we address the problem of recovering images degraded by
Poisson noise, where the image is known to belong to a specific class. In the
proposed method, a dataset of clean patches from images of the class of
interest is clustered using multivariate Gaussian distributions. In order to
recover the noisy image, each noisy patch is assigned to one of these
distributions, and the corresponding minimum mean squared error (MMSE) estimate
is obtained. We propose to use a self-normalized importance sampling approach,
a method from the Monte Carlo family, both for determining the most likely
distribution and for approximating the MMSE estimate of the clean patch.
Experimental results show that the proposed method outperforms other methods
for Poisson denoising in the low-SNR regime.
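The core estimator lends itself to a short sketch: draw candidate clean
patches, weight them by their Poisson likelihood, self-normalize, and average.
The choice of the Gaussian prior as the proposal distribution and the function
signature are assumptions; cluster selection and the paper's numerical
safeguards are omitted.

```python
import numpy as np

def snis_mmse_estimate(noisy_patch, prior_mean, prior_cov,
                       num_samples=1000, rng=None):
    """Self-normalized importance sampling approximation of E[x | y]
    under Poisson noise, with a Gaussian prior on the clean patch x.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw candidate clean patches from the prior (proposal distribution).
    samples = rng.multivariate_normal(prior_mean, prior_cov, size=num_samples)
    samples = np.clip(samples, 1e-6, None)  # Poisson rates must be positive
    # Poisson log-likelihood of the observed counts under each sample
    # (the log(y!) term is constant in the rate and is dropped).
    log_w = np.sum(noisy_patch * np.log(samples) - samples, axis=1)
    log_w -= log_w.max()   # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()           # self-normalization
    return w @ samples     # weighted average approximates the MMSE estimate
```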
Single Frame Image Super Resolution using Learned Directionlets
In this paper, a new directionally adaptive, learning-based, single-image
super resolution method using a multiple-direction wavelet transform, called
Directionlets, is presented. The method uses directionlets to effectively
capture directional features and to extract edge information along different
directions from a set of available high-resolution images. This information
is used as the training set for super resolving a low-resolution input image:
the Directionlet coefficients at finer scales of its high-resolution
counterpart are learned locally from this training set, and the inverse
Directionlet transform recovers the super-resolved high-resolution image.
Simulation results show that the proposed approach outperforms standard
interpolation techniques such as cubic spline interpolation, as well as
standard wavelet-based learning, both visually and in terms of mean squared
error (MSE). The method also gives good results with aliased images.
Comment: 14 pages, 6 figures
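One simple way to realize "learned locally from this training set" is a
nearest-neighbor lookup: for each low-resolution patch, borrow the fine-scale
coefficients of the closest training patch. This lookup rule is an assumption
for illustration; the paper's exact local learning procedure may differ.

```python
import numpy as np

def learn_fine_coefficients(lr_patches, training_lr, training_fine):
    """Illustrative local learning step.

    lr_patches:    (M, d) low-resolution input patches
    training_lr:   (N, d) low-resolution training patches
    training_fine: (N, k) corresponding fine-scale directionlet coefficients
    Returns (M, k): borrowed coefficients for each input patch.
    """
    # Squared distances between every input patch and every training patch.
    d2 = ((lr_patches[:, None, :] - training_lr[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    return training_fine[nearest]
```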
Real-time, long-term hand tracking with unsupervised initialization
This paper proposes a complete tracking system that is capable of long-term,
real-time hand tracking with unsupervised initialization and error recovery.
Initialization is steered by a three-stage hand detector combining spatial
and temporal information. Hand hypotheses are generated by a random forest
detector in the first stage, whereas a simple linear classifier eliminates
false positive detections. The resulting detections are tracked by particle
filters that gather temporal statistics in order to make a final decision.
The detector is scale and rotation invariant, and can detect hands in any
pose in unconstrained environments. The resulting discriminative confidence
map is combined with a generative particle-filter-based observation model to
enable robust, long-term hand tracking in real time. The proposed solution is
evaluated on several challenging, publicly available datasets and is shown to
clearly outperform other state-of-the-art object tracking methods.
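A minimal sketch of the fused observation model follows: each particle is
weighted by the detector's confidence at its location multiplied by a
generative likelihood. The Gaussian motion model, the multiplicative fusion,
and the function names are assumptions of this sketch, not the paper's exact
formulation.

```python
import numpy as np

def particle_filter_step(particles, confidence_map, gen_likelihood, rng):
    """One hypothetical predict/update/resample cycle.

    particles: (N, 2) array of (x, y) hypotheses on the confidence map.
    gen_likelihood: callable returning an (N,) likelihood per particle.
    """
    h, w = confidence_map.shape
    # Predict: diffuse particles with Gaussian motion noise.
    particles = particles + rng.normal(scale=5.0, size=particles.shape)
    xy = np.clip(particles, [0, 0], [w - 1, h - 1]).astype(int)
    # Update: discriminative confidence times generative likelihood.
    weights = confidence_map[xy[:, 1], xy[:, 0]] * gen_likelihood(particles)
    weights = weights / weights.sum()  # assumes some nonzero weight
    # Resample to concentrate particles on likely hand locations.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```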
TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes
We introduce TextureNet, a neural network architecture designed to extract
features from high-resolution signals associated with 3D surface meshes (e.g.,
color texture maps). The key idea is to utilize a 4-rotational symmetric
(4-RoSy) field to define a domain for convolution on a surface. Though 4-RoSy
fields have several properties favorable for convolution on surfaces (low
distortion, few singularities, consistent parameterization, etc.), orientations
are ambiguous up to 4-fold rotation at any sample point. So, we introduce a new
convolutional operator invariant to the 4-RoSy ambiguity and use it in a
network to extract features from high-resolution signals on geodesic
neighborhoods of a surface. In comparison to alternatives, such as
PointNet-based methods, which lack a notion of orientation, the coherent
structure given
by these neighborhoods results in significantly stronger features. As an
example application, we demonstrate the benefits of our architecture for 3D
semantic segmentation of textured 3D meshes. The results show that our method
outperforms all existing methods on the basis of mean IoU by a significant
margin in both geometry-only (6.4%) and RGB+Geometry (6.9-8.2%) settings.
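The orientation-ambiguity trick can be conveyed with a tiny sketch: evaluate a
filter under all four 90-degree rotations of the local frame and pool with
max, so the response is unchanged under the 4-fold ambiguity. Pooling over
rotations is one standard way to build such invariance and is an assumption
here; the paper defines its own convolutional operator.

```python
import numpy as np

def four_rosy_invariant_response(patch, kernel):
    """Illustrative 4-RoSy-invariant filter response on a square patch.

    Computes the correlation of the kernel with the patch under each of
    the four 90-degree rotations and returns the maximum, which is
    invariant to the 4-fold orientation ambiguity of a 4-RoSy field.
    """
    responses = [float(np.sum(np.rot90(patch, k) * kernel)) for k in range(4)]
    return max(responses)
```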
Face Recognition using Optimal Representation Ensemble
Recently, face recognizers based on linear representations have been
shown to deliver state-of-the-art performance. In real-world applications,
however, face images usually suffer from expressions, disguises and random
occlusions. The problematic facial parts undermine the validity of the
linear-subspace assumption and thus the recognition performance deteriorates
significantly. In this work, we address the problem in a
learning-inference-mixed fashion. By observing that the linear-subspace
assumption is more reliable on certain face patches rather than on the holistic
face, some Bayesian Patch Representations (BPRs) are randomly generated and
interpreted according to Bayes' theorem. We then train an ensemble model
over the patch representations by minimizing the empirical risk w.r.t. the
"leave-one-out margins". The obtained model is termed Optimal Representation
Ensemble (ORE), since it guarantees the optimality from the perspective of
Empirical Risk Minimization. To handle the unknown patterns in test faces, a
robust version of BPR is proposed by taking the non-face category into
consideration. Equipped with the Robust-BPRs, the inference ability of ORE is
increased dramatically and several record-breaking accuracies (99.9% on Yale-B
and 99.5% on AR) and desirable efficiencies (below 20 ms per face in Matlab)
are achieved. It also outperforms other modular heuristics on faces with
random occlusions, extreme expressions, and disguises. Furthermore, to
accommodate immense BPR sets, a boosting-like algorithm is also derived. The
boosted model, a.k.a. Boosted-ORE, obtains similar performance to its
prototype.
Besides the empirical superiorities, two desirable features of the proposed
methods, namely, the training-determined model-selection and the
data-weight-free boosting procedure, are also theoretically verified.
Comment: 36-page draft for IEEE Transactions on Image Processing (TIP)
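The inference side of such an ensemble is easy to sketch: each patch
representation scores every identity, and the ensemble combines the scores
with learned weights. The weight-learning step (empirical risk minimization
over leave-one-out margins) is omitted, and the function name and weighted-sum
combination are assumptions of this sketch.

```python
import numpy as np

def ore_style_predict(patch_scores, patch_weights):
    """Illustrative ensemble inference over patch representations.

    patch_scores:  (num_patches, num_identities) per-patch identity scores
    patch_weights: (num_patches,) learned (or uniform) ensemble weights
    Returns the index of the predicted identity.
    """
    combined = patch_weights @ patch_scores  # weighted sum over patches
    return int(np.argmax(combined))
```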
ABC: A Big CAD Model Dataset For Geometric Deep Learning
We introduce ABC-Dataset, a collection of one million Computer-Aided Design
(CAD) models for research on geometric deep learning methods and applications.
Each model is a collection of explicitly parametrized curves and surfaces,
providing ground truth for differential quantities, patch segmentation,
geometric feature detection, and shape reconstruction. Sampling the parametric
descriptions of surfaces and curves allows generating data in different formats
and resolutions, enabling fair comparisons for a wide range of geometric
learning algorithms. As a use case for our dataset, we perform a large-scale
benchmark for estimation of surface normals, comparing existing data-driven
methods and evaluating their performance against both the ground truth and
traditional normal estimation methods.
Comment: 15 pages
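For normal estimation benchmarks, a common evaluation metric is the per-point
angular error between estimated and ground-truth normals, with the sign
ambiguity handled via the absolute dot product. The sketch below shows this
common choice; whether the dataset's benchmark uses exactly this protocol is
an assumption.

```python
import numpy as np

def normal_angular_error(pred, gt):
    """Per-point angular error (degrees) between unit normal vectors.

    pred, gt: (N, 3) arrays of unit normals. The absolute value of the
    dot product makes the metric insensitive to normal orientation.
    """
    cos = np.abs(np.sum(pred * gt, axis=1))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
```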
Making a Science of Model Search
Many computer vision algorithms depend on a variety of parameter choices and
settings that are typically hand-tuned in the course of evaluating the
algorithm. While such parameter tuning is often presented as being incidental
to the algorithm, correctly setting these parameter choices is frequently
critical to evaluating a method's full potential. Compounding matters, these
parameters often must be re-tuned when the algorithm is applied to a new
problem domain, and the tuning process itself often depends on personal
experience and intuition in ways that are hard to describe. Since the
performance of a given technique depends on both the fundamental quality of the
algorithm and the details of its tuning, it can be difficult to determine
whether a given technique is genuinely better, or simply better tuned.
In this work, we propose a meta-modeling approach to support automated
hyperparameter optimization, with the goal of providing practical tools to
replace
hand-tuning with a reproducible and unbiased optimization process. Our approach
is to expose the underlying expression graph of how a performance metric (e.g.
classification accuracy on validation examples) is computed from parameters
that govern not only how individual processing steps are applied, but even
which processing steps are included. A hyperparameter optimization algorithm
transforms this graph into a program for optimizing that performance metric.
Our approach yields state-of-the-art results on three disparate computer vision
problems: a face-matching verification task (LFW), a face identification task
(PubFig83) and an object recognition task (CIFAR-10), using a single algorithm.
More broadly, we argue that the formalization of a meta-model supports more
objective, reproducible, and quantitative evaluation of computer vision
algorithms, and that it can serve as a valuable tool for guiding algorithm
development.
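This style of search over a conditional configuration space, where some
parameters exist only if a given processing step is chosen, can be expressed
with the open-source hyperopt library. The search space and dummy objective
below are illustrative stand-ins, not the paper's actual vision pipeline.

```python
# A minimal sketch of optimizing over a conditional search space with
# hyperopt; replace the dummy objective with real pipeline training.
from hyperopt import fmin, tpe, hp

space = hp.choice("preproc", [
    {"type": "none"},
    {"type": "pca", "n_components": hp.quniform("n_components", 16, 256, 16)},
])

def objective(cfg):
    # In practice this would train and evaluate the full pipeline and
    # return a validation loss; here a dummy value keeps the sketch runnable.
    return 0.5 if cfg["type"] == "none" else 0.4

best = fmin(objective, space, algo=tpe.suggest, max_evals=25)
print(best)
```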
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Photorealistic frontal view synthesis from a single face image has a wide
range of applications in the field of face recognition. Although data-driven
deep learning methods have been proposed to address this problem by seeking
solutions from ample face data, this problem is still challenging because it is
intrinsically ill-posed. This paper proposes a Two-Pathway Generative
Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by
simultaneously perceiving global structures and local details. Four
landmark-located patch networks are proposed to attend to local textures in
addition to the commonly used global encoder-decoder network. Beyond the novel
architecture, we make this ill-posed problem well constrained by introducing a
combination of adversarial loss, symmetry loss and identity preserving loss.
The combined loss function leverages both frontal face distribution and
pre-trained discriminative deep face models to guide an identity preserving
inference of frontal views from profiles. Unlike previous deep learning
methods that mainly rely on intermediate features for recognition, our method
directly leverages the synthesized identity-preserving image for downstream
tasks such as face recognition and attribute estimation. Experimental results
demonstrate that our method not only presents compelling perceptual results but
also outperforms state-of-the-art results on large-pose face recognition.
Comment: accepted at ICCV 2017, main paper & supplementary material, 11 pages
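The combined loss the abstract names can be sketched directly: adversarial,
symmetry, and identity-preserving terms added with weights. The pixel
reconstruction term, the specific weights, and the exact form of each term
are assumptions of this sketch; the paper defines its own loss functions.

```python
import numpy as np

def tp_gan_style_loss(frontal, gt_frontal, adv_score, id_feat, id_feat_gt,
                      w_adv=1e-3, w_sym=0.3, w_id=0.02):
    """Illustrative combination of the losses named in the abstract.

    frontal, gt_frontal: (H, W) synthesized and ground-truth frontal images
    adv_score: discriminator score for the synthesized image (higher = real)
    id_feat, id_feat_gt: identity features from a pre-trained face model
    """
    pixel = np.mean(np.abs(frontal - gt_frontal))       # reconstruction (assumed)
    sym = np.mean(np.abs(frontal - frontal[:, ::-1]))   # horizontal symmetry
    adv = -np.log(adv_score + 1e-12)                    # adversarial term
    ident = np.mean((id_feat - id_feat_gt) ** 2)        # identity preservation
    return pixel + w_sym * sym + w_adv * adv + w_id * ident
```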