
    Image Inpainting using Block-wise Procedural Training with Annealed Adversarial Counterpart

    Recent advances in deep generative models have shown promising potential in image inpainting, which refers to the task of predicting missing pixel values of an incomplete image using the known context. However, existing methods can be slow or generate unsatisfying results with easily detectable flaws. In addition, there is often perceivable discontinuity near the holes, which requires further post-processing to blend the results. We present a new approach to address the difficulty of training a very deep generative model to synthesize high-quality photo-realistic inpainting. Our model uses conditional generative adversarial networks (conditional GANs) as the backbone, and we introduce a novel block-wise procedural training scheme to stabilize the training while we increase the network depth. We also propose a new strategy called adversarial loss annealing to reduce the artifacts. We further describe several losses specifically designed for inpainting and show their effectiveness. Extensive experiments and a user study show that our approach outperforms existing methods in several tasks such as inpainting, face completion and image harmonization. Finally, we show our framework can be easily used as a tool for interactive guided inpainting, demonstrating its practical value in solving common real-world challenges.

    Graphical Representation for Heterogeneous Face Recognition

    Heterogeneous face recognition (HFR) refers to matching face images acquired from different sources (i.e., different sensors or different wavelengths) for identification. HFR plays an important role in both biometrics research and industry. In spite of the promising progress achieved in recent years, HFR is still a challenging problem due to the difficulty of representing two heterogeneous images in a homogeneous manner. Existing HFR methods either represent an image while ignoring the spatial information, or rely on a transformation procedure which complicates the recognition task. Considering these problems, we propose a novel graphical-representation-based HFR method (G-HFR) in this paper. Markov networks are employed to represent heterogeneous image patches separately, taking the spatial compatibility between neighboring image patches into consideration. A coupled representation similarity metric (CRSM) is designed to measure the similarity between the obtained graphical representations. Extensive experiments conducted on multiple HFR scenarios (viewed sketch, forensic sketch, near infrared image, and thermal infrared image) show that the proposed method outperforms state-of-the-art methods. Comment: 13 pages, 10 figures, TPAMI 2016 accepted

    Class-specific Poisson denoising by patch-based importance sampling

    In this paper, we address the problem of recovering images degraded by Poisson noise, where the image is known to belong to a specific class. In the proposed method, a dataset of clean patches from images of the class of interest is clustered using multivariate Gaussian distributions. In order to recover the noisy image, each noisy patch is assigned to one of these distributions, and the corresponding minimum mean squared error (MMSE) estimate is obtained. We propose to use a self-normalized importance sampling approach, which is a method of the Monte Carlo family, for both determining the most likely distribution and approximating the MMSE estimate of the clean patch. Experimental results show that our proposed method outperforms other methods for Poisson denoising in the low-SNR regime.
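
    The self-normalized importance sampling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `snis_mmse`, the use of the cluster's Gaussian prior as the proposal distribution, the clipping of samples to positive values, and the sample count are all assumptions made for illustration.

```python
import numpy as np

def snis_mmse(y, mean, cov, n_samples=5000, seed=0):
    """Approximate E[x | y] for y ~ Poisson(x), x ~ N(mean, cov).

    Hypothetical helper: samples are drawn from the Gaussian prior
    (used here as the proposal) and weighted by the Poisson likelihood.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = rng.multivariate_normal(mean, cov, size=n_samples)
    x = np.clip(x, 1e-6, None)  # Poisson intensities must be positive (assumption)
    # Poisson log-likelihood, dropping -sum(log(y!)), which does not depend on x.
    log_w = (y * np.log(x) - x).sum(axis=1)
    shift = log_w.max()          # subtract the max for numerical stability
    w = np.exp(log_w - shift)
    # Self-normalization: divide by the sum of the weights.
    x_hat = (w[:, None] * x).sum(axis=0) / w.sum()
    # Log of the average weight: an evidence estimate up to a constant
    # that is the same for every cluster given the same y.
    log_evidence = shift + np.log(w.mean())
    return x_hat, log_evidence
```

    Under these assumptions, calling the function once per Gaussian cluster and keeping the cluster with the largest `log_evidence` mirrors the "most likely distribution" choice, and the matching `x_hat` plays the role of the MMSE estimate.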

    Single Frame Image super Resolution using Learned Directionlets

    In this paper, a new directionally adaptive, learning-based, single-image super-resolution method using a multiple-direction wavelet transform, called Directionlets, is presented. This method uses Directionlets to effectively capture directional features and to extract edge information along different directions from a set of available high-resolution images. This information is used as the training set for super-resolving a low-resolution input image. The Directionlet coefficients at finer scales of its high-resolution image are learned locally from this training set, and the inverse Directionlet transform recovers the super-resolved high-resolution image. The simulation results show that the proposed approach outperforms standard interpolation techniques such as cubic spline interpolation as well as standard wavelet-based learning, both visually and in terms of mean squared error (MSE) values. This method also gives good results with aliased images. Comment: 14 pages, 6 figures

    Real-time, long-term hand tracking with unsupervised initialization

    This paper proposes a complete tracking system that is capable of long-term, real-time hand tracking with unsupervised initialization and error recovery. Initialization is steered by a three-stage hand detector, combining spatial and temporal information. Hand hypotheses are generated by a random forest detector in the first stage, whereas a simple linear classifier eliminates false positive detections. Resulting detections are tracked by particle filters that gather temporal statistics in order to make a final decision. The detector is scale and rotation invariant, and can detect hands in any pose in unconstrained environments. The resulting discriminative confidence map is combined with a generative particle-filter-based observation model to enable robust, long-term hand tracking in real time. The proposed solution is evaluated using several challenging, publicly available datasets, and is shown to clearly outperform other state-of-the-art object tracking methods.

    TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes

    We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps). The key idea is to utilize a 4-rotational symmetric (4-RoSy) field to define a domain for convolution on a surface. Though 4-RoSy fields have several properties favorable for convolution on surfaces (low distortion, few singularities, consistent parameterization, etc.), orientations are ambiguous up to 4-fold rotation at any sample point. So, we introduce a new convolutional operator invariant to the 4-RoSy ambiguity and use it in a network to extract features from high-resolution signals on geodesic neighborhoods of a surface. In comparison to alternatives, such as PointNet-based methods which lack a notion of orientation, the coherent structure given by these neighborhoods results in significantly stronger features. As an example application, we demonstrate the benefits of our architecture for 3D semantic segmentation of textured 3D meshes. The results show that our method outperforms all existing methods on the basis of mean IoU by a significant margin in both geometry-only (6.4%) and RGB+Geometry (6.9-8.2%) settings.

    Face Recognition using Optimal Representation Ensemble

    Recently, face recognizers based on linear representations have been shown to deliver state-of-the-art performance. In real-world applications, however, face images usually suffer from expressions, disguises and random occlusions. The problematic facial parts undermine the validity of the linear-subspace assumption and thus the recognition performance deteriorates significantly. In this work, we address the problem in a learning-inference-mixed fashion. By observing that the linear-subspace assumption is more reliable on certain face patches than on the holistic face, some Bayesian Patch Representations (BPRs) are randomly generated and interpreted according to the Bayes' theory. We then train an ensemble model over the patch representations by minimizing the empirical risk w.r.t. the "leave-one-out margins". The obtained model is termed Optimal Representation Ensemble (ORE), since it guarantees optimality from the perspective of Empirical Risk Minimization. To handle the unknown patterns in test faces, a robust version of BPR is proposed by taking the non-face category into consideration. Equipped with the Robust-BPRs, the inference ability of ORE is increased dramatically and several record-breaking accuracies (99.9% on Yale-B and 99.5% on AR) and desirable efficiencies (below 20 ms per face in Matlab) are achieved. It also outperforms other modular heuristics on faces with random occlusions, extreme expressions and disguises. Furthermore, to accommodate immense BPR sets, a boosting-like algorithm is also derived. The boosted model, a.k.a. Boosted-ORE, obtains similar performance to its prototype. Besides the empirical superiorities, two desirable features of the proposed methods, namely, the training-determined model selection and the data-weight-free boosting procedure, are also theoretically verified. Comment: 36-page draft for IEEE Transactions on Image Processing (TIP)

    ABC: A Big CAD Model Dataset For Geometric Deep Learning

    We introduce ABC-Dataset, a collection of one million Computer-Aided Design (CAD) models for research of geometric deep learning methods and applications. Each model is a collection of explicitly parametrized curves and surfaces, providing ground truth for differential quantities, patch segmentation, geometric feature detection, and shape reconstruction. Sampling the parametric descriptions of surfaces and curves allows generating data in different formats and resolutions, enabling fair comparisons for a wide range of geometric learning algorithms. As a use case for our dataset, we perform a large-scale benchmark for estimation of surface normals, comparing existing data-driven methods and evaluating their performance against both the ground truth and traditional normal estimation methods. Comment: 15 pages

    Making a Science of Model Search

    Many computer vision algorithms depend on a variety of parameter choices and settings that are typically hand-tuned in the course of evaluating the algorithm. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to evaluating a method's full potential. Compounding matters, these parameters often must be re-tuned when the algorithm is applied to a new problem domain, and the tuning process itself often depends on personal experience and intuition in ways that are hard to describe. Since the performance of a given technique depends on both the fundamental quality of the algorithm and the details of its tuning, it can be difficult to determine whether a given technique is genuinely better, or simply better tuned. In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools to replace hand-tuning with a reproducible and unbiased optimization process. Our approach is to expose the underlying expression graph of how a performance metric (e.g. classification accuracy on validation examples) is computed from parameters that govern not only how individual processing steps are applied, but even which processing steps are included. A hyperparameter optimization algorithm transforms this graph into a program for optimizing that performance metric. Our approach yields state-of-the-art results on three disparate computer vision problems: a face-matching verification task (LFW), a face identification task (PubFig83) and an object recognition task (CIFAR-10), using a single algorithm. More broadly, we argue that the formalization of a meta-model supports more objective, reproducible, and quantitative evaluation of computer vision algorithms, and that it can serve as a valuable tool for guiding algorithm development.
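
    The idea of a search space whose parameters control both how a step is applied and whether it is included at all can be sketched as follows. This is plain random search, a deliberately simpler stand-in and not the optimization algorithm the abstract describes; the parameter names (`use_whitening`, `whiten_eps`, `n_filters`, `learning_rate`) and the toy scoring function are hypothetical.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from a conditional search space (hypothetical)."""
    cfg = {"use_whitening": rng.random() < 0.5}
    if cfg["use_whitening"]:
        # This parameter only exists when the optional step is included.
        cfg["whiten_eps"] = 10 ** rng.uniform(-4, -1)
    cfg["n_filters"] = rng.choice([16, 32, 64, 128])
    cfg["learning_rate"] = 10 ** rng.uniform(-4, -1)
    return cfg

def toy_validation_score(cfg):
    """Stand-in for a real validation metric; higher is better."""
    score = 1.0 - 0.1 * abs(math.log10(cfg["learning_rate"]) + 2.0)
    score += 0.05 if cfg["use_whitening"] else 0.0
    score -= abs(cfg["n_filters"] - 64) / 1000.0
    return score

def random_search(n_trials, seed=0):
    """Return (best_score, best_config) over n_trials random draws."""
    rng = random.Random(seed)
    best_score, best_cfg = None, None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = toy_validation_score(cfg)
        if best_score is None or score > best_score:
            best_score, best_cfg = score, cfg
    return best_score, best_cfg
```

    Fixing the seed makes the whole tuning run reproducible, which is the property the abstract contrasts with hand-tuning.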

    Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis

    Photorealistic frontal view synthesis from a single face image has a wide range of applications in the field of face recognition. Although data-driven deep learning methods have been proposed to address this problem by seeking solutions from ample face data, this problem is still challenging because it is intrinsically ill-posed. This paper proposes a Two-Pathway Generative Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by simultaneously perceiving global structures and local details. Four landmark-located patch networks are proposed to attend to local textures in addition to the commonly used global encoder-decoder network. In addition to the novel architecture, we make this ill-posed problem well constrained by introducing a combination of adversarial loss, symmetry loss and identity-preserving loss. The combined loss function leverages both the frontal face distribution and pre-trained discriminative deep face models to guide an identity-preserving inference of frontal views from profiles. Unlike previous deep learning methods that mainly rely on intermediate features for recognition, our method directly leverages the synthesized identity-preserving image for downstream tasks like face recognition and attribution estimation. Experimental results demonstrate that our method not only presents compelling perceptual results but also outperforms state-of-the-art results on large-pose face recognition. Comment: accepted at ICCV 2017, main paper & supplementary material, 11 pages
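
    The abstract names a symmetry loss without defining it. One plausible pixel-space form, shown here purely as an assumption, penalizes the mean absolute difference between a synthesized frontal face and its left-right mirror image. The function name `symmetry_loss` is hypothetical.

```python
import numpy as np

def symmetry_loss(img):
    """Mean absolute difference between an image and its horizontal mirror.

    `img` has shape (H, W) or (H, W, C); axis 1 is the width axis.
    Assumed form for illustration, not taken from the abstract.
    """
    img = np.asarray(img, dtype=float)
    mirrored = img[:, ::-1]  # flip left-right
    return np.abs(img - mirrored).mean()
```

    A perfectly left-right symmetric image gives a loss of zero, so minimizing this term would push a generator toward symmetric frontal views.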