Example-based image colorization using locality consistent sparse representation
Image colorization aims to produce a natural-looking color image from a given grayscale image, which remains a challenging problem. In this paper, we propose a novel example-based image colorization method exploiting a new locality consistent sparse representation. Given a single reference color image, our method automatically colorizes the target grayscale image by sparse pursuit. For efficiency and robustness, our method operates at the superpixel level. We extract low-level intensity features, mid-level texture features and high-level semantic features for each superpixel, which are then concatenated to form its descriptor. The collection of feature vectors for all the superpixels from the reference image composes the dictionary. We formulate colorization of target superpixels as a dictionary-based sparse reconstruction problem. Inspired by the observation that superpixels with similar spatial location and/or feature representation are likely to match spatially close regions from the reference image, we further introduce a locality promoting regularization term into the energy formulation which substantially improves the matching consistency and subsequent colorization results. Target superpixels are colorized based on the chrominance information from the dominant reference superpixels. Finally, to further improve coherence while preserving sharpness, we develop a new edge-preserving filter for chrominance channels with the guidance from the target grayscale image. To the best of our knowledge, this is the first work on sparse pursuit image colorization from single reference images. Experimental results demonstrate that our colorization method outperforms state-of-the-art methods, both visually and quantitatively using a user study.
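The locality-promoting sparse pursuit can be sketched as a weighted-l1 problem solved by proximal gradient descent (ISTA). Everything below is an illustrative assumption, not the paper's exact energy: the function name, the distance-based weighting of atoms, and all parameter values are hypothetical.

```python
import numpy as np

def locality_sparse_code(x, D, locs, target_loc, lam=0.1, gamma=1.0, n_iter=200):
    """Weighted-l1 sparse coding sketch: reference-superpixel atoms whose
    spatial location is far from the target get a larger l1 penalty,
    promoting locally consistent matches. Solved with ISTA."""
    n_atoms = D.shape[1]
    # locality weights: larger for reference superpixels far from the target
    w = 1.0 + gamma * np.linalg.norm(locs - target_loc, axis=1)
    a = np.zeros(n_atoms)
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the quadratic term
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)          # gradient of 0.5*||x - D a||^2
        z = a - grad / L
        thr = lam * w / L                  # per-atom soft-threshold level
        a = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
    return a
```

The dominant reference superpixels for a target would then be the atoms with the largest coefficient magnitudes in `a`, whose chrominance is transferred to the target.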
Rewriting a Deep Generative Model
A deep generative model such as a GAN learns to model a rich set of semantic
and physical rules about the target distribution, but up to now, it has been
obscure how such rules are encoded in the network, or how a rule could be
changed. In this paper, we introduce a new problem setting: manipulation of
specific rules encoded by a deep generative model. To address the problem, we
propose a formulation in which the desired rule is changed by manipulating a
layer of a deep network as a linear associative memory. We derive an algorithm
for modifying one entry of the associative memory, and we demonstrate that
several interesting structural rules can be located and modified within the
layers of state-of-the-art generative models. We present a user interface to
enable users to interactively change the rules of a generative model to achieve
desired effects, and we show several proof-of-concept applications. Finally,
results on multiple datasets demonstrate the advantage of our method against
standard fine-tuning methods and edit transfer algorithms.
Comment: ECCV 2020 (oral). Code at https://github.com/davidbau/rewriting. For videos and demos see https://rewriting.csail.mit.edu
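Treating a layer as a linear associative memory, the core edit can be illustrated in a simplified form: find the smallest (Frobenius-norm) change to a weight matrix W so that a chosen key k maps to a new value v. This is a sketch of the idea only; the actual method additionally whitens by the key covariance and optimizes within a layer of a deep model.

```python
import numpy as np

def edit_associative_memory(W, k, v):
    """Minimal-norm rank-one update so the linear map sends key k to the
    new value v, while leaving all directions orthogonal to k unchanged."""
    residual = v - W @ k                      # gap between current and desired output
    delta = np.outer(residual, k) / (k @ k)   # rank-one correction along k
    return W + delta
```

Because the correction is rank one and aligned with k, inputs orthogonal to the edited key are mapped exactly as before, which is why a single "rule" can be changed without disturbing the rest of the memory.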
IST Austria Thesis
Modern computer vision systems rely heavily on statistical machine learning models, which typically require large amounts of labeled data to be learned reliably. Moreover, computer vision research has recently adopted representation learning techniques on a wide scale, which further increases the demand for labeled data. However, for many important practical problems only a relatively small amount of labeled data is available, so it is difficult to leverage the full potential of representation learning methods. One way to overcome this obstacle is to invest substantial resources into producing large labeled datasets. Unfortunately, this can be prohibitively expensive in practice. In this thesis we focus on an alternative way of tackling this issue: we concentrate on methods that make use of weakly-labeled or even unlabeled data. Specifically, the first half of the thesis is dedicated to the semantic image segmentation task. We develop a technique that achieves competitive segmentation performance while only requiring annotations in the form of global image-level labels instead of dense segmentation masks. Subsequently, we present a new methodology that further improves segmentation performance by leveraging tiny amounts of additional feedback from a human annotator. Using our methods, practitioners can greatly reduce the data annotation effort required to learn modern image segmentation models. In the second half of the thesis we focus on methods for learning from unlabeled visual data. We study a family of autoregressive models for modeling the structure of natural images and discuss potential applications of these models. Moreover, we conduct an in-depth study of one of these applications, developing a state-of-the-art model for the probabilistic image colorization task.
Example-based image colorization via automatic feature selection and fusion
Image colorization is an important and difficult problem in image processing with various
applications including image stylization and heritage restoration. Most existing
image colorization methods utilize feature matching between the reference color image
and the target grayscale image. The effectiveness of features is often significantly
affected by the characteristics of the local image region. Traditional methods usually
combine multiple features to improve the matching performance. However, the same
set of features is still applied to the whole images. In this paper, based on the observation
that local regions have different characteristics and hence different features may
work more effectively, we propose a novel image colorization method using automatic
feature selection with the results fused via a Markov Random Field (MRF) model for
improved consistency. More specifically, the proposed algorithm automatically classifies
image regions as either uniform or non-uniform, and selects a suitable feature
vector for each local patch of the target image to determine the colorization results.
For this purpose, a descriptor based on luminance deviation is used to estimate the
probability of each patch being uniform or non-uniform, and the same descriptor is
also used for calculating the label cost of the MRF model to determine which feature
vector should be selected for each patch. In addition, the luminance similarity between neighboring
patches is used as the smoothness cost for the MRF model, which enhances the local consistency of the colorization results. Experimental results on a variety
of images show that our method outperforms several state-of-the-art algorithms,
both visually and quantitatively using standard measures and a user study.
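The MRF-based feature selection can be illustrated on a 1-D chain of patches, where the MAP labeling ("uniform" vs. "non-uniform") is computable exactly by dynamic programming. The unary costs and luminance-similarity weights below are hypothetical stand-ins for the paper's luminance-deviation descriptor and label costs.

```python
import numpy as np

def select_features_mrf(unary, pairwise_w, smooth=1.0):
    """Exact MAP labeling of a 1-D chain MRF by dynamic programming.
    unary[i, l]   : cost of assigning label l to patch i
    pairwise_w[i] : luminance-similarity weight between patches i and i+1
                    (a high weight discourages a label change there)."""
    n, n_labels = unary.shape
    cost = unary[0].copy()
    back = np.zeros((n, n_labels), dtype=int)
    for i in range(1, n):
        new_cost = np.empty(n_labels)
        for l in range(n_labels):
            # Potts smoothness: pay smooth * w when neighboring labels disagree
            trans = cost + smooth * pairwise_w[i - 1] * (np.arange(n_labels) != l)
            back[i, l] = int(np.argmin(trans))
            new_cost[l] = trans[back[i, l]] + unary[i, l]
        cost = new_cost
    labels = np.empty(n, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):
        labels[i - 1] = back[i, labels[i]]
    return labels
```

With a strong similarity weight between two patches, the MAP solution keeps their labels equal even against a weak unary preference, which is exactly the consistency effect the MRF fusion is meant to provide. (On a 2-D grid of patches, the same energy would be minimized with graph cuts or a comparable solver instead of this exact chain recursion.)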
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning
The success of deep learning in computer vision is rooted in the ability of
deep networks to scale up model complexity as demanded by challenging visual
tasks. As complexity is increased, so is the need for large amounts of labeled
data to train the model. This is associated with a costly human annotation
effort. To address this concern, with the long-term goal of leveraging the
abundance of cheap unlabeled data, we explore methods of unsupervised
"pre-training." In particular, we propose to use self-supervised automatic
image colorization.
We show that traditional methods for unsupervised learning, such as
layer-wise clustering or autoencoders, remain inferior to supervised
pre-training. In search for an alternative, we develop a fully automatic image
colorization method. Our method sets a new state-of-the-art in revitalizing old
black-and-white photography, without requiring human effort or expertise.
Additionally, it gives us a method for self-supervised representation learning.
In order for the model to appropriately re-color a grayscale object, it must
first be able to identify it. This ability, learned entirely self-supervised,
can be used to improve other visual tasks, such as classification and semantic
segmentation. As a future direction for self-supervision, we investigate if
multiple proxy tasks can be combined to improve generalization. This turns out
to be a challenging open problem. We hope that our contributions to this
endeavor will provide a foundation for future efforts in making
self-supervision compete with supervised pre-training.
Comment: Ph.D. thesis
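The colorization proxy task requires no human labels because every color photograph already contains its own supervision. A minimal sketch of constructing such a training pair follows; note that the thesis predicts a distribution over a color space rather than raw RGB values, and the BT.601 luma weights here are just one standard grayscale conversion.

```python
import numpy as np

def colorization_proxy_pair(rgb):
    """Turn an unlabeled RGB image into a self-supervised training pair:
    the grayscale luminance is the network input, and the original color
    image is the prediction target. The 'label' comes for free from the
    image itself."""
    rgb = rgb.astype(np.float32) / 255.0
    # ITU-R BT.601 luma weights for the grayscale input
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return gray[..., None], rgb  # (input, target)
```

A network trained on such pairs must recognize objects to color them plausibly (grass is green, sky is blue), which is why the learned features transfer to classification and segmentation.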
User-assisted intrinsic images
For many computational photography applications, the lighting and
materials in the scene are critical pieces of information. We seek
to obtain intrinsic images, which decompose a photo into the product
of an illumination component that represents lighting effects
and a reflectance component that is the color of the observed material.
This is an under-constrained problem and automatic methods
are challenged by complex natural images. We describe a new
approach that enables users to guide an optimization with simple
indications such as regions of constant reflectance or illumination.
Based on a simple assumption on local reflectance distributions, we
derive a new propagation energy that enables a closed form solution
using linear least-squares. We achieve fast performance by introducing
a novel downsampling that preserves local color distributions.
We demonstrate intrinsic image decomposition on a variety
of images and show applications.
National Science Foundation (U.S.) (NSF CAREER award 0447561); Institut national de recherche en informatique et en automatique (France) (Associate Research Team “Flexible Rendering”); Microsoft Research (New Faculty Fellowship); Alfred P. Sloan Foundation (Research Fellowship); Quanta Computer, Inc. (MIT-Quanta T Party
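The closed-form least-squares propagation can be illustrated on a toy 1-D signal: in the log domain, log I = log R + log S, user-marked constant-reflectance regions become strong equality constraints on neighboring log-reflectance values, and a weak shading-smoothness prior distributes the remaining gradients. This sketch replaces the paper's local color-distribution assumption with that generic prior; the function, weights, and region format are assumptions for illustration.

```python
import numpy as np

def decompose_intrinsic_1d(I, constant_regions, w_const=100.0, w_smooth=1.0):
    """Toy 1-D intrinsic decomposition I = R * S via linear least squares
    on r = log R. constant_regions is a list of (lo, hi) index ranges the
    user marked as constant reflectance."""
    logI = np.log(I)
    n = len(I)
    rows, rhs = [], []
    # weak prior: shading is smooth, s[i+1] - s[i] = 0;
    # with s = logI - r this reads r[i+1] - r[i] = logI[i+1] - logI[i]
    for i in range(n - 1):
        row = np.zeros(n); row[i + 1], row[i] = w_smooth, -w_smooth
        rows.append(row); rhs.append(w_smooth * (logI[i + 1] - logI[i]))
    # strong user constraints: constant reflectance inside marked regions
    for lo, hi in constant_regions:
        for i in range(lo, hi - 1):
            row = np.zeros(n); row[i + 1], row[i] = w_const, -w_const
            rows.append(row); rhs.append(0.0)
    # gauge fixing: the global scale is ambiguous, so pin mean log R to zero
    rows.append(np.ones(n)); rhs.append(0.0)
    r, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    R = np.exp(r)
    return R, I / R  # reflectance, shading
```

Because every term is quadratic in r, the minimizer is the solution of one linear system, which is the closed-form, linear-least-squares character of the propagation energy described above.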
Toward more scalable structured models
While deep learning has achieved huge success across different disciplines, from computer vision and natural language processing to computational biology and the physical sciences, training such models is known to require significant amounts of data. One possible reason is that the structural properties of the data and problem are not modeled explicitly. Effectively exploiting this structure can help build more efficient and better-performing models. The complexity of the structure requires models with sufficient representational capacity. However, increased structured-model complexity usually leads to increased inference complexity and trickier learning procedures. Also, making progress on real-world applications requires learning paradigms that circumvent the limitation of evaluating the partition function and scale to high-dimensional datasets.
In this dissertation, we develop more scalable structured models, i.e., models with inference procedures that can handle complex dependencies between variables efficiently, and learning algorithms that operate in high-dimensional spaces. First, we extend Gaussian conditional random fields, traditionally unimodal and capturing only pairwise variable interactions, to model multi-modal distributions with high-order dependencies between the output-space variables, while enabling exact inference and incorporating external constraints at runtime. We show compelling results on the task of diverse gray-image colorization. Then, we introduce a reinforcement learning-based method for solving inference in models with general higher-order potentials that are intractable with traditional techniques. We show promising results on semantic segmentation. Finally, we propose a new loss, max-sliced score matching (MSSM), for learning structured models at scale. We assess our model on estimating densities and scores for implicit distributions in Variational and Wasserstein auto-encoders.
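Why Gaussian CRFs admit exact inference can be seen in a minimal pairwise sketch: the energy is quadratic in the output variables, so the MAP solution is obtained by solving one linear system. The function and its argument format are illustrative; the dissertation's contribution is extending this to multi-modal distributions with higher-order interactions.

```python
import numpy as np

def gaussian_crf_map(unary_mean, unary_prec, edges, pair_w):
    """Exact MAP inference in a pairwise Gaussian CRF with energy
    E(y) = sum_i prec_i*(y_i - mu_i)^2 + sum_(i,j) w_ij*(y_i - y_j)^2.
    Setting the gradient to zero gives the linear system A y = b."""
    n = len(unary_mean)
    A = np.diag(unary_prec).astype(float)
    b = unary_prec * unary_mean
    for (i, j), w in zip(edges, pair_w):
        A[i, i] += w; A[j, j] += w   # each pair term adds w on the diagonal
        A[i, j] -= w; A[j, i] -= w   # and -w on the off-diagonal
    return np.linalg.solve(A, b)
```

With no edges the MAP is just the unary means; as a pairwise weight grows, the connected outputs are pulled toward a common value, which is the smoothing behavior exploited in tasks like gray-image colorization.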