Deep representation learning for photorealistic content creation
We study the problem of deep representation learning for photorealistic content creation. This is a critical component in many computer vision applications, ranging from virtual reality and videography to retail and advertising. In this thesis, we use deep neural techniques to develop end-to-end models capable of generating photorealistic results. Our framework is applied in three applications.
First, we study real-time universal Photorealistic Image Style Transfer. Photorealistic style transfer is the task of transferring the artistic style of an image onto a content target, producing a result that plausibly looks like it was captured with a camera. We propose a new end-to-end model for photorealistic style transfer that is fast and inherently generates photorealistic results. The core of our approach is a feed-forward neural network that learns local edge-aware affine transforms that automatically obey the photorealism constraint. Our method produces visually superior results and is three orders of magnitude faster than prior work, enabling real-time performance at 4K on a mobile phone.
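The photorealism constraint above can be illustrated with a minimal sketch: if every output pixel is a local affine function of the input colors, the content's edges and textures are preserved by construction. The per-pixel affine matrices below stand in for what the feed-forward network would predict; this is an illustrative sketch under that assumption, not the thesis implementation.

```python
import numpy as np

def apply_local_affine(content, affine):
    """Apply a per-pixel affine color transform.

    content: (H, W, 3) float image in [0, 1]
    affine:  (H, W, 3, 4) per-pixel affine matrices (3x3 linear part plus a
             bias column), as a feed-forward network might predict
    returns: (H, W, 3) stylized image
    """
    h, w, _ = content.shape
    # Append a constant 1 channel so the bias column acts additively.
    ones = np.ones((h, w, 1), dtype=content.dtype)
    homog = np.concatenate([content, ones], axis=-1)  # (H, W, 4)
    # Per-pixel matrix-vector product: out[c] = sum_k A[c, k] * x[k]
    return np.einsum('hwck,hwk->hwc', affine, homog)

# Sanity check: identity matrices leave the image unchanged.
img = np.random.rand(4, 4, 3)
ident = np.zeros((4, 4, 3, 4))
ident[..., :3, :3] = np.eye(3)
out = apply_local_affine(img, ident)
```

Because the transform is affine in the input colors, it can only rescale and shift local color values, which is what keeps the output photographic rather than painterly.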
Next, we learn real-time localized Photorealistic Video Style Transfer. We present a novel algorithm for transferring the artistic style of an image onto local regions of a target video while preserving its photorealism. Local regions may be selected fully automatically from an image, using video segmentation algorithms, or from casual user guidance such as scribbles. Our method is real-time and, once trained, works on arbitrary inputs without runtime optimization. We demonstrate our method on a variety of style images and target videos, including the ability to transfer different styles onto multiple objects simultaneously and to transition smoothly between styles in time.
Lastly, we tackle the problem of attribute-based Fashion Image Retrieval and Content Creation. We present an effective approach for generating new outfits from input queries through generative adversarial learning. We address this challenge by decomposing the complicated process into two stages. In the first stage, we present a novel attribute-aware global ranking network for attribute-based fashion retrieval. In the second stage, a generative model finalizes the retrieved results conditioned on an individual's preferred style. We demonstrate promising results on standard large-scale benchmarks.
Advances in 3D Neural Stylization: A Survey
Modern artificial intelligence provides a novel way of producing digital art
in a variety of styles. The expressive power of neural networks enables a broad
family of visual style transfer methods, which can be used to edit images,
videos, and 3D data
to make them more artistic and diverse. This paper reports on recent advances
in neural stylization for 3D data. We provide a taxonomy for neural stylization
by considering several important design choices, including scene
representation, guidance data, optimization strategies, and output styles.
Building on such taxonomy, our survey first revisits the background of neural
stylization on 2D images, and then provides in-depth discussions on recent
neural stylization methods for 3D data, where we also provide a mini-benchmark
on artistic stylization methods. Based on the insights gained from the survey,
we then discuss open challenges, future research, and potential applications
and impacts of neural stylization.
Meta-Learning for Color-to-Infrared Cross-Modal Style Transfer
Recent object detection models for infrared (IR) imagery are based upon deep
neural networks (DNNs) and require large amounts of labeled training imagery.
However, publicly-available datasets that can be used for such training are
limited in their size and diversity. To address this problem, we explore
cross-modal style transfer (CMST) to leverage large and diverse color imagery
datasets so that they can be used to train DNN-based object detectors for IR
imagery. We evaluate six contemporary stylization methods on four
publicly-available IR datasets - the first comparison of its kind - and find
that CMST is highly effective for DNN-based detectors. Surprisingly, we find
that existing data-driven methods are outperformed by a simple grayscale
stylization (an average of the color channels). Our analysis reveals that
existing data-driven methods are either too simplistic or introduce significant
artifacts into the imagery. To overcome these limitations, we propose
meta-learning style transfer (MLST), which learns a stylization by composing
and tuning well-behaved analytic functions. We find that MLST leads to more
complex stylizations without introducing significant image artifacts and
achieves the best overall detector performance on our benchmark datasets.
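The surprisingly strong grayscale baseline above is simple enough to state exactly. A minimal sketch follows; replicating the channel mean to three channels (so a color-trained detector backbone accepts the input) is our assumption, not a detail given in the abstract.

```python
import numpy as np

def grayscale_stylize(rgb):
    """Approximate IR-style imagery by averaging the color channels.

    rgb: (H, W, 3) color image in [0, 1]
    returns: (H, W, 3) pseudo-IR image, the channel mean replicated so a
    detector expecting 3-channel input can consume it (our assumption).
    """
    gray = rgb.mean(axis=-1, keepdims=True)
    return np.repeat(gray, 3, axis=-1)

# Single-pixel example: (0.0 + 0.5 + 1.0) / 3 = 0.5 in every channel.
img = np.array([[[0.0, 0.5, 1.0]]])
out = grayscale_stylize(img)
```

The point of the comparison in the paper is that this trivial, artifact-free mapping outperforms learned stylizers that either oversimplify or distort the imagery.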
Object-based attention mechanism for color calibration of UAV remote sensing images in precision agriculture.
Color calibration is a critical step in unmanned aerial vehicle (UAV) remote sensing, especially in precision agriculture, which relies mainly on correlating color changes with specific quality attributes, e.g. plant health, disease, and pest stresses. In UAV remote sensing, exemplar-based color transfer is widely used for color calibration, where automatically finding semantic correspondences is key to ensuring color transfer accuracy. However, existing attention mechanisms have difficulty building precise semantic correspondences between the reference image and the target one, since the normalized cross-correlation is often computed for feature reassembling. As a result, color transfer accuracy is inevitably degraded by disturbance from semantically unrelated pixels, leading to semantic mismatch where semantic correspondences are absent. In this article, we propose an unsupervised object-based attention mechanism (OBAM) to suppress the disturbance of semantically unrelated pixels, together with a weight-adjusted Adaptive Instance Normalization (AdaIN) (WAA) method to tackle the challenges caused by the absence of semantic correspondences. By embedding the proposed modules into a photorealistic style transfer method with progressive stylization, color transfer accuracy can be improved while better preserving structural details. We evaluated our approach on UAV data for several crop types, including rice, beans, and cotton. Extensive experiments demonstrate that our method outperforms several state-of-the-art methods. As our approach requires no annotated labels, it can easily be embedded into off-the-shelf color transfer approaches. Relevant code and configurations will be available at https://github.com/huanghsheng/object-based-attention-mechanis
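The AdaIN operation underlying the WAA module aligns channel-wise feature statistics between content and style. The sketch below shows standard AdaIN with a simple blending weight `alpha` standing in for the paper's weight adjustment, whose exact form is not given in the abstract and is therefore an assumption here.

```python
import numpy as np

def adain(content_feat, style_feat, alpha=1.0, eps=1e-5):
    """Adaptive Instance Normalization over (C, H, W) feature maps.

    Shifts the per-channel mean and std of the content features to match the
    style features. `alpha` blends stylized and original features; it is a
    stand-in for WAA's weight adjustment, not the published scheme.
    """
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True) + eps
    stylized = (content_feat - c_mean) / c_std * s_std + s_mean
    return alpha * stylized + (1 - alpha) * content_feat

# With alpha=1, the output's channel statistics match the style features.
content = np.random.rand(2, 4, 4)
style = np.random.rand(2, 4, 4)
out = adain(content, style)
```

OBAM's contribution, per the abstract, is to restrict which pixels contribute to these statistics so that semantically unrelated regions do not corrupt the transfer.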
Neural Radiance Fields: Past, Present, and Future
Modeling and interpreting 3D environments and surroundings have long driven
research in 3D Computer Vision, Computer Graphics, and Machine Learning. The
NeRF (Neural Radiance Fields) paper by Mildenhall et al. sparked a boom in
Computer Graphics, Robotics, and Computer Vision, and the prospect of
high-resolution, low-storage 3D models for Augmented and Virtual Reality has
gained traction among researchers, with more than 1000 NeRF-related preprints
published. This paper serves as a bridge for people starting to study
these fields by building on the basics of Mathematics, Geometry, Computer
Vision, and Computer Graphics to the difficulties encountered in Implicit
Representations at the intersection of all these disciplines. This survey
provides the history of rendering, Implicit Learning, and NeRFs, the
progression of research on NeRFs, and the potential applications and
implications of NeRFs in today's world. In doing so, this survey categorizes
all the NeRF-related research in terms of the datasets used, objective
functions, applications solved, and evaluation criteria for these applications.
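For readers new to the area, the core of NeRF rendering is the discrete volume rendering quadrature from the original Mildenhall et al. paper: each sample along a ray contributes its color weighted by its opacity and the accumulated transmittance in front of it. A minimal single-ray sketch:

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Discrete NeRF volume rendering for one ray.

    sigmas: (N,) volume densities at sampled points along the ray
    colors: (N, 3) RGB predicted at those points
    deltas: (N,) distances between consecutive samples
    returns: (3,) color C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # per-sample opacity
    # T_i: transmittance surviving all samples in front of sample i.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# An opaque first sample occludes everything behind it:
c = render_ray(np.array([1e3, 1.0]),
               np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
               np.array([1.0, 1.0]))
```

Everything downstream in the NeRF literature (datasets, objectives, applications surveyed here) ultimately optimizes the network producing `sigmas` and `colors` against this rendering.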
FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly
While text-3D editing has made significant strides in leveraging score
distillation sampling, emerging approaches still fall short in delivering
separable, precise and consistent outcomes that are vital to content creation.
In response, we introduce FocalDreamer, a framework that merges base shape with
editable parts according to text prompts for fine-grained editing within
desired regions. Specifically, equipped with geometry union and dual-path
rendering, FocalDreamer assembles independent 3D parts into a complete object,
tailored for convenient instance reuse and part-wise control. We propose
geometric focal loss and style consistency regularization, which encourage
focal fusion and congruent overall appearance. Furthermore, FocalDreamer
generates high-fidelity geometry and PBR textures which are compatible with
widely-used graphics engines. Extensive experiments have highlighted the
superior editing capabilities of FocalDreamer in both quantitative and
qualitative evaluations.
Deep Reinforcement Learning for Image-to-Image Translation
Most existing Image-to-Image Translation (I2IT) methods generate images in a
single run of a deep learning (DL) model. However, designing such a single-step
model is challenging: it requires a huge number of parameters and is prone to
poor minima and overfitting. In this work, we reformulate
I2IT as a step-wise decision-making problem via deep reinforcement learning
(DRL) and propose a novel framework that performs RL-based I2IT (RL-I2IT). The
key feature of the RL-I2IT framework is to decompose a monolithic learning
process into small steps, using a lightweight model to progressively transform
a source image into a target image. Since handling high-dimensional continuous
state and action spaces is challenging in the conventional RL framework, we
introduce a meta policy with a new concept, the Plan, into the standard
actor-critic model; the plan has a lower dimension than the original image and
helps the actor generate a tractable high-dimensional action. In the RL-I2IT
framework, we also employ a task-specific
auxiliary learning strategy to stabilize the training process and improve the
performance of the corresponding task. Experiments on several I2IT tasks
demonstrate the effectiveness and robustness of the proposed method when facing
high-dimensional continuous action space problems.
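The step-wise decomposition described above can be sketched as a simple loop: a lightweight policy repeatedly proposes a small residual action that nudges the current image toward the target domain. The `policy` here is a hypothetical placeholder for the paper's learned actor (and the sketch omits the plan, critic, and RL training entirely).

```python
import numpy as np

def stepwise_translate(source, policy, n_steps=10):
    """Step-wise image-to-image translation loop (sketch of the RL-I2IT idea).

    `policy` is a hypothetical lightweight model mapping the current image to
    a small residual action; applying it repeatedly moves the source toward
    the target domain instead of translating in one monolithic pass.
    """
    img = source.copy()
    for _ in range(n_steps):
        action = policy(img)              # cheap per-step transformation
        img = np.clip(img + action, 0.0, 1.0)
    return img

# Toy policy: close half the remaining gap to a fixed target each step.
target = np.full((2, 2, 3), 0.8)
policy = lambda img: 0.5 * (target - img)
out = stepwise_translate(np.zeros((2, 2, 3)), policy)
```

Each individual step only needs a small, easy-to-learn transformation, which is the framework's stated advantage over a single-run model.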
Image Synthesis under Limited Data: A Survey and Taxonomy
Deep generative models, which target reproducing the given data distribution
to produce novel samples, have made unprecedented advancements in recent years.
Their technical breakthroughs have enabled unparalleled quality in the
synthesis of visual content. However, one critical prerequisite for their
tremendous success is the availability of a sufficient number of training
samples, which requires massive computation resources. When trained on limited
data, generative models tend to suffer from severe performance deterioration
due to overfitting and memorization. Accordingly, researchers have recently
devoted considerable attention to developing novel models capable of generating
plausible and diverse images from limited training data. Despite
numerous efforts to enhance training stability and synthesis quality in the
limited data scenarios, there is a lack of a systematic survey that provides 1)
a clear problem definition, critical challenges, and taxonomy of various tasks;
2) an in-depth analysis of the pros, cons, and remaining limitations of existing
literature; as well as 3) a thorough discussion on the potential applications
and future directions in the field of image synthesis under limited data. In
order to fill this gap and provide an informative introduction to researchers
who are new to this topic, this survey offers a comprehensive review and a
novel taxonomy on the development of image synthesis under limited data. In
particular, it covers the problem definition, requirements, main solutions,
popular benchmarks, and remaining challenges in a comprehensive and all-around
manner.