Data-Driven Image Restoration
Every day, many images are taken by digital cameras, and people
demand visually accurate and pleasing results. Noise and
blur degrade images captured by modern cameras, and high-level
vision tasks (such as segmentation, recognition, and tracking)
require high-quality images. Therefore, image restoration,
specifically image deblurring and image denoising, is a critical
preprocessing step.
A fundamental problem in image deblurring is to reliably recover
distinct spatial frequencies that have been suppressed by the
blur kernel. Existing image deblurring techniques often rely on
generic image priors that help recover only part of the frequency
spectrum, such as the frequencies near the high end. To this end,
we pose the following specific questions: (i) Does class-specific
information offer an advantage over existing generic priors for
image quality restoration? (ii) If a class-specific prior exists,
how should it be encoded into a deblurring framework to recover
attenuated image frequencies? In this work, we devise a
class-specific prior based on band-pass filter responses and
incorporate it into a deblurring strategy. Specifically, we show
that the subspace of band-pass filtered images and their
intensity distributions serve as useful priors for recovering
image frequencies.
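To make "the subspace of band-pass filtered images" concrete, here is a minimal sketch under stated assumptions: the abstract does not specify the filter bank, so a difference-of-Gaussians band-pass applied in the Fourier domain is used as a stand-in, and the class subspace is taken to be a PCA basis of those responses over same-class training images. All function names and the `sigma_fine`/`sigma_coarse`/`rank` parameters are hypothetical, and the intensity-distribution part of the prior is not modelled here.

```python
import numpy as np

def gaussian_lowpass(shape, sigma):
    """Frequency response of a Gaussian low-pass filter with spatial std `sigma` (pixels)."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    # Fourier transform of a unit-mass spatial Gaussian: exp(-2 pi^2 sigma^2 |f|^2)
    return np.exp(-2.0 * (np.pi ** 2) * (sigma ** 2) * (fx ** 2 + fy ** 2))

def band_pass(image, sigma_fine, sigma_coarse):
    """Difference-of-Gaussians band-pass response (assumes sigma_fine < sigma_coarse)."""
    H = gaussian_lowpass(image.shape, sigma_fine) - gaussian_lowpass(image.shape, sigma_coarse)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H))

def band_subspace(images, sigma_fine, sigma_coarse, rank):
    """Mean and PCA basis (`rank` columns) of band-pass responses of same-class images."""
    X = np.stack([band_pass(im, sigma_fine, sigma_coarse).ravel() for im in images], axis=1)
    mean = X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
    return mean.ravel(), U[:, :rank]

def subspace_residual(image, mean, basis, sigma_fine, sigma_coarse):
    """Distance of an image's band-pass response from the class subspace (lower = more class-like)."""
    r = band_pass(image, sigma_fine, sigma_coarse).ravel() - mean
    return float(np.linalg.norm(r - basis @ (basis.T @ r)))
```

In a deblurring loop, a residual such as `subspace_residual` could serve as a penalty on the latent sharp image's band-pass content; how the actual method weights or combines it is not stated in the abstract.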
Next, we present a novel image denoising algorithm that uses an
external, category-specific image database. In contrast to
existing noisy image restoration algorithms, our method selects
clean image “support patches” similar to the noisy patch from
an external database. We employ a content-adaptive distribution
model for each patch, deriving the parameters of the
distribution from the support patches. Our objective function is
composed of a Gaussian fidelity term that imposes category-specific
information and a low-rank term that robustly encourages
similarity between the noisy and the support patches.
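The support-patch idea can be sketched as follows, assuming Euclidean nearest-neighbour search over vectorised patches and a Gaussian whose mean and covariance are estimated from the selected support patches. This is only an illustration of the Gaussian (fidelity) part: the low-rank term of the actual objective is omitted, and the closed-form MAP estimate below is a standard Wiener-style solution swapped in for clarity, not the authors' solver. The function names and the `k`, `eps`, `sigma` parameters are hypothetical.

```python
import numpy as np

def select_support_patches(noisy_patch, database, k):
    """Return the k clean database patches closest (Euclidean) to the noisy patch.
    database: (N, d) array of vectorised clean patches from the same category."""
    d2 = np.sum((database - noisy_patch[None, :]) ** 2, axis=1)
    return database[np.argsort(d2)[:k]]

def gaussian_from_support(support, eps=1e-3):
    """Content-adaptive Gaussian: mean and regularised covariance of the support patches."""
    mu = support.mean(axis=0)
    cov = np.cov(support, rowvar=False) + eps * np.eye(support.shape[1])
    return mu, cov

def map_denoise(noisy_patch, mu, cov, sigma):
    """MAP estimate for y = x + n with n ~ N(0, sigma^2 I) and prior x ~ N(mu, cov):
    x = mu + cov (cov + sigma^2 I)^{-1} (y - mu)."""
    A = cov + sigma ** 2 * np.eye(cov.shape[0])
    return mu + cov @ np.linalg.solve(A, noisy_patch - mu)
```

With `sigma = 0` the estimate reproduces the observation, and as `sigma` grows it shrinks toward the support-patch mean, which is the behaviour one would expect from a content-adaptive Gaussian prior.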
Finally, we propose to learn a fully convolutional network model
that consists of a Chain of Identity Mapping Modules (CIMM) for
image denoising. The CIMM structure has two distinctive
features that are important for the noise removal task. First,
each residual unit employs identity mappings as the skip
connections and receives pre-activated input to preserve the
gradient magnitude propagated in both the forward and backward
directions. Second, by using dilated kernels for the
convolution layers in the residual branch, each neuron in the
last convolution layer of each module can observe the full
receptive field of the first layer.
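The two CIMM properties can be illustrated with a small NumPy sketch, assuming 1-D single-channel signals for brevity. The residual unit follows the pre-activation pattern (ReLU before each convolution, identity skip added at the end), and `receptive_field` shows how dilations enlarge coverage for stride-1 convolutions. The dilation schedule, kernel sizes, and function names are hypothetical; the abstract does not give them.

```python
import numpy as np

def receptive_field(kernel_sizes, dilations):
    """Receptive field (samples along one axis) of stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1-D dilated cross-correlation, single channel, odd kernel length."""
    k = len(w)
    pad = (k - 1) * dilation // 2
    xp = np.pad(x, pad)  # zero padding
    return np.array([sum(w[j] * xp[i + j * dilation] for j in range(k)) for i in range(len(x))])

def preact_residual_unit(x, w1, w2, dilation):
    """Identity skip plus pre-activated residual branch: x + conv(relu(conv(relu(x))))."""
    h = dilated_conv1d(np.maximum(x, 0.0), w1, dilation)
    h = dilated_conv1d(np.maximum(h, 0.0), w2, dilation)
    # The identity skip contributes an unscaled identity term to dy/dx,
    # so gradients are not attenuated by the branch.
    return x + h
```

For example, three 3-tap layers with dilations 1, 2, and 4 cover `receptive_field([3, 3, 3], [1, 2, 4])` = 15 samples, versus 7 without dilation, at the same parameter count.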
Denoising single images by feature ensemble revisited
Image denoising remains a challenging problem in many computer vision
sub-domains. Recent studies show that significant improvements are
possible in a supervised setting. However, a few challenges, such as spatial
fidelity and cartoon-like smoothing, remain unresolved or overlooked.
Our study proposes a simple yet efficient architecture for the denoising
problem that addresses these issues. The proposed architecture
revisits the concept of modular concatenation, instead of longer and deeper
cascaded connections, to recover a cleaner approximation of the given image. We
find that different modules capture versatile representations, and the
concatenated representation creates a richer subspace for low-level image
restoration. The proposed architecture has fewer parameters
than most previous networks, yet still achieves
significant improvements over current state-of-the-art networks.
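A rough sketch of modular concatenation (a feature ensemble) versus a single deep cascade: several independent modules process the same noisy input in parallel, their feature maps are concatenated along the channel axis, and a 1x1 convolution (a per-pixel matrix multiply) fuses them into the output. Each "module" here is a hypothetical stand-in (one per-pixel linear map plus ReLU); the real modules, their depth, and the fusion layer are not described in the abstract.

```python
import numpy as np

def module(x, w):
    """Stand-in feature extractor: per-pixel linear map + ReLU.
    x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)."""
    return np.maximum(x @ w, 0.0)

def ensemble_denoise(noisy, module_weights, fuse_w):
    """Run modules in parallel, concatenate features along channels, fuse with a 1x1 conv."""
    feats = [module(noisy, w) for w in module_weights]
    concat = np.concatenate(feats, axis=-1)  # (H, W, sum of each module's C_out)
    return concat @ fuse_w  # 1x1 convolution back to the input channel count
```

Concatenation keeps each module's representation intact for the fusion layer instead of forcing all information through one long chain, which is the intuition the abstract gives for the richer subspace.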
Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution
Text-to-3D is an emerging task that allows users to create 3D content with
virtually unlimited possibilities. Existing works tackle the problem by optimizing a 3D
representation with guidance from pre-trained diffusion models. An apparent
drawback is that they must optimize from scratch for each prompt, which is
computationally expensive and often yields poor visual fidelity. In this paper,
we propose DreamPortrait, which aims to generate text-guided 3D-aware portraits
in a single forward pass for efficiency. To achieve this, we extend Score
Distillation Sampling from a datapoint formulation to a distribution formulation, which injects
a semantic prior into a 3D distribution. However, a direct extension leads
to mode collapse, since the objective pursues only semantic
alignment. Hence, we propose to optimize a distribution with hierarchical
condition adapters and GAN loss regularization. For better 3D modeling, we
further design a 3D-aware gated cross-attention mechanism that explicitly lets the
model perceive the correspondence between the text and the 3D-aware space.
These designs enable our model to generate portraits with robust
multi-view semantic consistency, eliminating the need for optimization-based
methods. Extensive experiments demonstrate our model's highly competitive
performance and significant speed-up over existing methods.
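For reference, the standard (per-datapoint) Score Distillation Sampling gradient, as introduced in DreamFusion, optimizes the parameters of a single rendered image $x = g(\theta)$ using a frozen noise predictor $\hat\epsilon_\phi$ and omits the U-Net Jacobian. The second line is only one plausible way to write the "distribution" extension described above, assuming a generator $g_\theta(z, y, c)$ that maps a latent $z \sim p(z)$, prompt $y$, and camera $c$ to an image; the paper's exact formulation may differ.

```latex
% Per-datapoint SDS (DreamFusion), with x_t = \alpha_t x + \sigma_t \epsilon
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat\epsilon_\phi(x_t; y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right], \qquad x = g(\theta)

% Hypothetical distribution form: the expectation also runs over latents z and cameras c,
% so one generator is trained for all samples instead of one optimization per prompt
\nabla_\theta \mathcal{L}_{\mathrm{SDS\text{-}dist}}
  = \mathbb{E}_{z, c, t, \epsilon}\!\left[ w(t)\,\big(\hat\epsilon_\phi(x_t; y, t) - \epsilon\big)\,
    \frac{\partial g_\theta(z, y, c)}{\partial \theta} \right], \qquad x = g_\theta(z, y, c)
```

Because the second objective only pulls every sample toward high prompt likelihood, nothing in it rewards diversity across $z$, which is consistent with the mode-collapse issue the abstract reports and the added GAN regularization.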
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of
generative artificial intelligence (AI), which unlocks unprecedented
capabilities for the generation, editing, and reconstruction of images, videos,
and 3D scenes. In these domains, diffusion models are the generative AI
architecture of choice. Within the last year alone, the literature on
diffusion-based tools and applications has grown exponentially, with relevant
papers published across the computer graphics, computer vision, and AI
communities and new works appearing daily on arXiv. This rapid growth
makes it difficult to keep up with all recent developments. The goal of
this state-of-the-art report (STAR) is to introduce the basic mathematical
concepts of diffusion models and the implementation details and design choices of the
popular Stable Diffusion model, as well as to overview important aspects of these
generative AI tools, including personalization, conditioning, and inversion, among
others. Moreover, we give a comprehensive overview of the rapidly growing
literature on diffusion-based generation and editing, categorized by the type
of generated medium, including 2D images, videos, 3D objects, locomotion, and
4D scenes. Finally, we discuss available datasets, metrics, open challenges,
and social implications. This STAR provides an intuitive starting point to
explore this exciting topic for researchers, artists, and practitioners alike.
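As a small illustration of the basic mathematics such a report covers, the DDPM forward (noising) process has the closed form $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t) I)$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The sketch below assumes the linear $\beta$ schedule from the original DDPM paper (1e-4 to 0.02 over 1000 steps); Stable Diffusion and other models use different schedules, and the helper names are hypothetical.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule as in DDPM (an assumption; other models differ)."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I).
    Returns x_t and the noise eps that a denoiser would be trained to predict."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # signal fraction retained after t steps
```

By the last step `alpha_bar` is close to zero, so `x_t` is almost pure Gaussian noise; training a network to predict `eps` from `x_t` and `t` is the usual starting point for the generation and editing tools the report surveys.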