59 research outputs found
Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Blind face restoration is a highly ill-posed problem that often requires
auxiliary guidance to 1) improve the mapping from degraded inputs to desired
outputs, or 2) complement high-quality details lost in the inputs. In this
paper, we demonstrate that a learned discrete codebook prior in a small proxy
space largely reduces the uncertainty and ambiguity of restoration mapping by
casting blind face restoration as a code prediction task, while providing rich
visual atoms for generating high-quality faces. Under this paradigm, we propose
a Transformer-based prediction network, named CodeFormer, to model the global
composition and context of the low-quality faces for code prediction, enabling
the discovery of natural faces that closely approximate the target faces even
when the inputs are severely degraded. To enhance the adaptiveness for
different degradation, we also propose a controllable feature transformation
module that allows a flexible trade-off between fidelity and quality. Thanks to
the expressive codebook prior and global modeling, CodeFormer outperforms the
state of the arts in both quality and fidelity, showing superior robustness to
degradation. Extensive experimental results on synthetic and real-world
datasets verify the effectiveness of our method.Comment: Accepted by NeurIPS 2022. Code: https://github.com/sczhou/CodeForme
Dual Associated Encoder for Face Restoration
Restoring facial details from low-quality (LQ) images has remained a
challenging problem due to its ill-posedness induced by various degradations in
the wild. The existing codebook prior mitigates the ill-posedness by leveraging
an autoencoder and learned codebook of high-quality (HQ) features, achieving
remarkable quality. However, existing approaches in this paradigm frequently
depend on a single encoder pre-trained on HQ data for restoring HQ images,
disregarding the domain gap between LQ and HQ images. As a result, the encoding
of LQ inputs may be insufficient, resulting in suboptimal performance. To
tackle this problem, we propose a novel dual-branch framework named DAEFR. Our
method introduces an auxiliary LQ branch that extracts crucial information from
the LQ inputs. Additionally, we incorporate association training to promote
effective synergy between the two branches, enhancing code prediction and
output quality. We evaluate the effectiveness of DAEFR on both synthetic and
real-world datasets, demonstrating its superior performance in restoring facial
details.Comment: Technical Repor
TrustMark: Universal Watermarking for Arbitrary Resolution Images
Imperceptible digital watermarking is important in copyright protection,
misinformation prevention, and responsible generative AI. We propose TrustMark
- a GAN-based watermarking method with novel design in architecture and
spatio-spectra losses to balance the trade-off between watermarked image
quality with the watermark recovery accuracy. Our model is trained with
robustness in mind, withstanding various in- and out-place perturbations on the
encoded image. Additionally, we introduce TrustMark-RM - a watermark remover
method useful for re-watermarking. Our methods achieve state-of-art performance
on 3 benchmarks comprising arbitrary resolution images
Unifying Image Processing as Visual Prompting Question Answering
Image processing is a fundamental task in computer vision, which aims at
enhancing image quality and extracting essential features for subsequent vision
applications. Traditionally, task-specific models are developed for individual
tasks and designing such models requires distinct expertise. Building upon the
success of large language models (LLMs) in natural language processing (NLP),
there is a similar trend in computer vision, which focuses on developing
large-scale models through pretraining and in-context learning. This paradigm
shift reduces the reliance on task-specific models, yielding a powerful unified
model to deal with various tasks. However, these advances have predominantly
concentrated on high-level vision tasks, with less attention paid to low-level
vision tasks. To address this issue, we propose a universal model for general
image processing that covers image restoration, image enhancement, image
feature extraction tasks, etc. Our proposed framework, named PromptGIP, unifies
these diverse image processing tasks within a universal framework. Inspired by
NLP question answering (QA) techniques, we employ a visual prompting question
answering paradigm. Specifically, we treat the input-output image pair as a
structured question-answer sentence, thereby reprogramming the image processing
task as a prompting QA problem. PromptGIP can undertake diverse cross-domain
tasks using provided visual prompts, eliminating the need for task-specific
finetuning. Our methodology offers a universal and adaptive solution to general
image processing. While PromptGIP has demonstrated a certain degree of
out-of-domain task generalization capability, further research is expected to
fully explore its more powerful emergent generalization.Comment: 16 pages, 12 figure
Bayesian plug & play methods for inverse problems in imaging.
Thèse de Doctorat de Mathématiques Appliquées (Université de Paris)Tesis de Doctorado en IngenierÃa Eléctrica (Universidad de la República)This thesis deals with Bayesian methods for solving ill-posed inverse problems in imaging with learnt image priors. The first part of this thesis (Chapter 3) concentrates on two particular problems, namely joint denoising and decompression and multi-image super-resolution. After an extensive study of the noise statistics for these problem in the transformed (wavelet or Fourier) domain, we derive two novel algorithms to solve this particular inverse problem. One of them is based on a multi-scale self-similarity prior and can be seen as a transform-domain generalization of the celebrated non-local bayes algorithm to the case of non-Gaussian noise. The second one uses a neural-network denoiser to implicitly encode the image prior, and a splitting scheme to incorporate this prior into an optimization algorithm to find a MAP-like estimator. The second part of this thesis concentrates on the Variational AutoEncoder (VAE) model and some of its variants that show its capabilities to explicitly capture the probability distribution of high-dimensional datasets such as images. Based on these VAE models, we propose two ways to incorporate them as priors for general inverse problems in imaging : • The first one (Chapter 4) computes a joint (space-latent) MAP estimator named Joint Posterior Maximization using an Autoencoding Prior (JPMAP). We show theoretical and experimental evidence that the proposed objective function satisfies a weak bi-convexity property which is sufficient to guarantee that our optimization scheme converges to a stationary point. Experimental results also show the higher quality of the solutions obtained by our JPMAP approach with respect to other non-convex MAP approaches which more often get stuck in spurious local optima. • The second one (Chapter 5) develops a Gibbs-like posterior sampling algorithm for the exploration of posterior distributions of inverse problems using multiple chains and a VAE as image prior. We showhowto use those samples to obtain MMSE estimates and their corresponding uncertainty.Cette thèse traite des méthodes bayésiennes pour résoudre des problèmes inverses mal posés en imagerie avec des distributions a priori d’images apprises. La première partie de cette thèse (Chapitre 3) se concentre sur deux problèmes partic-uliers, à savoir le débruitage et la décompression conjoints et la super-résolutionmulti-images. Après une étude approfondie des statistiques de bruit pour ces problèmes dans le domaine transformé (ondelettes ou Fourier), nous dérivons deuxnouveaux algorithmes pour résoudre ce problème inverse particulie. L’un d’euxest basé sur une distributions a priori d’auto-similarité multi-échelle et peut êtrevu comme une généralisation du célèbre algorithme de Non-Local Bayes au cas dubruit non gaussien. Le second utilise un débruiteur de réseau de neurones pourcoder implicitement la distribution a priori, et un schéma de division pour incor-porer cet distribution dans un algorithme d’optimisation pour trouver un estima-teur de type MAP. La deuxième partie de cette thèse se concentre sur le modèle Variational Auto Encoder (VAE) et certaines de ses variantes qui montrent ses capacités à capturer explicitement la distribution de probabilité d’ensembles de données de grande dimension tels que les images. Sur la base de ces modèles VAE, nous proposons deuxmanières de les incorporer comme distribution a priori pour les problèmes inverses généraux en imagerie: •Le premier (Chapitre 4) calcule un estimateur MAP conjoint (espace-latent) nommé Joint Posterior Maximization using an Autoencoding Prior (JPMAP). Nous montrons des preuves théoriques et expérimentales que la fonction objectif proposée satisfait une propriété de bi-convexité faible qui est suffisante pour garantir que notre schéma d’optimisation converge vers un pointstationnaire. Les résultats expérimentaux montrent également la meilleurequalité des solutions obtenues par notre approche JPMAP par rapport à d’autresapproches MAP non convexes qui restent le plus souvent bloquées dans desminima locaux. •Le second (Chapitre 5) développe un algorithme d’échantillonnage a poste-riori de type Gibbs pour l’exploration des distributions a posteriori de problèmes inverses utilisant des chaînes multiples et un VAE comme distribution a priori. Nous montrons comment utiliser ces échantillons pour obtenir desestimations MMSE et leur incertitude correspondante.En esta tesis se estudian métodos bayesianos para resolver problemas inversos mal condicionados en imágenes usando distribuciones a priori entrenadas. La primera parte de esta tesis (CapÃtulo 3) se concentra en dos problemas particulares, a saber, el de eliminación de ruido y descompresión conjuntos, y el de superresolución a partir de múltiples imágenes. Después de un extenso estudio de las estadÃsticas del ruido para estos problemas en el dominio transformado (wavelet o Fourier),derivamos dos algoritmos nuevos para resolver este problema inverso en particular. Uno de ellos se basa en una distribución a priori de autosimilitud multiescala y puede verse como una generalización al dominio wavelet del célebre algoritmo Non-Local Bayes para el caso de ruido no Gaussiano. El segundo utiliza un algoritmo de eliminación de ruido basado en una red neuronal para codificar implÃcitamente la distribución a priori de las imágenes y un esquema de relajación para incorporar esta distribución en un algoritmo de optimización y asà encontrar un estimador similar al MAP. La segunda parte de esta tesis se concentra en el modelo Variational AutoEncoder (VAE) y algunas de sus variantes que han mostrado capacidad para capturar explÃcitamente la distribución de probabilidad de conjuntos de datos en alta dimensión como las imágenes. Basándonos en estos modelos VAE, proponemos dos formas de incorporarlos como distribución a priori para problemas inversos genéricos en imágenes : •El primero (CapÃtulo 4) calcula un estimador MAP conjunto (espacio imagen y latente) llamado Joint Posterior Maximization using an Autoencoding Prior (JPMAP). Mostramos evidencia teórica y experimental de que la función objetivo propuesta satisface una propiedad de biconvexidad débil que es suficiente para garantizar que nuestro esquema de optimización converge a un punto estacionario. Los resultados experimentales también muestran la mayor calidad de las soluciones obtenidas por nuestro enfoque JPMAP con respecto a otros enfoques MAP no convexos que a menudo se atascan en mÃnimos locales espurios. •El segundo (CapÃtulo 5) desarrolla un algoritmo de muestreo tipo Gibbs parala exploración de la distribución a posteriori de problemas inversos utilizando múltiples cadenas y un VAE como distribución a priori. Mostramos cómo usar esas muestras para obtener estimaciones de MMSE y su correspondiente incertidumbr
- …