On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization
The emergence of various notions of ``consistency'' in diffusion models has
garnered considerable attention and helped achieve improved sample quality,
likelihood estimation, and accelerated sampling. Although similar concepts have
been proposed in the literature, the precise relationships among them remain
unclear. In this study, we establish theoretical connections between three
recent ``consistency'' notions designed to enhance diffusion models for
distinct objectives. Our insights offer the potential for a more comprehensive
and encompassing framework for consistency-type models.
Unsupervised vocal dereverberation with diffusion-based generative models
Removing reverb from reverberant music is necessary to clean up audio for downstream music manipulation. Reverberation in music falls into two categories: natural reverb and artificial reverb. Artificial reverb is more diverse than natural reverb because of its various parameter setups and reverberation types. Recent supervised dereverberation methods may therefore fail, as they rely on sufficiently diverse and numerous pairs of reverberant observations and retrieved data for training in order to generalize to unseen observations at inference. To resolve these problems, we propose an unsupervised method that can remove a general class of artificial reverb from music without requiring paired training data. The
proposed method is based on diffusion models, where it initializes the unknown
reverberation operator with a conventional signal processing technique and
simultaneously refines the estimate with the help of diffusion models. We show
through objective and perceptual evaluations that our method outperforms the
current leading vocal dereverberation benchmarks.
Comment: 6 pages, 2 figures, submitted to ICASSP 202
GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
Pre-trained diffusion models have been successfully used as priors in a
variety of linear inverse problems, where the goal is to reconstruct a signal
from noisy linear measurements. However, existing approaches require knowledge
of the linear operator. In this paper, we propose GibbsDDRM, an extension of
Denoising Diffusion Restoration Models (DDRM) to a blind setting in which the
linear measurement operator is unknown. GibbsDDRM constructs a joint
distribution of the data, measurements, and linear operator by using a
pre-trained diffusion model for the data prior, and it solves the problem by
posterior sampling with an efficient variant of a Gibbs sampler. The proposed
method is problem-agnostic, meaning that a pre-trained diffusion model can be
applied to various inverse problems without fine-tuning. In experiments, it
achieved high performance on both blind image deblurring and vocal
dereverberation tasks, despite the use of simple generic priors for the
underlying linear operators.
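The alternating structure described above can be illustrated on a toy blind inverse problem. In this hypothetical sketch, the unknown linear operator is a single scalar gain h in y = h·x + noise, a simple N(0, 1) prior stands in for the pre-trained diffusion model, and h gets a flat prior; all names and numbers are invented for illustration, not taken from the paper.

```python
import numpy as np

# Toy Gibbs sampler for a blind problem: alternately sample the signal x and
# the unknown scalar operator h from the joint posterior given y = h*x + noise.
rng = np.random.default_rng(0)
n = 400
h_true = 0.7
x_true = rng.normal(0.0, 1.0, n)            # signal drawn from the N(0,1) prior
sigma = 0.2                                 # measurement noise level
y = h_true * x_true + rng.normal(0.0, sigma, n)

h = 1.0                                     # crude initialization of the operator
for _ in range(300):
    # Sample x | y, h: Gaussian prior times Gaussian likelihood (conjugate)
    var_x = 1.0 / (1.0 + h ** 2 / sigma ** 2)
    mean_x = var_x * (h / sigma ** 2) * y
    x = mean_x + np.sqrt(var_x) * rng.normal(size=n)
    # Sample h | y, x under a flat prior on h
    var_h = sigma ** 2 / (x @ x)
    h = (x @ y) / (x @ x) + np.sqrt(var_h) * rng.normal()

h_est = h                                   # final sample of the operator gain
```

In this conjugate toy case both conditionals are Gaussian, so the chain mixes quickly and the sampled gain settles near the true value; the paper replaces the analytic x-conditional with diffusion-model posterior sampling.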
SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer
Generative adversarial networks (GANs) learn a target probability
distribution by optimizing a generator and a discriminator with minimax
objectives. This paper addresses the question of whether such optimization
actually provides the generator with gradients that make its distribution close
to the target distribution. We derive metrizable conditions, sufficient
conditions for the discriminator to serve as the distance between the
distributions by connecting the GAN formulation with the concept of sliced
optimal transport. Furthermore, by leveraging these theoretical results, we
propose a novel GAN training scheme, called slicing adversarial network (SAN).
With only simple modifications, a broad class of existing GANs can be converted
to SANs. Experiments on synthetic and image datasets support our theoretical
results and the SAN's effectiveness as compared to usual GANs. Furthermore, we
also apply SAN to StyleGAN-XL, which leads to state-of-the-art FID score
amongst GANs for class-conditional generation on ImageNet 256×256.
Comment: 24 pages with 12 figures
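The normalized-last-layer idea can be pictured with a minimal linear example. In this hypothetical sketch (not the paper's full training scheme), a linear discriminator d(x) = w·x keeps its weight on the unit sphere, so it acts as a slicing direction in the sense of sliced optimal transport; fitted by projected gradient ascent, it aligns with the mean gap between two Gaussians.

```python
import numpy as np

# Keep the discriminator's linear weight normalized so it is a direction,
# then ascend the GAN-style objective E[d(real)] - E[d(fake)].
rng = np.random.default_rng(1)
mu_real = np.array([2.0, 0.0])
real = rng.normal(size=(500, 2)) + mu_real   # "real" samples
fake = rng.normal(size=(500, 2))             # "generated" samples

w = np.array([0.0, 1.0])                     # start orthogonal to the true gap
for _ in range(100):
    # gradient of E[d(real)] - E[d(fake)] with respect to w
    grad = real.mean(axis=0) - fake.mean(axis=0)
    w = w + 0.1 * grad
    w = w / np.linalg.norm(w)                # re-project onto the unit sphere

target = mu_real / np.linalg.norm(mu_real)   # direction of the true mean gap
cos_sim = float(w @ target)                  # alignment with the optimal slice
```

Because the weight stays on the sphere, its scale cannot absorb the objective; only the direction is learned, which is the property that makes the discriminator behave like a distance along a slice.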
VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance
Restoring degraded music signals is essential to enhance audio quality for
downstream music manipulation. Recent diffusion-based music restoration methods
have demonstrated impressive performance, and among them, diffusion posterior
sampling (DPS) stands out given its intrinsic properties, making it versatile
across various restoration tasks. In this paper, we identify potential issues that degrade the performance of current DPS-based methods and introduce ways to mitigate them, inspired by diverse diffusion guidance techniques including the RePaint (RP) strategy and Pseudoinverse-Guided Diffusion Models (ΠGDM). We demonstrate our methods for the vocal
declipping and bandwidth extension tasks under various levels of distortion and
cutoff frequency, respectively. In both tasks, our methods outperform the
current DPS-based music restoration benchmarks. We refer to
\url{http://carlosholivan.github.io/demos/audio-restoration-2023.html} for
examples of the restored audio samples.
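The DPS guidance mechanics can be sketched in one dimension with an analytic denoiser: for a N(0, 1) data prior, E[x0 | x_t] is available in closed form, so no trained network is needed. The measurement model here is y = x0 + noise with A = identity; the step sizes and schedule are invented, and this illustrates only generic DPS-style guidance, not the paper's full VRDMG method.

```python
import numpy as np

# Reverse diffusion with a closed-form denoiser plus a DPS-style guidance
# term that pulls the predicted clean signal toward the measurement y.
y = 2.0                                   # observed (degraded) measurement
zeta = 0.5                                # guidance step size (hypothetical)
sigmas = np.geomspace(10.0, 0.01, 50)     # noise schedule, high to low

rng = np.random.default_rng(3)
x = sigmas[0] * rng.normal()              # start from the wide noise distribution
for i in range(len(sigmas) - 1):
    s, s_next = sigmas[i], sigmas[i + 1]
    x0_hat = x / (1.0 + s ** 2)           # analytic E[x0 | x_t] for the N(0,1) prior
    x = x0_hat + (s_next / s) * (x - x0_hat)      # deterministic (DDIM-like) step
    x = x + zeta * (y - x0_hat) / (1.0 + s ** 2)  # measurement-consistency guidance
```

Early in the schedule the denoised estimate is vague and the guidance is weak; as the noise level shrinks, the guidance term dominates and the sample is driven toward consistency with y.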
Combination of G72 Genetic Variation and G72 Protein Level to Detect Schizophrenia: Machine Learning Approaches
The D-amino acid oxidase activator (DAOA, also known as G72) gene is a strong schizophrenia susceptibility gene. Higher G72 protein levels have been implicated in patients with schizophrenia. The current study aimed to differentiate patients with schizophrenia from healthy individuals using G72 single nucleotide polymorphisms (SNPs) and G72 protein levels by leveraging computational artificial intelligence and machine learning tools. A total of 149 subjects, comprising 89 patients with schizophrenia and 60 healthy controls, were recruited. Two G72 SNPs (rs1421292 and rs2391191) and G72 protein levels were measured from peripheral blood. We utilized three machine learning algorithms (logistic regression, naive Bayes, and the C4.5 decision tree) to build the optimal predictive model for distinguishing schizophrenia patients from healthy controls. The naive Bayes model using two factors, G72 rs1421292 and G72 protein, appeared to be the best model for disease susceptibility (sensitivity = 0.7969, specificity = 0.9372, area under the receiver operating characteristic curve (AUC) = 0.9356). However, integrating G72 rs1421292 only slightly increased the discriminative power over a model with G72 protein alone (sensitivity = 0.7941, specificity = 0.9503, AUC = 0.9324). Among the three models with G72 protein alone, the naive Bayes model had the best specificity (0.9503), while logistic regression was the most sensitive (0.8765). The findings remained similar after adjusting for age and gender. This study suggests that G72 protein alone, without incorporating the two G72 SNPs, may be sufficient to identify schizophrenia patients. We also recommend the naive Bayes and logistic regression models when specificity and sensitivity, respectively, are the priority. Larger-scale studies are warranted to confirm these findings.
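A two-feature naive Bayes classifier of the kind described above can be sketched on synthetic data. The feature distributions, effect sizes, and sample counts below are invented for illustration and are not the study's data; a binary "genotype" and a continuous "protein level" stand in for the SNP and assay measurements.

```python
import numpy as np

# Hand-rolled Gaussian naive Bayes on two features: a binary genotype
# (modeled with a Gaussian for simplicity) and a continuous protein level.
rng = np.random.default_rng(2)
n = 300
y = rng.integers(0, 2, n)                                # 1 = patient, 0 = control
genotype = (rng.random(n) < np.where(y == 1, 0.6, 0.3)).astype(float)
protein = rng.normal(np.where(y == 1, 1.5, 0.0), 1.0)    # higher level in patients
X = np.column_stack([genotype, protein])

def fit_gnb(X, y):
    # per-class priors, feature means, and variances
    stats = {}
    for c in (0, 1):
        Xc = X[y == c]
        stats[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-6)
    return stats

def predict_gnb(stats, X):
    # pick the class with the larger log-posterior under independence
    scores = []
    for c in (0, 1):
        prior, mu, var = stats[c]
        ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
        scores.append(np.log(prior) + ll.sum(axis=1))
    return (scores[1] > scores[0]).astype(int)

stats = fit_gnb(X, y)
acc = float((predict_gnb(stats, X) == y).mean())
```

With overlapping class distributions like these, accuracy well below 1.0 is expected; the point is only to show how two weakly informative features combine under the naive independence assumption.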
Manifold Preserving Guided Diffusion
Despite the recent advancements, conditional image generation still faces
challenges of cost, generalizability, and the need for task-specific training.
In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a
training-free conditional generation framework that leverages pretrained
diffusion models and off-the-shelf neural networks with minimal additional
inference cost for a broad range of tasks. Specifically, we leverage the
manifold hypothesis to refine the guided diffusion steps and introduce a
shortcut algorithm in the process. We then propose two methods for on-manifold
training-free guidance using pre-trained autoencoders and demonstrate that our
shortcut inherently preserves the manifolds when applied to latent diffusion
models. Our experiments show that MPGD is efficient and effective for solving a
variety of conditional generation applications in low-compute settings, and can
consistently offer up to 3.8x speed-ups with the same number of diffusion steps
while maintaining high sample quality compared to the baselines.
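The manifold-projection idea can be illustrated with a deliberately simple autoencoder. In this hypothetical sketch the "manifold" is a 1-D line in 2-D and the encoder/decoder pair is an exact linear projection; the real method uses pretrained deep autoencoders and diffusion steps, so this only shows why projecting after each guidance update keeps samples on the manifold.

```python
import numpy as np

# After every guidance update, map the point back to the data manifold with
# an autoencoder round-trip (here: orthogonal projection onto a line).
u = np.array([1.0, 1.0]) / np.sqrt(2.0)       # manifold direction (unit vector)

def encode(x):
    return x @ u                              # linear "encoder" onto the line

def decode(z):
    return z * u                              # linear "decoder" back to 2-D

x = np.array([1.0, 1.0])                      # current sample, on the manifold
target = np.array([3.0, 0.0])                 # guidance target (off-manifold)
for _ in range(20):
    x = x + 0.2 * (target - x)                # naive guidance step leaves the manifold
    x = decode(encode(x))                     # project back: manifold-preserving step

off_manifold_dist = float(np.linalg.norm(x - decode(encode(x))))
```

The guided point converges to the projection of the target onto the line rather than drifting off it, which is the behavior the on-manifold guidance is designed to preserve.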