368 research outputs found
Compressed Sensing MRI Reconstruction Regularized by VAEs with Structured Image Covariance
Objective: This paper investigates how generative models, trained on
ground-truth images, can be used as priors for inverse problems,
penalizing reconstructions far from images the generator can produce. The aim
is that learned regularization will provide complex data-driven priors to
inverse problems while still retaining the control and insight of a variational
regularization method. Moreover, unsupervised learning, without paired training
data, allows the learned regularizer to remain flexible to changes in the
forward problem such as noise level, sampling pattern or coil sensitivities in
MRI.
Approach: We utilize variational autoencoders (VAEs) that generate not only
an image but also a covariance uncertainty matrix for each image. The
covariance can model changing uncertainty dependencies caused by structure in
the image, such as edges or objects, and provides a new distance metric from
the manifold of learned images.
Main results: We evaluate these novel generative regularizers on
retrospectively sub-sampled real-valued MRI measurements from the fastMRI
dataset. We compare our proposed learned regularization against other unlearned
regularization approaches and unsupervised and supervised deep learning
methods.
Significance: Our results show that the proposed method is competitive with
other state-of-the-art methods and behaves consistently with changing sampling
patterns and noise levels.
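As a rough illustration of how a predicted mean and covariance can define a distance from a generated image, the sketch below evaluates a Gaussian negative log-likelihood (up to an additive constant). It assumes, purely for illustration, that the covariance is supplied through a lower-triangular Cholesky factor; the function names and this parameterization are hypothetical and are not taken from the paper.

```python
import math


def forward_substitute(chol, rhs):
    """Solve chol @ y = rhs for y, where chol is lower triangular."""
    y = []
    for i, row in enumerate(chol):
        acc = rhs[i] - sum(row[j] * y[j] for j in range(i))
        y.append(acc / row[i])
    return y


def gaussian_distance(x, mean, chol):
    """Return 0.5 * r^T Sigma^{-1} r + 0.5 * log det Sigma.

    Here r = x - mean and Sigma = chol @ chol^T (assumed parameterization).
    """
    residual = [xi - mi for xi, mi in zip(x, mean)]
    # Whitening: ||chol^{-1} r||^2 equals r^T Sigma^{-1} r.
    whitened = forward_substitute(chol, residual)
    mahalanobis = sum(w * w for w in whitened)
    # log det Sigma = 2 * sum of log of the Cholesky diagonal.
    log_det = 2.0 * sum(math.log(chol[i][i]) for i in range(len(chol)))
    return 0.5 * mahalanobis + 0.5 * log_det
```

With an identity factor this reduces to half the squared Euclidean distance; off-diagonal entries in the factor let correlated pixels, for example along an edge, be penalized jointly rather than independently.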
Deep Generative Variational Autoencoding for Replay Spoof Detection in Automatic Speaker Verification
Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount, yet it is difficult to detect reliably. The generalization failure of spoofing countermeasures (CMs) has driven the community to study various alternative deep learning CMs. The majority of them are supervised approaches that learn a human-spoof discriminator. In this paper, we advocate a different, deep generative approach that leverages powerful unsupervised manifold learning for classification. The potential benefits include the possibility to sample new data and to obtain insights into the latent features of genuine and spoofed speech. To this end, we propose to use variational autoencoders (VAEs) as an alternative backend for replay attack detection, via three alternative models that differ in their class-conditioning. The first one, similar to the use of Gaussian mixture models (GMMs) in spoof detection, is to independently train two VAEs, one for each class. The second one is to train a single conditional model (C-VAE) by injecting a one-hot class label vector into the encoder and decoder networks. Our final proposal integrates an auxiliary classifier to guide the learning of the latent space. Our experimental results using constant-Q cepstral coefficient (CQCC) features on the ASVspoof 2017 and 2019 physical access subtask datasets indicate that the C-VAE offers substantial improvement in comparison to training two separate VAEs, one for each class. On the 2019 dataset, the C-VAE outperforms the VAE and the baseline GMM by an absolute 9-10% in both equal error rate (EER) and tandem detection cost function (t-DCF) metrics. Finally, we propose VAE residuals (the absolute difference between the original input and its reconstruction) as features for spoofing detection.
The proposed frontend approach, augmented with a convolutional neural network classifier, demonstrated substantial improvement over the VAE backend use case.
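Two of the ideas in this abstract can be sketched in a few lines: injecting a one-hot class label into a network input, and computing VAE residuals as the element-wise absolute difference between an input and its reconstruction. The helper names below are hypothetical, the two-class default is an assumption, and the encoder and decoder networks themselves are omitted.

```python
def one_hot(label, num_classes):
    """Return a one-hot list with a 1.0 at position `label`."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def condition_input(features, label, num_classes=2):
    """Append a one-hot class label to a feature vector.

    Assumption: conditioning is done by concatenation, with
    label 0 = genuine and label 1 = spoofed.
    """
    return list(features) + one_hot(label, num_classes)


def vae_residual(original, reconstruction):
    """Element-wise absolute difference between input and reconstruction."""
    return [abs(o - r) for o, r in zip(original, reconstruction)]
```

For example, `condition_input([0.5, -1.0], 1)` yields `[0.5, -1.0, 0.0, 1.0]`, and `vae_residual([1.0, 2.0], [0.5, 3.0])` yields `[0.5, 1.0]`; the residual vector could then be passed to a downstream classifier.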
Compressing images by encoding their latent representations with relative entropy coding
Variational Autoencoders (VAEs) have seen widespread use in learned image
compression. They are used to learn expressive latent representations on which
downstream compression methods can operate with high efficiency. Recently
proposed 'bits-back' methods can indirectly encode the latent representation of
images with codelength close to the relative entropy between the latent
posterior and the prior. However, due to the underlying algorithm, these
methods can only be used for lossless compression, and they only achieve their
nominal efficiency when compressing multiple images simultaneously; they are
inefficient for compressing single images. As an alternative, we propose a
novel method, Relative Entropy Coding (REC), that can directly encode the
latent representation with codelength close to the relative entropy for single
images, supported by our empirical results obtained on the CIFAR-10, ImageNet32
and Kodak datasets. Moreover, unlike previous bits-back methods, REC is
immediately applicable to lossy compression, where it is competitive with the
state-of-the-art on the Kodak dataset.
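The relative entropy that serves as the target codelength can be computed in closed form in simple cases. The sketch below assumes a diagonal Gaussian latent posterior and a standard normal prior, which is a common VAE setup but an assumption here, and reports the result in bits. The function name is hypothetical, and this computes only the target quantity, not the coding scheme itself.

```python
import math


def kl_diag_gaussian_to_std_normal_bits(means, std_devs):
    """KL(N(mean, diag(std^2)) || N(0, I)) in bits.

    Per dimension, in nats: 0.5 * (std^2 + mean^2 - 1 - ln(std^2)).
    """
    nats = 0.0
    for mean, std in zip(means, std_devs):
        variance = std * std
        nats += 0.5 * (variance + mean * mean - 1.0 - math.log(variance))
    # Convert from nats to bits.
    return nats / math.log(2.0)
```

When the posterior equals the prior the relative entropy is zero, and shifting a unit-variance posterior mean by 1 costs 0.5 nats, or about 0.72 bits, per dimension.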
Modeling neural dynamics during speech production using a state space variational autoencoder
Characterizing the neural encoding of behavior remains a challenging task in
many research areas due in part to complex and noisy spatiotemporal dynamics of
evoked brain activity. An important aspect of modeling these neural encodings
involves separation of robust, behaviorally relevant signals from background
activity, which often contains signals from irrelevant brain processes and
decaying information from previous behavioral events. To achieve this
separation, we develop a two-branch State Space Variational AutoEncoder (SSVAE)
model to individually describe the instantaneous evoked foreground signals and
the context-dependent background signals. We modeled the spontaneous
speech-evoked brain dynamics using smoothed Gaussian mixture models. By
applying the proposed SSVAE model to track ECoG dynamics in one participant
over multiple hours, we find that the model can predict speech-related dynamics
more accurately than other latent factor inference algorithms. Our results
demonstrate that separately modeling the instantaneous speech-evoked and slow
context-dependent brain dynamics can enhance tracking performance, which has
important implications for the development of advanced neural encoding and
decoding models in various neuroscience sub-disciplines.
Comment: 5 page