Compressed Online Dictionary Learning for Fast fMRI Decomposition
We present a method for fast resting-state fMRI spatial decompositions of
very large datasets, based on reducing the temporal dimension before
applying dictionary learning to concatenated individual records from groups of
subjects. Introducing a measure of correspondence between spatial
decompositions of resting-state fMRI, we demonstrate that time-reduced dictionary
learning produces results as reliable as non-reduced decompositions. We also
show that this reduction significantly improves computational scalability.
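The core idea above, reducing the temporal dimension before fitting a dictionary, can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' implementation; the choice of an SVD-based reduction and the value of `k` are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for concatenated subject data: (n_timepoints, n_voxels).
X = rng.standard_normal((200, 500))

# Temporal reduction: keep the top-k singular components so that
# dictionary learning later operates on k "virtual timepoints"
# instead of all 200, which is what drives the speed-up.
k = 20
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_reduced = s[:k, None] * Vt[:k]  # shape (k, n_voxels)

# Dictionary learning would then be run on X_reduced rather than X.
print(X_reduced.shape)
```

The reduced matrix preserves the dominant spatial structure of the data while shrinking the sample dimension that dictionary learning must iterate over.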
Learning brain regions via large-scale online structured sparse dictionary-learning
We propose a multivariate online dictionary-learning method for obtaining decompositions of brain images with structured and sparse components (aka atoms). Sparsity is to be understood in the usual sense: the dictionary atoms are constrained to contain mostly zeros. This is imposed via an ℓ1-norm constraint. By "structured", we mean that the atoms are piece-wise smooth and compact, thus making up blobs, as opposed to scattered patterns of activation. We propose to use a Sobolev (Laplacian) penalty to impose this type of structure. Combining the two penalties, we obtain decompositions that properly delineate brain structures from functional images. This non-trivially extends the online dictionary-learning work of Mairal et al. (2010), at the price of only a factor of 2 or 3 on the overall running time. Just like the Mairal et al. (2010) reference method, the online nature of our proposed algorithm allows it to scale to arbitrarily sized datasets. Experiments on brain data show that our proposed method extracts structured and denoised dictionaries that are more interpretable and better capture inter-subject variability in small, medium, and large-scale regimes alike, compared to state-of-the-art models.
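The combination of penalties described above can be illustrated on a single atom. The sketch below evaluates an ℓ1 sparsity term plus a discrete Sobolev (squared-gradient) smoothness term on a 1-D grid; the weights `alpha` and `gamma` and the 1-D setting are assumptions for illustration, not the paper's actual solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# One dictionary atom sampled on a 1-D "image" grid (real atoms live on 3-D brain volumes).
atom = rng.standard_normal(100)

# Sparsity: the l1 norm pushes most coefficients toward zero.
l1_term = np.abs(atom).sum()

# Structure: squared discrete gradient (a 1-D Sobolev/Laplacian surrogate)
# penalizes scattered activations and favors smooth, blob-like atoms.
grad = np.diff(atom)
sobolev_term = (grad ** 2).sum()

# Hypothetical trade-off weights combining the two penalties.
alpha, gamma = 1.0, 0.5
penalty = alpha * l1_term + gamma * sobolev_term
print(penalty)
```

Minimizing the reconstruction error plus this combined penalty is what yields atoms that are simultaneously sparse and piece-wise smooth.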
Dissecting adaptive methods in GANs
Adaptive methods are a crucial component widely used for training generative
adversarial networks (GANs). While there has been some work to pinpoint the
"marginal value of adaptive methods" in standard tasks, it remains unclear why
they are still critical for GAN training. In this paper, we formally study how
adaptive methods help train GANs; inspired by the grafting method proposed in
arXiv:2002.11803 [cs.LG], we separate the magnitude and direction components of
the Adam updates, and graft them to the direction and magnitude of SGDA updates
respectively. By considering an update rule with the magnitude of the Adam
update and the normalized direction of SGD, we empirically show that the
adaptive magnitude of Adam is key for GAN training. This motivates us to have a
closer look at the class of normalized stochastic gradient descent ascent
(nSGDA) methods in the context of GAN training. We propose a synthetic
theoretical framework to compare the performance of nSGDA and SGDA for GAN
training with neural networks. We prove that in that setting, GANs trained with
nSGDA recover all the modes of the true distribution, whereas the same networks
trained with SGDA (and any learning rate configuration) suffer from mode
collapse. The critical insight in our analysis is that normalizing the
gradients forces the discriminator and generator to be updated at the same
pace. We also experimentally show that for several datasets, Adam's performance
can be recovered with nSGDA methods
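The grafting construction described above, taking the magnitude of the Adam update and the normalized direction of the SGD gradient, can be sketched in a few lines. This is a simplified single-parameter-vector sketch (no bias correction, global rather than per-layer grafting), with hyperparameter values assumed for illustration.

```python
import numpy as np

def grafted_update(grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One grafted step: Adam's update magnitude, SGD's normalized direction."""
    m = b1 * m + (1 - b1) * grad          # Adam first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # Adam second-moment estimate
    adam_step = lr * m / (np.sqrt(v) + eps)

    sgd_dir = grad / (np.linalg.norm(grad) + eps)   # normalized SGD direction

    # Graft: keep Adam's step size, but move along the SGD direction.
    step = np.linalg.norm(adam_step) * sgd_dir
    return step, m, v
```

In the paper's experiments this decoupling is what isolates the adaptive magnitude as the component that matters for GAN training; the nSGDA family then keeps only the normalization.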
Self-conditioned Embedding Diffusion for Text Generation
Can continuous diffusion models bring the same performance breakthrough on
natural language they did for image generation? To circumvent the discrete
nature of text data, we can simply project tokens into a continuous space of
embeddings, as is standard in language modeling. We propose Self-conditioned
Embedding Diffusion, a continuous diffusion mechanism that operates on token
embeddings and allows learning flexible and scalable diffusion models for both
conditional and unconditional text generation. Through qualitative and
quantitative evaluation, we show that our text diffusion models generate
samples comparable with those produced by standard autoregressive language
models - while being in theory more efficient on accelerator hardware at
inference time. Our work paves the way for scaling up diffusion models for
text, similarly to autoregressive models, and for improving performance with
recent refinements to continuous diffusion.
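The projection step above, mapping discrete tokens to continuous embeddings so a diffusion process can noise them, can be sketched as a forward noising pass. The embedding table, token IDs, and the cosine noise schedule are assumptions for the example; the paper's actual schedule and model are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim = 1000, 64
embed = rng.standard_normal((vocab_size, dim))   # hypothetical token embedding table
tokens = np.array([3, 17, 256, 42])              # a toy token sequence

# Project discrete tokens into the continuous embedding space.
x0 = embed[tokens]                               # shape (seq_len, dim)

# Forward diffusion at time t: interpolate between signal and noise
# using a cosine schedule (a common choice, assumed here).
t = 0.3
alpha = np.cos(0.5 * np.pi * t)
sigma = np.sin(0.5 * np.pi * t)
noise = rng.standard_normal(x0.shape)
x_t = alpha * x0 + sigma * noise                 # the model learns to denoise x_t
print(x_t.shape)
```

At inference, the learned denoiser is run in reverse from pure noise, and the final continuous vectors are rounded back to their nearest token embeddings.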