Constraining Variational Inference with Geometric Jensen-Shannon Divergence.
We examine the problem of controlling divergences for latent space regularisation in variational autoencoders: specifically, aiming to reconstruct an example via its latent representation while balancing this against the need for generalisable latent representations. We present a regularisation mechanism based on the skew-geometric Jensen-Shannon divergence JS^(G_α). We find a variation in JS^(G_α), motivated by limiting cases, which leads to an intuitive interpolation between forward and reverse KL in the space of both distributions and divergences. We motivate its potential benefits for VAEs through low-dimensional examples, before presenting quantitative and qualitative results. Our experiments demonstrate that skewing our variant of JS^(G_α), in the context of β-VAEs, leads to better reconstruction and generation when compared to several baseline VAEs. Our approach is entirely unsupervised and utilises only one hyperparameter, which can be easily interpreted in latent space.
Comment: Camera-ready version, accepted at NeurIPS 2020.
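The interpolation the abstract describes can be made concrete for univariate Gaussians, where both the KL divergence and the normalised geometric mean have closed forms. The sketch below is an illustrative assumption, not the paper's code: in particular, the weight-swapped variant shown here is one way to arrange the skew weights so that α = 0 recovers reverse KL and α = 1 recovers forward KL.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) )."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def geometric_mean_gauss(m1, s1, m2, s2, alpha):
    """The normalised geometric mean p^(1-alpha) * q^alpha of two Gaussians
    is itself Gaussian; precisions combine linearly."""
    l1, l2 = 1 / s1**2, 1 / s2**2
    lg = (1 - alpha) * l1 + alpha * l2
    mg = ((1 - alpha) * l1 * m1 + alpha * l2 * m2) / lg
    return mg, math.sqrt(1 / lg)

def js_galpha_prime(m1, s1, m2, s2, alpha):
    """Weight-swapped skew-geometric JS divergence (an assumed variant):
    alpha * KL(p || G_alpha) + (1 - alpha) * KL(q || G_alpha)."""
    mg, sg = geometric_mean_gauss(m1, s1, m2, s2, alpha)
    return (alpha * kl_gauss(m1, s1, mg, sg)
            + (1 - alpha) * kl_gauss(m2, s2, mg, sg))
```

At α = 0 the geometric mean collapses to p, leaving KL(q || p); at α = 1 it collapses to q, leaving KL(p || q); intermediate α values interpolate smoothly between the two.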
Exploring probabilistic models for semi-supervised learning
Deep neural networks are increasingly harnessed for computer vision tasks, thanks
to their robust performance. However, their training demands large-scale labeled
datasets, which are labor-intensive to prepare. Semi-supervised learning (SSL) offers a solution by learning from a mix of labeled and unlabeled data.
While most state-of-the-art SSL methods follow a deterministic approach, the
exploration of their probabilistic counterparts remains limited. This research area
is important because probabilistic models can provide uncertainty estimates critical
for real-world applications. For instance, SSL-trained models may fall short of those
trained with supervised learning due to potential pseudo-label errors in unlabeled data, and these models are more likely to make wrong predictions in practice. Especially in critical sectors like medical image analysis and autonomous driving, decision-makers must understand the model's limitations and when incorrect predictions may occur, insights often provided by uncertainty estimates. Furthermore, uncertainty can also serve as a criterion for filtering out unreliable pseudo-labels when unlabeled samples are used for training, potentially improving deep model performance.
This thesis furthers the exploration of probabilistic models for SSL. Drawing on the widely-used Bayesian approximation tool, Monte Carlo (MC) dropout, I propose a new probabilistic framework, the Generative Bayesian Deep Learning (GBDL) architecture, for semi-supervised medical image segmentation. This approach not only mitigates potential overfitting found in previous methods but also achieves superior results across four evaluation metrics. Unlike its empirically designed predecessors, GBDL is underpinned by a full Bayesian formulation, providing a theoretical probabilistic foundation.
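MC dropout itself is simple to sketch: dropout is kept active at test time, predictions are averaged over several stochastic forward passes, and their spread serves as an uncertainty estimate. The toy example below shows only the mechanism — the random fixed weights stand in for a trained network and are not the thesis's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed weights standing in for a trained two-layer network.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))

def forward(x, drop_p=0.5):
    """One stochastic forward pass: dropout stays active at test time."""
    h = np.maximum(x @ W1, 0.0)                       # ReLU hidden layer
    mask = rng.random(h.shape) > drop_p               # Bernoulli dropout mask
    h = h * mask / (1.0 - drop_p)                     # inverted dropout scaling
    logits = h @ W2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)          # softmax probabilities

def mc_dropout_predict(x, T=100):
    """Mean prediction and predictive entropy over T stochastic passes."""
    probs = np.stack([forward(x) for _ in range(T)])  # (T, batch, classes)
    mean = probs.mean(axis=0)
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    return mean, entropy
```

The predictive entropy computed this way is the kind of per-sample uncertainty signal that can flag unreliable predictions or filter pseudo-labels, at the cost of T forward passes per input — the overhead that NP-Match, discussed below, aims to reduce.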
Acknowledging MC dropout's limitations, I introduce NP-Match, a novel probabilistic approach for large-scale semi-supervised image classification. I evaluated NP-Match's generalization capabilities through extensive experiments in challenging settings such as standard, imbalanced, and multi-label semi-supervised image classification. According to the experimental results, NP-Match not only competes favorably with previous state-of-the-art methods but also estimates uncertainty more rapidly than MC-dropout-based models, thus enhancing both training and testing efficiency.
Lastly, I propose NP-SemiSeg, a new probabilistic model for semi-supervised semantic segmentation. This flexible model can be integrated with various existing segmentation frameworks to make predictions and estimate uncertainty. Experiments indicate that NP-SemiSeg surpasses MC dropout in accuracy, uncertainty quantification, and speed.
Survey of Social Bias in Vision-Language Models
In recent years, the rapid advancement of machine learning (ML) models,
particularly transformer-based pre-trained models, has revolutionized Natural
Language Processing (NLP) and Computer Vision (CV) fields. However, researchers
have discovered that these models can inadvertently capture and reinforce
social biases present in their training datasets, leading to potential social
harms, such as uneven resource allocation and unfair representation of specific
social groups. Addressing these biases and ensuring fairness in artificial
intelligence (AI) systems has become a critical concern in the ML community.
The recent introduction of pre-trained vision-and-language (VL) models in the
emerging multimodal field demands attention to the potential social biases
present in these models as well. Although VL models are susceptible to social bias, it remains far less studied than bias in NLP and CV. This survey aims to provide researchers with a high-level
insight into the similarities and differences of social bias studies in
pre-trained models across NLP, CV, and VL. By examining these perspectives, the
survey aims to offer valuable guidelines on how to approach and mitigate social
bias in both unimodal and multimodal settings. The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and less biased AI models in various applications and research endeavors.
Cerebellum and Ocular Motor Control
An intact cerebellum is a prerequisite for optimal ocular motor performance. The cerebellum fine-tunes each of the subtypes of eye movements so they work together to bring and maintain images of objects of interest on the fovea. Here we review the major aspects of the contribution of the cerebellum to ocular motor control. The approach will be based on structural-functional correlation, combining the effects of lesions and the results from physiologic studies, with the emphasis on the cerebellar regions known to be most closely related to ocular motor function: (1) the flocculus/paraflocculus for high-frequency (brief) vestibular responses, sustained pursuit eye movements, and gaze holding, (2) the nodulus/ventral uvula for low-frequency (sustained) vestibular responses, and (3) the dorsal oculomotor vermis and its target in the posterior portion of the fastigial nucleus (the fastigial oculomotor region) for saccades and pursuit initiation.
Generative models for natural images
We discuss modern generative modelling of natural images based on neural networks. Three varieties of such models are particularly predominant at the time of writing: latent variable models such as variational autoencoders (VAE), autoregressive models such as pixel recurrent neural networks (PixelRNN), and generative adversarial networks (GAN), which are noise-transformation models trained with an adversary. This thesis touches on all three kinds.
The first chapter covers background on generative models, along with relevant discussions about deep neural networks, which are currently the dominant technology for implementing powerful statistical models.
In the second chapter, we implement variational autoencoders with autoregressive decoders. This removes the strong assumption of output dimensions being conditionally independent in variational autoencoders, instead tractably modelling a joint distribution, while also endowing autoregressive models with a latent code. Additionally, this model has significantly reduced computational cost compared to that of a purely autoregressive model with similar modelling assumptions and performance. We express the latent space as a hierarchy, and qualitatively demonstrate the semantic decomposition of latent causes induced by this design. Finally, we present results on standard datasets that demonstrate strongly competitive performance.
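The key modelling change described above — replacing a conditionally independent decoder with one in which each output dimension also conditions on the previous dimensions — can be sketched with a toy binary decoder. The weights, sizes, and function names below are illustrative assumptions, not the thesis's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

D, Z = 6, 3  # output and latent dimensionalities (arbitrary toy sizes)
# Hypothetical decoder weights: each output dim sees the latent code
# and all strictly earlier output dims (lower-triangular causal mask).
Wz = rng.normal(size=(Z, D)) * 0.1
Wx = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def autoregressive_decode(z):
    """Sample x one dimension at a time from p(x_i | z, x_<i), so the
    output dimensions are modelled jointly rather than independently."""
    x = np.zeros(D)
    for i in range(D):
        p_i = sigmoid(z @ Wz[:, i] + x @ Wx[i])  # row i only sees x_<i
        x[i] = float(rng.random() < p_i)
    return x
```

A purely factorised VAE decoder would drop the `x @ Wx[i]` term; keeping it is what makes the joint distribution tractable dimension by dimension while the latent code z still shapes the whole sample.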
In the third chapter, we present an improved training procedure for a recent variant of generative adversarial networks. Wasserstein GANs minimize the Earth-Mover's distance between the real and generated distributions and have been shown to be much easier to train than GANs with the standard minimax objective. However, they still exhibit failure modes in some training settings. We identify weight clipping as the culprit and replace it with a penalty on the gradient norm. This further improves and stabilizes training, and we demonstrate stability across a wide variety of settings (including language models over discrete data) and samples of high quality on the CIFAR-10 and LSUN bedrooms datasets.
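The gradient-penalty idea can be stated concretely: sample points on straight lines between real and generated examples and penalise the critic's input-gradient norm for deviating from 1 there. A minimal sketch follows, using a hypothetical linear critic so the gradient is available without automatic differentiation; a real implementation would differentiate through the critic instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear critic f(x) = x @ w + b, so grad_x f(x) = w everywhere.
w = rng.normal(size=5)
b = 0.0

def gradient_penalty(real, fake, lam=10.0):
    """WGAN-GP term: sample points on lines between paired real and fake
    examples, then penalise the critic's gradient norm deviating from 1."""
    eps = rng.random((real.shape[0], 1))      # per-sample mixing coefficient
    x_hat = eps * real + (1 - eps) * fake     # interpolated points
    # For this linear critic the input gradient is w at every x_hat;
    # with a nonlinear critic, compute grads at x_hat via autodiff.
    grads = np.tile(w, (x_hat.shape[0], 1))
    norms = np.linalg.norm(grads, axis=1)
    return lam * np.mean((norms - 1.0) ** 2)
```

The penalty replaces weight clipping's hard constraint with a soft one enforced only where it matters — on (and near) the data manifold between the two distributions.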
Finally, in the fourth chapter, we present work in development, where we consider the use of modern generative models as normality models in a zero-shot out-of-distribution detection setting. We evaluate some of the models we have discussed previously in the thesis, and find that VAEs are the most promising, although their overall performance leaves a lot of room for improvement.
We conclude by reiterating the significance of generative modelling in the development of artificial intelligence, and mention some of the challenges ahead.
- ā¦