Constraining Variational Inference with Geometric Jensen-Shannon Divergence.
We examine the problem of controlling divergences for latent space regularisation in variational autoencoders: specifically, aiming to reconstruct an example via its latent representation while balancing this against the need for generalisable latent representations. We present a regularisation mechanism based on the skew-geometric Jensen-Shannon divergence JS^(G_α). We find a variation in JS^(G_α), motivated by limiting cases, which leads to an intuitive interpolation between forward and reverse KL in the space of both distributions and divergences. We motivate its potential benefits for VAEs through low-dimensional examples, before presenting quantitative and qualitative results. Our experiments demonstrate that skewing our variant of JS^(G_α), in the context of β-VAEs, leads to better reconstruction and generation when compared to several baseline VAEs. Our approach is entirely unsupervised and utilises only one hyperparameter, which can be easily interpreted in latent space.
Comment: Camera-ready version, accepted at NeurIPS 2020.
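The interpolation the abstract describes can be made concrete for univariate Gaussians, where both the KL divergence and the normalised geometric mean have closed forms. The sketch below is an illustrative assumption, not the paper's code: in particular, the weight-swapped variant shown here is one way to arrange the skew weights so that α = 0 recovers reverse KL and α = 1 recovers forward KL.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) )."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def geometric_mean_gauss(m1, s1, m2, s2, alpha):
    """The normalised geometric mean p^(1-alpha) * q^alpha of two Gaussians
    is itself Gaussian; precisions combine linearly."""
    l1, l2 = 1 / s1**2, 1 / s2**2
    lg = (1 - alpha) * l1 + alpha * l2
    mg = ((1 - alpha) * l1 * m1 + alpha * l2 * m2) / lg
    return mg, math.sqrt(1 / lg)

def js_galpha_prime(m1, s1, m2, s2, alpha):
    """Weight-swapped skew-geometric JS divergence (an assumed variant):
    alpha * KL(p || G_alpha) + (1 - alpha) * KL(q || G_alpha)."""
    mg, sg = geometric_mean_gauss(m1, s1, m2, s2, alpha)
    return (alpha * kl_gauss(m1, s1, mg, sg)
            + (1 - alpha) * kl_gauss(m2, s2, mg, sg))
```

At α = 0 the geometric mean collapses to p, leaving KL(q || p); at α = 1 it collapses to q, leaving KL(p || q); intermediate α values interpolate smoothly between the two.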
Exploring probabilistic models for semi-supervised learning
Deep neural networks are increasingly harnessed for computer vision tasks, thanks
to their robust performance. However, their training demands large-scale labeled
datasets, which are labor-intensive to prepare. Semi-supervised learning (SSL) offers a solution by learning from a mix of labeled and unlabeled data.
While most state-of-the-art SSL methods follow a deterministic approach, the
exploration of their probabilistic counterparts remains limited. This research area
is important because probabilistic models can provide uncertainty estimates critical
for real-world applications. For instance, SSL-trained models may fall short of those
trained with supervised learning due to potential pseudo-label errors in unlabeled data, and these models are more likely to make wrong predictions in practice. Especially in critical sectors like medical image analysis and autonomous driving, decision-makers must understand the model's limitations and when incorrect predictions may occur, insights often provided by uncertainty estimates. Furthermore, uncertainty can also serve as a criterion for filtering out unreliable pseudo-labels when unlabeled samples are used for training, potentially improving deep model performance.
This thesis furthers the exploration of probabilistic models for SSL. Drawing on the widely-used Bayesian approximation tool, Monte Carlo (MC) dropout, I propose a new probabilistic framework, the Generative Bayesian Deep Learning (GBDL) architecture, for semi-supervised medical image segmentation. This approach not only mitigates potential overfitting found in previous methods but also achieves superior results across four evaluation metrics. Unlike its empirically designed predecessors, GBDL is underpinned by a full Bayesian formulation, providing a theoretical probabilistic foundation.
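MC dropout itself is simple to sketch: dropout is kept active at test time, predictions are averaged over several stochastic forward passes, and their spread serves as an uncertainty estimate. The toy example below shows only the mechanism — the random fixed weights stand in for a trained network and are not the thesis's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed weights standing in for a trained two-layer network.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))

def forward(x, drop_p=0.5):
    """One stochastic forward pass: dropout stays active at test time."""
    h = np.maximum(x @ W1, 0.0)                       # ReLU hidden layer
    mask = rng.random(h.shape) > drop_p               # Bernoulli dropout mask
    h = h * mask / (1.0 - drop_p)                     # inverted dropout scaling
    logits = h @ W2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)          # softmax probabilities

def mc_dropout_predict(x, T=100):
    """Mean prediction and predictive entropy over T stochastic passes."""
    probs = np.stack([forward(x) for _ in range(T)])  # (T, batch, classes)
    mean = probs.mean(axis=0)
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    return mean, entropy
```

The predictive entropy computed this way is the kind of per-sample uncertainty signal that can flag unreliable predictions or filter pseudo-labels, at the cost of T forward passes per input — the overhead that NP-Match, discussed below, aims to reduce.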
Acknowledging MC dropout's limitations, I introduce NP-Match, a novel probabilistic approach for large-scale semi-supervised image classification. I evaluated NP-Match's generalization capabilities through extensive experiments in challenging settings such as standard, imbalanced, and multi-label semi-supervised image classification. According to the experimental results, NP-Match not only competes favorably with previous state-of-the-art methods but also estimates uncertainty more rapidly than MC-dropout-based models, thus enhancing both training and testing efficiency.
Lastly, I propose NP-SemiSeg, a new probabilistic model for semi-supervised semantic segmentation. This flexible model can be integrated with various existing segmentation frameworks to make predictions and estimate uncertainty. Experiments indicate that NP-SemiSeg surpasses MC dropout in accuracy, uncertainty quantification, and speed.
Survey of Social Bias in Vision-Language Models
In recent years, the rapid advancement of machine learning (ML) models,
particularly transformer-based pre-trained models, has revolutionized Natural
Language Processing (NLP) and Computer Vision (CV) fields. However, researchers
have discovered that these models can inadvertently capture and reinforce
social biases present in their training datasets, leading to potential social
harms, such as uneven resource allocation and unfair representation of specific
social groups. Addressing these biases and ensuring fairness in artificial
intelligence (AI) systems has become a critical concern in the ML community.
The recent introduction of pre-trained vision-and-language (VL) models in the
emerging multimodal field demands attention to the potential social biases
present in these models as well. Although VL models are susceptible to social bias, it remains far less studied than bias in NLP and CV. This survey aims to provide researchers with a high-level
insight into the similarities and differences of social bias studies in
pre-trained models across NLP, CV, and VL. By examining these perspectives, the
survey aims to offer valuable guidelines on how to approach and mitigate social
bias in both unimodal and multimodal settings. The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and less biased AI models in various applications and research endeavors.
Cerebellum and Ocular Motor Control
An intact cerebellum is a prerequisite for optimal ocular motor performance. The cerebellum fine-tunes each of the subtypes of eye movements so they work together to bring and maintain images of objects of interest on the fovea. Here we review the major aspects of the contribution of the cerebellum to ocular motor control. The approach will be based on structural-functional correlation, combining the effects of lesions and the results from physiologic studies, with the emphasis on the cerebellar regions known to be most closely related to ocular motor function: (1) the flocculus/paraflocculus for high-frequency (brief) vestibular responses, sustained pursuit eye movements, and gaze holding, (2) the nodulus/ventral uvula for low-frequency (sustained) vestibular responses, and (3) the dorsal oculomotor vermis and its target in the posterior portion of the fastigial nucleus (the fastigial oculomotor region) for saccades and pursuit initiation.
Generative models for natural images
We discuss modern generative modelling of natural images based on neural networks. Three varieties of such models are particularly predominant at the time of writing: latent variable models such as variational autoencoders (VAE), autoregressive models such as pixel recurrent neural networks (PixelRNN), and generative adversarial networks (GAN), which are noise-transformation models trained with an adversary. This thesis touches on all three kinds.
The first chapter covers background on generative models, along with relevant discussions about deep neural networks, which are currently the dominant technology for implementing powerful statistical models.
In the second chapter, we implement variational autoencoders with autoregressive decoders. This removes the strong assumption of output dimensions being conditionally independent in variational autoencoders, instead tractably modelling a joint distribution, while also endowing autoregressive models with a latent code. Additionally, this model has significantly reduced computational cost compared to that of a purely autoregressive model with similar modelling assumptions and performance. We express the latent space as a hierarchy, and qualitatively demonstrate the semantic decomposition of latent causes induced by this design. Finally, we present results on standard datasets that demonstrate strongly competitive performance.
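The key modelling change described above — replacing a conditionally independent decoder with one in which each output dimension also conditions on the previous dimensions — can be sketched with a toy binary decoder. The weights, sizes, and function names below are illustrative assumptions, not the thesis's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

D, Z = 6, 3  # output and latent dimensionalities (arbitrary toy sizes)
# Hypothetical decoder weights: each output dim sees the latent code
# and all strictly earlier output dims (lower-triangular causal mask).
Wz = rng.normal(size=(Z, D)) * 0.1
Wx = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def autoregressive_decode(z):
    """Sample x one dimension at a time from p(x_i | z, x_<i), so the
    output dimensions are modelled jointly rather than independently."""
    x = np.zeros(D)
    for i in range(D):
        p_i = sigmoid(z @ Wz[:, i] + x @ Wx[i])  # row i only sees x_<i
        x[i] = float(rng.random() < p_i)
    return x
```

A purely factorised VAE decoder would drop the `x @ Wx[i]` term; keeping it is what makes the joint distribution tractable dimension by dimension while the latent code z still shapes the whole sample.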
In the third chapter, we present an improved training procedure for a recent variant of generative adversarial networks. Wasserstein GANs minimize the Earth-Mover's distance between the real and generated distributions and have been shown to be much easier to train than GANs with the standard minimax objective. However, they still exhibit failure modes in some training settings. We identify weight clipping as the culprit and replace it with a penalty on the gradient norm. This further improves and stabilizes training, and we demonstrate stability across a wide variety of settings (including language models over discrete data) and samples of high quality on the CIFAR-10 and LSUN bedrooms datasets.
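The gradient-penalty idea can be stated concretely: sample points on straight lines between real and generated examples and penalise the critic's input-gradient norm for deviating from 1 there. A minimal sketch follows, using a hypothetical linear critic so the gradient is available without automatic differentiation; a real implementation would differentiate through the critic instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear critic f(x) = x @ w + b, so grad_x f(x) = w everywhere.
w = rng.normal(size=5)
b = 0.0

def gradient_penalty(real, fake, lam=10.0):
    """WGAN-GP term: sample points on lines between paired real and fake
    examples, then penalise the critic's gradient norm deviating from 1."""
    eps = rng.random((real.shape[0], 1))      # per-sample mixing coefficient
    x_hat = eps * real + (1 - eps) * fake     # interpolated points
    # For this linear critic the input gradient is w at every x_hat;
    # with a nonlinear critic, compute grads at x_hat via autodiff.
    grads = np.tile(w, (x_hat.shape[0], 1))
    norms = np.linalg.norm(grads, axis=1)
    return lam * np.mean((norms - 1.0) ** 2)
```

The penalty replaces weight clipping's hard constraint with a soft one enforced only where it matters — on (and near) the data manifold between the two distributions.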
Finally, in the fourth chapter, we present work in development, where we consider the use of modern generative models as normality models in a zero-shot out-of-distribution detection setting. We evaluate some of the models we have discussed previously in the thesis, and find that VAEs are the most promising, although their overall performance leaves a lot of room for improvement.
We conclude by reiterating the significance of generative modelling in the development of artificial intelligence, and mention some of the challenges ahead.
- ā¦