
    Constraining Variational Inference with Geometric Jensen-Shannon Divergence

    We examine the problem of controlling divergences for latent space regularisation in variational autoencoders. Specifically, we aim to reconstruct an example $x \in \mathbb{R}^{m}$ via a latent space $z \in \mathbb{R}^{n}$ ($n \leq m$), while balancing this against the need for generalisable latent representations. We present a regularisation mechanism based on the skew-geometric Jensen-Shannon divergence $\left(\textrm{JS}^{\textrm{G}_{\alpha}}\right)$. We find a variation in $\textrm{JS}^{\textrm{G}_{\alpha}}$, motivated by limiting cases, which leads to an intuitive interpolation between forward and reverse KL in the space of both distributions and divergences. We motivate its potential benefits for VAEs through low-dimensional examples, before presenting quantitative and qualitative results. Our experiments demonstrate that skewing our variant of $\textrm{JS}^{\textrm{G}_{\alpha}}$, in the context of $\textrm{JS}^{\textrm{G}_{\alpha}}$-VAEs, leads to better reconstruction and generation than several baseline VAEs. Our approach is entirely unsupervised and uses only one hyperparameter, which can be easily interpreted in latent space.
    Comment: Camera-ready version, accepted at NeurIPS 2020
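    To make the construction concrete, the sketch below computes a skew-geometric JS divergence between diagonal Gaussians, the case relevant to VAE regularisation, where the weighted geometric mean of two Gaussians is itself Gaussian. The conventions used here (a geometric mean $G_\alpha \propto p^{1-\alpha} q^{\alpha}$, with dual weights chosen so that $\alpha = 0$ and $\alpha = 1$ recover reverse and forward KL) are assumptions chosen to match the interpolation described in the abstract, not necessarily the paper's exact notation.

```python
import numpy as np

def kl_diag_gauss(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var2 / var1)
                        + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def js_geometric(mu1, var1, mu2, var2, alpha):
    """Skew-geometric JS divergence between diagonal Gaussians.

    Uses the weighted geometric mean G_alpha ~ p^(1-alpha) q^alpha,
    which is itself Gaussian, with 'dual' weights so that alpha=0
    gives KL(q||p) and alpha=1 gives KL(p||q) (assumed convention).
    """
    # Geometric mean of Gaussians: precision-weighted interpolation.
    var_a = 1.0 / ((1.0 - alpha) / var1 + alpha / var2)
    mu_a = var_a * ((1.0 - alpha) * mu1 / var1 + alpha * mu2 / var2)
    return (alpha * kl_diag_gauss(mu1, var1, mu_a, var_a)
            + (1.0 - alpha) * kl_diag_gauss(mu2, var2, mu_a, var_a))

# Sanity check of the limiting cases described in the abstract.
mu1, var1 = np.array([0.0]), np.array([1.0])
mu2, var2 = np.array([1.0]), np.array([2.0])
assert np.isclose(js_geometric(mu1, var1, mu2, var2, 1.0),
                  kl_diag_gauss(mu1, var1, mu2, var2))   # forward KL
assert np.isclose(js_geometric(mu1, var1, mu2, var2, 0.0),
                  kl_diag_gauss(mu2, var2, mu1, var1))   # reverse KL
```

    In a VAE, $p$ would play the role of the approximate posterior and $q$ the prior, with this divergence replacing the usual KL term and $\alpha$ acting as the single interpretable hyperparameter.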

    Exploring probabilistic models for semi-supervised learning

    Deep neural networks are increasingly harnessed for computer vision tasks, thanks to their robust performance. However, their training demands large-scale labeled datasets, which are labor-intensive to prepare. Semi-supervised learning (SSL) offers a solution by learning from a mix of labeled and unlabeled data. While most state-of-the-art SSL methods follow a deterministic approach, the exploration of their probabilistic counterparts remains limited. This research area is important because probabilistic models can provide uncertainty estimates critical for real-world applications. For instance, SSL-trained models may fall short of those trained with supervised learning due to potential pseudo-label errors in unlabeled data, and these models are more likely to make wrong predictions in practice. Especially in critical sectors like medical image analysis and autonomous driving, decision-makers must understand the model's limitations and when incorrect predictions may occur, insights often provided by uncertainty estimates. Furthermore, uncertainty can also serve as a criterion for filtering out unreliable pseudo-labels when unlabeled samples are used for training, potentially improving deep model performance. This thesis furthers the exploration of probabilistic models for SSL. Drawing on the widely-used Bayesian approximation tool, Monte Carlo (MC) dropout, I propose a new probabilistic framework, the Generative Bayesian Deep Learning (GBDL) architecture, for semi-supervised medical image segmentation. This approach not only mitigates potential overfitting found in previous methods but also achieves superior results across four evaluation metrics. Unlike its empirically designed predecessors, GBDL is underpinned by a full Bayesian formulation, providing a theoretical probabilistic foundation. Acknowledging MC dropout's limitations, I introduce NP-Match, a novel probabilistic approach for large-scale semi-supervised image classification. We evaluated NP-Match's generalization capabilities through extensive experiments in different challenging settings such as standard, imbalanced, and multi-label semi-supervised image classification. According to the experimental results, NP-Match not only competes favorably with previous state-of-the-art methods but also estimates uncertainty more rapidly than MC-dropout-based models, thus enhancing both training and testing efficiency. Lastly, I propose NP-SemiSeg, a new probabilistic model for semi-supervised semantic segmentation. This flexible model can be integrated with various existing segmentation frameworks to make predictions and estimate uncertainty. Experiments indicate that NP-SemiSeg surpasses MC dropout in accuracy, uncertainty quantification, and speed.
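    As background for how MC dropout yields the uncertainty estimates discussed above, here is a minimal PyTorch-style sketch (an illustration only, not code from GBDL, NP-Match, or NP-SemiSeg): dropout is kept active at test time, predictions are averaged over several stochastic forward passes, and predictive entropy serves as an uncertainty score that can filter unreliable pseudo-labels.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    """Monte Carlo dropout: keep dropout active at inference and
    average softmax outputs over several stochastic forward passes."""
    model.train()  # enables dropout (illustrative; in practice only dropout,
                   # not batch norm, should be left stochastic at test time)
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)
    # Predictive entropy of the averaged distribution as uncertainty.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy

def filter_pseudo_labels(mean_probs, entropy, threshold=0.5):
    """Keep only pseudo-labels whose predictive entropy is below a
    threshold (a hypothetical criterion for discarding unreliable
    unlabeled samples during SSL training)."""
    keep = entropy < threshold
    return mean_probs.argmax(dim=-1)[keep], keep
```

    The thesis's point about efficiency follows directly from this sketch: the n_samples forward passes make MC dropout expensive, which is the cost that NP-Match is reported to reduce.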

    Survey of Social Bias in Vision-Language Models

    In recent years, the rapid advancement of machine learning (ML) models, particularly transformer-based pre-trained models, has revolutionized the fields of Natural Language Processing (NLP) and Computer Vision (CV). However, researchers have discovered that these models can inadvertently capture and reinforce social biases present in their training datasets, leading to potential social harms such as uneven resource allocation and unfair representation of specific social groups. Addressing these biases and ensuring fairness in artificial intelligence (AI) systems has become a critical concern in the ML community. The recent introduction of pre-trained vision-and-language (VL) models in the emerging multimodal field demands attention to the potential social biases present in these models as well. Although VL models are susceptible to social bias, it remains far less well understood than the extensively discussed biases in NLP and CV. This survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL. By examining these perspectives, the survey aims to offer valuable guidelines on how to approach and mitigate social bias in both unimodal and multimodal settings. The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and less biased AI models across applications and research endeavors.

    Cerebellum and Ocular Motor Control

    An intact cerebellum is a prerequisite for optimal ocular motor performance. The cerebellum fine-tunes each of the subtypes of eye movements so they work together to bring and maintain images of objects of interest on the fovea. Here we review the major aspects of the contribution of the cerebellum to ocular motor control. The approach is based on structural–functional correlation, combining the effects of lesions with the results of physiologic studies, with emphasis on the cerebellar regions known to be most closely related to ocular motor function: (1) the flocculus/paraflocculus for high-frequency (brief) vestibular responses, sustained pursuit eye movements, and gaze holding; (2) the nodulus/ventral uvula for low-frequency (sustained) vestibular responses; and (3) the dorsal oculomotor vermis and its target in the posterior portion of the fastigial nucleus (the fastigial oculomotor region) for saccades and pursuit initiation.

    Generative models for natural images

    We discuss modern generative modelling of natural images based on neural networks.
    Three varieties of such models are particularly predominant at the time of writing: latent variable models such as the variational autoencoder (VAE), autoregressive models such as the pixel recurrent neural network (PixelRNN), and generative adversarial networks (GANs), which are noise-transformation models trained with an adversary. This thesis touches on all three kinds. The first chapter covers background on generative models, along with relevant discussion of deep neural networks, which are currently the dominant technology for implementing powerful statistical models. In the second chapter, we implement a variational autoencoder with an autoregressive decoder. This removes the strong assumption that the decoder's output dimensions are conditionally independent, instead tractably modelling a joint distribution, while also endowing the autoregressive model with a latent code. Additionally, this model has significantly reduced computational cost compared to a purely autoregressive model with similar modelling assumptions and performance. We express the latent space as a hierarchy, and qualitatively demonstrate the semantic decomposition of latent causes induced by this design. Finally, we present results on standard datasets that demonstrate strongly competitive performance. In the third chapter, we present an improved training procedure for a recent variant of generative adversarial networks. The Wasserstein GAN minimizes the Earth-Mover's distance between the real and generated distributions, and has been shown to be much easier to train than GANs with the standard minimax objective. However, it still exhibits failure modes in some training settings. We identify weight clipping as the culprit and replace it with a penalty on the gradient norm. This further improves and stabilizes training across a wide variety of settings (including language models over discrete data), and yields samples of high quality on the CIFAR-10 and LSUN bedrooms datasets. Finally, in the fourth chapter, we present work in development, in which we consider the use of modern generative models as normality models in a zero-shot out-of-distribution detection setting. We evaluate some of the models discussed previously in the thesis, and find that VAEs are the most promising, although their overall performance leaves considerable room for improvement. We conclude by reiterating the significance of generative modelling in the development of artificial intelligence, and mention some of the challenges ahead.
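    The gradient-norm penalty that replaces weight clipping in the third chapter is compact to implement. The PyTorch sketch below follows the standard WGAN-GP recipe, penalising deviations of the critic's gradient norm from 1 at points interpolated between real and generated samples; it is an illustration of that recipe, not the thesis's own code.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP style penalty: encourage the critic to be 1-Lipschitz by
    pushing its gradient norm towards 1 along lines between real and
    generated samples."""
    # One random interpolation coefficient per sample, broadcast over
    # the remaining (channel/spatial) dimensions.
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)),
                     device=real.device)
    # Detach fake so the penalty does not backpropagate into the generator.
    interp = (eps * real + (1.0 - eps) * fake.detach()).requires_grad_(True)
    scores = critic(interp)
    # create_graph=True so the penalty itself can be differentiated
    # when updating the critic.
    grads, = torch.autograd.grad(outputs=scores.sum(), inputs=interp,
                                 create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```

    In training, this term is simply added to the critic's loss in place of clipping the critic's weights, which is what stabilizes the settings mentioned above.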