
    Exploring discrete representations in stochastic computation graphs: Challenges, benefits, and novel strategies

    The evolution of deep learning has led to a need for models with enhanced interpretability and generalization behavior. As part of this, discrete representations play a significant role since they tend to be more interpretable. This thesis explores discrete representations in Stochastic Computation Graphs (SCGs), focusing on the challenges, benefits, and novel strategies for their structure and parameter learning. Recent successes in model-based reinforcement learning and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, yet the reasons behind these benefits remain unclear. Furthermore, training deep learning models with discrete representations presents unique problems, primarily associated with differentiating through probability distributions. In response, we first establish the necessary background on SCGs as a foundation for our research. We then analyze both the challenges and the benefits of training models with discrete representations, and we propose novel strategies to address these challenges, which we evaluate experimentally across various domains. On the one hand, we propose learning the structure of computation graphs for efficient Neural Architecture Search. On the other hand, we propose altering the scale parameter of Gumbel noise perturbations and implementing dropout residual connections for efficient parameter learning of discrete SCGs. Furthermore, we present a new approach that employs a categorical Variational Autoencoder to enhance disentanglement. Our extensive experimental evaluations across diverse domains demonstrate the effectiveness of the proposed methods. We find that the challenges associated with training discrete representations can be significantly mitigated and that our strategies help to improve the models’ interpretability and generalization behavior. Our findings also reveal the inherent grid structure of categorical distributions to be an efficient inductive prior for disentangled representations. This study provides critical insights into discrete representations in deep learning, extending our understanding and proposing novel methods that show promising results in experimental evaluations. Our work also highlights promising directions for the further refinement of discrete representations and their diverse applications.
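
    To make the Gumbel-noise scaling idea concrete, here is a minimal, self-contained sketch (assuming NumPy) of Gumbel-Softmax sampling in which the scale of the Gumbel perturbations is exposed as an explicit parameter; the function and argument names (gumbel_softmax_sample, noise_scale, temperature) are illustrative placeholders and are not taken from the thesis itself.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, noise_scale=1.0, rng=None):
    """Relaxed categorical sample via the Gumbel-Softmax trick.

    `noise_scale` multiplies the Gumbel perturbations; the standard
    estimator corresponds to noise_scale=1.0 (illustrative knob, not
    necessarily the thesis's exact parameterization).
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))                        # Gumbel(0, 1) noise
    perturbed = (logits + noise_scale * gumbel) / temperature
    perturbed -= perturbed.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(perturbed)
    return exp / exp.sum(axis=-1, keepdims=True)

# Example: draw one relaxed one-hot sample over 5 categories.
logits = np.log(np.array([0.1, 0.4, 0.2, 0.2, 0.1]))
print(gumbel_softmax_sample(logits, temperature=0.5, noise_scale=0.7))
```

    Shrinking noise_scale below 1.0 reduces the variance of the perturbations, so the relaxed samples concentrate closer to the deterministic softmax of the logits; this is one simple reading of what altering the Gumbel noise scale does.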

    Advancing and Leveraging Tractable Likelihood Models

    The past decade has seen a remarkable improvement in a variety of machine learning applications thanks to numerous advances in deep neural networks (DNNs). These models are now the de facto standard in fields ranging from image/speech recognition to driverless cars and have begun to permeate aspects of modern science and everyday life. The deep learning revolution has also resulted in highly effective generative models such as score matching models, diffusion models, VAEs, GANs, and tractable likelihood models. These models are best known for their ability to create novel samples of impressive quality but are usually limited to highly structured data modalities. Expanding the capabilities and applications of likelihood models beyond conventional data formats and generative applications can increase functionality, interpretability, and intuition compared to conventional methods. This dissertation addresses shortcomings in likelihood models over less structured data and explores methods to exploit a learned density as part of a larger application. We begin by advancing the performance of likelihood models outside the standard, ordered data regime by developing methods that are applicable to sets, e.g., point clouds. Many data sources contain instances that are collections of unordered points, such as points on the surface of scans of human organs, sets of images from a web page, or LiDAR observations commonly used in driverless cars or (hyper-spectral) aerial surveys. We then explore several applications of density models. First, we consider a generative process over neural networks themselves and show that training over ensembles of these sampled models can lead to improved robustness to adversarial attacks. Next, we demonstrate how to use the transformative portion of a normalizing flow as a feature extractor in conjunction with a downstream task to estimate expectations over model performance in local and global regions. Finally, we propose a learnable, continuous parameterization of mixture models directly on the input space to improve model interpretability while simultaneously allowing for arbitrary marginalization or conditioning without the need to train new models or develop complex masking mechanisms.
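
    As a rough illustration of the flow-as-feature-extractor idea, the sketch below (again assuming NumPy) stacks a few untrained RealNVP-style affine coupling layers and uses only the invertible transform z = f(x), not the base density, as a feature map for a downstream task. The class name AffineCoupling and all parameters are hypothetical placeholders; a practical flow would be trained by maximum likelihood and would alternate which half of the features is transformed, unlike this toy version.

```python
import numpy as np

class AffineCoupling:
    """A single RealNVP-style affine coupling layer (toy, untrained).

    The first half of the features passes through unchanged and predicts
    a scale/shift applied to the second half. A real coupling flow
    alternates the transformed half between layers; this sketch keeps it
    fixed for brevity.
    """
    def __init__(self, dim, rng):
        half = dim // 2
        self.W_s = 0.1 * rng.standard_normal((half, dim - half))
        self.W_t = 0.1 * rng.standard_normal((half, dim - half))

    def forward(self, x):
        half = x.shape[1] // 2
        x1, x2 = x[:, :half], x[:, half:]
        s, t = np.tanh(x1 @ self.W_s), x1 @ self.W_t
        z2 = x2 * np.exp(s) + t          # elementwise affine transform
        return np.concatenate([x1, z2], axis=1)

rng = np.random.default_rng(0)
flow = [AffineCoupling(dim=4, rng=rng) for _ in range(3)]

def extract_features(x):
    # Apply only the invertible transform z = f(x); the base density of
    # the flow is ignored when the transform is reused as a feature map.
    for layer in flow:
        x = layer.forward(x)
    return x

x = rng.standard_normal((8, 4))   # batch of 8 four-dimensional inputs
features = extract_features(x)    # features fed to a downstream predictor
print(features.shape)             # (8, 4)
```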