
    Variational Inference for Text Generation: Improving the Posterior

    Learning useful representations of data is a crucial task in machine learning with wide-ranging applications. In this thesis we explore improving the representations learned by models based on variational inference by improving the posterior. We pursue two approaches towards this goal: 1) auxiliary losses that regularize the latent space and enforce desired properties, and 2) normalizing flows that yield more flexible posteriors for use during variational inference. We propose a proximity-based loss function that aids disentanglement by regularizing the latent space according to a similarity criterion. We evaluate our model on the task of disentangling semantics and syntax in sentences and show empirically that it learns independent subspaces capturing semantics and syntax respectively. Comparisons with existing approaches using automated metrics and human evaluation show that our model is competitive. We also explore the effectiveness of normalizing flows for representation learning and generative modeling, and our experiments show empirically that variational inference with normalizing flows beats standard approaches based on simple posteriors across various metrics in text generation and language modeling. Finally, we propose a variant of planar normalizing flows, called block planar normalizing flows, for use in disentanglement tasks, with ablation experiments showing empirically that the proposed block planar flows improve disentanglement.
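    The planar flows this abstract builds on can be written down compactly. Below is a minimal NumPy sketch of one standard planar flow step f(z) = z + u * tanh(w.z + b) together with its log-determinant term, following Rezende and Mohamed's formulation; the thesis's block planar variant is not reproduced here, and all names and shapes are illustrative assumptions rather than the author's code.

    ```python
    import numpy as np

    def planar_flow(z, u, w, b):
        """One planar flow step f(z) = z + u * tanh(w.z + b).
        Returns the transformed samples and the log|det Jacobian| terms
        log|1 + u . psi(z)|, where psi(z) = (1 - tanh(w.z + b)^2) * w.
        Note: invertibility requires w.u >= -1, which this sketch does not enforce."""
        a = np.tanh(z @ w + b)                    # activation per sample, shape (n,)
        f_z = z + np.outer(a, u)                  # transformed samples, shape (n, d)
        psi = np.outer(1.0 - a ** 2, w)           # h'(w.z + b) * w, shape (n, d)
        log_det = np.log(np.abs(1.0 + psi @ u))   # log-volume change, shape (n,)
        return f_z, log_det

    # Draw from a simple Gaussian posterior and push it through one flow step.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(5, 2))
    u, w, b = rng.normal(size=2), rng.normal(size=2), 0.1
    f_z, log_det = planar_flow(z, u, w, b)
    ```

    Stacking several such steps on samples from a simple Gaussian posterior, while accumulating the log-determinants, is what yields the more flexible posteriors used in flow-based variational inference.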

    Hyperbolic Deep Neural Networks: A Survey

    Recently, there has been a surge of momentum for deep representation learning in hyperbolic spaces, owing to their high capacity for modeling hierarchically structured data such as knowledge graphs or synonym hierarchies. We refer to such models as hyperbolic deep neural networks in this paper. Such hyperbolic neural architectures potentially lead to drastically more compact models with much more physical interpretability than their counterparts in Euclidean space. To stimulate future research, this paper presents a coherent and comprehensive review of the literature on the neural components used in constructing hyperbolic deep neural networks, as well as the generalization of leading deep approaches to hyperbolic space. It also presents current applications across various machine learning tasks on several publicly available datasets, together with insightful observations, open questions, and promising future directions.
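    To make the survey's subject concrete, here is a minimal NumPy sketch of the Poincare-ball operations that most hyperbolic deep networks build on: Mobius addition, the exponential and logarithmic maps at the origin, and an affine layer assembled from them, following Ganea et al.'s hyperbolic neural networks. The curvature parameter c, the function names, and the layer form are illustrative assumptions, not drawn from the survey itself.

    ```python
    import numpy as np

    def mobius_add(x, y, c=1.0):
        """Mobius addition of two points in the Poincare ball of curvature -c."""
        xy, x2, y2 = x @ y, x @ x, y @ y
        num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
        return num / (1 + 2 * c * xy + c ** 2 * x2 * y2)

    def exp0(v, c=1.0):
        """Exponential map at the origin: tangent vector -> point in the ball."""
        n = np.linalg.norm(v) + 1e-15
        return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

    def log0(x, c=1.0):
        """Logarithmic map at the origin: point in the ball -> tangent vector."""
        n = np.linalg.norm(x) + 1e-15
        return np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)

    def hyperbolic_linear(x, W, b, c=1.0):
        """Affine layer: Mobius matrix-vector product followed by a Mobius bias."""
        return mobius_add(exp0(W @ log0(x, c), c), b, c)

    # A point mapped into the ball stays in the ball through the layer.
    rng = np.random.default_rng(0)
    x = exp0(0.3 * rng.normal(size=4))
    W = 0.1 * rng.normal(size=(4, 4))
    b = exp0(0.1 * rng.normal(size=4))
    y = hyperbolic_linear(x, W, b)        # np.linalg.norm(y) < 1
    ```

    Because every operation is closed in the unit ball, all intermediate representations remain valid hyperbolic points, which is what lets these models embed tree-like data with low distortion.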

    Graph Priors, Optimal Transport, and Deep Learning in Biomedical Discovery

    Recent advances in biomedical data collection allow the assembly of massive datasets measuring thousands of features in thousands to millions of individual cells. These data have the potential to advance our understanding of biological mechanisms at a previously impossible resolution. However, there are few methods for understanding data of this scale and type. While neural networks have made tremendous progress on supervised learning problems, much work remains to make them useful for discovery in data whose supervision is difficult to represent. The flexibility and expressiveness of neural networks can be a hindrance in these less supervised domains, as is the case when extracting knowledge from biomedical data. One type of prior knowledge that is especially common in biological data comes in the form of geometric constraints. In this thesis, we aim to leverage this geometric knowledge to create scalable and interpretable models of such data. Encoding geometric priors into neural network and graph models lets us characterize the models' solutions in terms of graph signal processing and optimal transport, and these links allow us to understand and interpret this data type. We divide the work into three parts. The first borrows concepts from graph signal processing to construct more interpretable and performant neural networks by constraining and structuring the architecture. The second borrows from the theory of optimal transport to perform anomaly detection and trajectory inference efficiently and with theoretical guarantees. The third examines how to compare distributions over an underlying manifold, which can be used to understand how different perturbations or conditions relate; for this we design an efficient approximation of optimal transport based on diffusion over a joint cell graph. Together, these works use our prior understanding of the data geometry to create more useful models, which we apply to molecular graphs, images, single-cell sequencing, and health-record data.
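    The diffusion-based transport approximation mentioned last lends itself to a short sketch. The NumPy code below compares two distributions on the nodes of a shared graph by diffusing both with a lazy random walk and summing scale-weighted L1 differences, in the spirit of multiscale diffusion EMD; the scale set, the weights, and the function names are illustrative assumptions, not the thesis's actual algorithm.

    ```python
    import numpy as np

    def graph_diffusion_ot(mu, nu, A, scales=(1, 2, 4, 8)):
        """Hypothetical sketch of a multiscale, diffusion-based approximation to
        optimal transport between node distributions mu and nu on a graph with
        adjacency matrix A. Differences that survive longer diffusion times are
        weighted more, mimicking transport over larger distances; the exact
        scales and weights in published diffusion-EMD methods differ."""
        deg = A.sum(axis=1)
        P = 0.5 * (np.eye(len(deg)) + A / deg[:, None])  # lazy random-walk operator
        pm, pn = mu.astype(float).copy(), nu.astype(float).copy()
        dist, steps_done = 0.0, 0
        for t in scales:
            for _ in range(t - steps_done):              # diffuse up to time t
                pm, pn = P.T @ pm, P.T @ pn
            steps_done = t
            dist += t * np.abs(pm - pn).sum()            # scale-weighted L1 term
        return dist

    # Toy usage: two point masses at opposite ends of a 6-node path graph.
    n = 6
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    mu, nu = np.eye(n)[0], np.eye(n)[5]
    print(graph_diffusion_ot(mu, nu, A))
    ```

    The appeal of this family of methods is that each diffusion step is a sparse matrix-vector product, so the whole comparison scales near-linearly in the number of graph edges rather than requiring a full optimal transport solve.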