
    Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings

    Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. on estimating conditional expectations in regression. In many applications, however, we are faced with conditional distributions which cannot be meaningfully summarized by expectations alone (due to, e.g., multimodality). Hence, we consider the problem of conditional density estimation in the meta-learning setting. We introduce a novel technique for meta-learning which combines neural representations and noise-contrastive estimation with the established literature on conditional mean embeddings into reproducing kernel Hilbert spaces. The method is validated on synthetic and real-world problems, demonstrating the utility of sharing learned representations across multiple conditional density estimation tasks.
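
    As a rough sketch of the two ingredients named here (notation assumed for illustration, not taken from the paper), the empirical conditional mean embedding of P(Y | X = x) with kernel k and regularizer \lambda, and a generic noise-contrastive objective with noise distribution p_n and noise ratio \nu, can be written as

    \[
    \hat{\mu}_{Y \mid X = x} \;=\; \Phi_Y \,(K_{XX} + n\lambda I)^{-1} \mathbf{k}_X(x),
    \qquad
    \mathbf{k}_X(x) = \big(k(x_1, x), \dots, k(x_n, x)\big)^{\top},
    \]
    \[
    \mathcal{L}_{\mathrm{NCE}}(\theta) \;=\;
    -\,\mathbb{E}_{(x,y) \sim p_{\mathrm{data}}}\!\left[\log \frac{e^{f_\theta(x,y)}}{e^{f_\theta(x,y)} + \nu\, p_n(y \mid x)}\right]
    \;-\; \nu\,\mathbb{E}_{x \sim p_{\mathrm{data}},\, y \sim p_n}\!\left[\log \frac{\nu\, p_n(y \mid x)}{e^{f_\theta(x,y)} + \nu\, p_n(y \mid x)}\right],
    \]

    where \Phi_Y = (\varphi(y_1), \dots, \varphi(y_n)) stacks the feature maps of the training outputs. How the paper couples the two, and how representations are shared across tasks, is not specified in the abstract.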

    Exponential Family Estimation via Adversarial Dynamics Embedding

    We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks. We exploit the primal-dual view of the MLE with a kinetics-augmented model to obtain an estimate associated with an adversarial dual sampler. To represent this sampler, we introduce a novel neural architecture, dynamics embedding, that generalizes Hamiltonian Monte Carlo (HMC). The proposed approach inherits the flexibility of HMC while enabling tractable entropy estimation for the augmented model. By learning both a dual sampler and the primal model simultaneously, and sharing parameters between them, we obviate the requirement to design a separate sampling procedure once the model has been trained, leading to more effective learning. We show that many existing estimators, such as contrastive divergence, pseudo/composite-likelihood, score matching, minimum Stein discrepancy estimator, non-local contrastive objectives, noise-contrastive estimation, and minimum probability flow, are special cases of the proposed approach, each expressed by a different (fixed) dual sampler. An empirical investigation shows that adapting the sampler during MLE can significantly improve on state-of-the-art estimators.
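
    The primal-dual view referred to here is presumably the standard Gibbs variational form of the log-partition function; writing f_\theta for the (neural) energy, the textbook saddle-point below (not quoted from the paper) shows where an adversarial dual sampler q enters:

    \[
    \max_\theta\; \mathbb{E}_{x \sim p_{\mathrm{data}}}[f_\theta(x)] - \log Z(\theta),
    \qquad
    \log Z(\theta) = \max_{q}\; \big\{ \mathbb{E}_{x \sim q}[f_\theta(x)] + H(q) \big\},
    \]
    \[
    \Longrightarrow\quad
    \max_\theta \min_{q}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}[f_\theta(x)] - \mathbb{E}_{x \sim q}[f_\theta(x)] - H(q).
    \]

    As the abstract notes, the dual sampler q is represented by HMC-like dynamics (the dynamics embedding), which keeps the entropy term H(q) tractable for the augmented model.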

    Uncertainty Estimation, Explanation and Reduction with Insufficient Data

    Human beings constantly juggle making smart decisions under uncertainty, trading off swift action against collecting sufficient evidence. A generalized artificial intelligence (GAI) is naturally expected to navigate through uncertainties while predicting precisely. In this thesis, we propose strategies that underpin machine learning under uncertainty from three perspectives: uncertainty estimation, explanation and reduction. Estimation quantifies the variability in the model inputs and outputs and allows us to evaluate the model's predictive confidence. Explanation provides a tool to interpret the mechanism of uncertainties and to pinpoint the potential for uncertainty reduction. Reduction focuses on stabilizing model training, especially when the data is insufficient. We hope that this thesis can motivate related studies on quantifying predictive uncertainties in deep learning. It also aims to raise awareness among other stakeholders in the fields of smart transportation and automated medical diagnosis, where data insufficiency induces high uncertainty. The thesis is organized into the following sections. Introduction: we justify the necessity of investigating AI uncertainties and clarify the challenges in the latest studies, followed by our research objective. Literature review: we break down the review of state-of-the-art methods into uncertainty estimation, explanation and reduction, and also compare with related fields including meta learning, anomaly detection and continual learning. Uncertainty estimation: we introduce a variational framework, the neural process, that approximates Gaussian processes to handle uncertainty estimation; two variants from the neural process family are proposed to enhance neural processes with scalability and continual learning. Uncertainty explanation: we inspect the functional distribution of neural processes to discover the global and local factors that affect the degree of predictive uncertainty. Uncertainty reduction: we validate the proposed uncertainty framework in two scenarios, urban irregular behaviour detection and neurological disorder diagnosis, where intrinsic data insufficiency undermines the performance of existing deep learning models. Conclusion: we provide promising directions for future work and conclude the thesis.
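
    To make the neural-process component concrete, a minimal conditional-neural-process sketch is shown below (PyTorch; layer sizes and choices are assumptions, and the thesis' scalable and continual-learning variants are not reproduced here):

    # Minimal conditional neural process: encode context pairs, aggregate them
    # into a permutation-invariant representation, and decode a predictive
    # distribution whose spread serves as the uncertainty estimate.
    import torch
    import torch.nn as nn

    class NeuralProcess(nn.Module):
        def __init__(self, x_dim=1, y_dim=1, r_dim=64, h_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(x_dim + y_dim, h_dim), nn.ReLU(),
                nn.Linear(h_dim, r_dim))
            self.decoder = nn.Sequential(
                nn.Linear(r_dim + x_dim, h_dim), nn.ReLU(),
                nn.Linear(h_dim, 2 * y_dim))

        def forward(self, x_ctx, y_ctx, x_tgt):
            r_i = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1))   # (n_ctx, r_dim)
            r = r_i.mean(dim=0, keepdim=True).expand(x_tgt.shape[0], -1)
            out = self.decoder(torch.cat([r, x_tgt], dim=-1))
            mean, log_sigma = out.chunk(2, dim=-1)
            sigma = 0.1 + 0.9 * nn.functional.softplus(log_sigma)   # predictive uncertainty
            return torch.distributions.Normal(mean, sigma)

    Training would minimize the negative log-likelihood of target points under the returned distribution; the predictive standard deviation is what the thesis reads as uncertainty.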

    Unsupervised approaches for time-evolving graph embeddings with application to human microbiome

    More and more diseases have been found to be strongly correlated with disturbances in the microbiome constitution, e.g., obesity, diabetes, and even some types of cancer. Advances in high-throughput omics technologies have made it possible to directly analyze the human microbiome and its impact on human health and physiology. Microbial composition is usually observed over long periods of time and the interactions between its members are explored. Numerous studies have used microbiome data to accurately differentiate disease states and understand the differences in microbiome profiles between healthy and ill individuals. However, most of them mainly focus on various statistical approaches, omitting the microbe-microbe interactions among a large number of microbiome species that, in principle, drive microbiome dynamics. Constructing and analyzing time-evolving graphs is needed to understand how microbial ecosystems respond to a range of distinct perturbations, such as antibiotic exposure, diseases, or other general dynamic properties. This becomes especially challenging due to the dozens of complex interactions among microbes and their metastable dynamics. The key to addressing this challenge lies in representing time-evolving graphs constructed from microbiome data as fixed-length, low-dimensional feature vectors that preserve the original dynamics. Therefore, we propose two unsupervised approaches that map the time-evolving graph constructed from microbiome data into a low-dimensional space in which the original dynamics, such as the number of metastable states and their locations, are preserved. The first method relies on the spectral analysis of transfer operators, such as the Perron–Frobenius or Koopman operator, and on graph kernels. These components enable us to extract topological information, such as complex interactions of species, from the time-evolving graph and to take into account the dynamic changes in the human microbiome composition. Further, we study how deep learning techniques can contribute to the study of a complex network of microbial species. The second method consists of two key components: 1) the Transformer, the state-of-the-art architecture for sequential data, which learns both structural patterns of the time-evolving graph and temporal changes of the microbiome system, and 2) contrastive learning, which allows the model to learn a low-dimensional representation while maintaining metastability in the low-dimensional space. Finally, this thesis addresses an important challenge in microbiome data: identifying which species, or interactions of species, are responsible for or affected by the changes that the microbiome undergoes from one state (healthy) to another state (diseased or antibiotic exposure). Using interpretability techniques for deep learning models, which were originally developed to establish the trustworthiness of such models, we can extract structural information of the time-evolving graph pertaining to particular metastable states.
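
    The transfer-operator component can be illustrated with a toy sketch (NumPy; the crude state discretisation below is an assumption for illustration, whereas the thesis works with graph kernels and Perron–Frobenius/Koopman operators on the time-evolving graph): estimate a transition matrix from a discretised trajectory and read the number of metastable states off its leading eigenvalues.

    # Eigenvalues of an empirical transition (Perron-Frobenius) matrix close to 1
    # indicate metastability; a spectral gap after the k-th eigenvalue suggests
    # k metastable sets.
    import numpy as np

    def transition_matrix(labels, n_states, lag=1):
        """Row-stochastic transition matrix estimated from a state-label sequence."""
        C = np.zeros((n_states, n_states))
        for a, b in zip(labels[:-lag], labels[lag:]):
            C[a, b] += 1.0
        return C / np.maximum(C.sum(axis=1, keepdims=True), 1.0)

    # Toy trajectory that hops within {0, 1} and then within {2, 3}.
    rng = np.random.default_rng(0)
    labels = np.concatenate([rng.integers(0, 2, 500), rng.integers(2, 4, 500)])
    P = transition_matrix(labels, n_states=4)
    eigvals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    print(eigvals)   # two eigenvalues near 1, then a gap: two metastable states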

    Contrastive Value Learning: Implicit Models for Simple Offline RL

    Model-based reinforcement learning (RL) methods are appealing in the offline setting because they allow an agent to reason about the consequences of actions without interacting with the environment. Prior methods learn a 1-step dynamics model, which predicts the next state given the current state and action. These models do not immediately tell the agent which actions to take, but must be integrated into a larger RL framework. Can we model the environment dynamics in a different way, such that the learned model does directly indicate the value of each action? In this paper, we propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics. This model can be learned without access to reward functions, but nonetheless can be used to directly estimate the value of each action, without requiring any TD learning. Because this model represents the multi-step transitions implicitly, it avoids having to predict high-dimensional observations and thus scales to high-dimensional tasks. Our experiments demonstrate that CVL outperforms prior offline RL methods on complex continuous control benchmarks.
    Comment: Deep Reinforcement Learning Workshop, NeurIPS 202
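
    The abstract does not spell out the training objective; a generic InfoNCE-style sketch of learning an implicit multi-step model from (state, action, future-state) batches with in-batch negatives might look as follows (PyTorch; names and shapes are assumptions, not the CVL implementation):

    # Contrastive critic f(s, a, s_future) = phi(s, a) . psi(s_future): true
    # futures should score higher than futures drawn from other trajectories.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContrastiveCritic(nn.Module):
        def __init__(self, s_dim, a_dim, z_dim=64):
            super().__init__()
            self.sa_enc = nn.Sequential(nn.Linear(s_dim + a_dim, 128), nn.ReLU(),
                                        nn.Linear(128, z_dim))
            self.g_enc = nn.Sequential(nn.Linear(s_dim, 128), nn.ReLU(),
                                       nn.Linear(128, z_dim))

        def forward(self, s, a, s_future):
            phi = self.sa_enc(torch.cat([s, a], dim=-1))    # (B, z_dim)
            psi = self.g_enc(s_future)                      # (B, z_dim)
            return phi @ psi.T                              # (B, B) logits

    def contrastive_loss(critic, s, a, s_future):
        logits = critic(s, a, s_future)
        labels = torch.arange(logits.shape[0])              # positives on the diagonal
        return F.cross_entropy(logits, labels)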

    Diffusion-based Neural Network Weights Generation

    Transfer learning is a topic of significant interest in recent deep learning research because it enables faster convergence and improved performance on new tasks. While the performance of transfer learning depends on the similarity of the source data to the target data, it is costly to train a model on a large number of datasets. Therefore, pretrained models are generally selected blindly, in the hope that they will achieve good performance on the given task. To tackle this suboptimality of pretrained models, we propose an efficient and adaptive transfer learning scheme through dataset-conditioned sampling of pretrained weights. Specifically, we use a latent diffusion model, together with a variational autoencoder that can reconstruct the neural network weights, to learn the distribution of a set of pretrained weights conditioned on each dataset for transfer learning on unseen datasets. By learning the distribution of neural network weights over a variety of pretrained models, our approach enables adaptive sampling of weights for unseen datasets, achieving faster convergence and competitive performance.
    Comment: 14 page
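
    A minimal sketch of the described pipeline is given below (PyTorch; the denoiser, noise schedule and dimensions are placeholders, not the paper's architecture): a dataset embedding conditions a latent diffusion sampler, and a VAE decoder maps the sampled latent to a flattened weight vector for the target network.

    # Dataset-conditioned sampling of network weights: DDPM-style reverse
    # diffusion in a small latent space, followed by a (pretrained) decoder.
    import torch
    import torch.nn as nn

    LATENT, COND, N_WEIGHTS, T = 32, 16, 1000, 50

    eps_model = nn.Sequential(nn.Linear(LATENT + COND + 1, 128), nn.ReLU(),
                              nn.Linear(128, LATENT))        # predicts the added noise
    decoder = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                            nn.Linear(256, N_WEIGHTS))       # VAE decoder for weight vectors

    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    @torch.no_grad()
    def sample_weights(dataset_emb):
        z = torch.randn(1, LATENT)
        for t in reversed(range(T)):
            t_in = torch.full((1, 1), t / T)
            eps = eps_model(torch.cat([z, dataset_emb, t_in], dim=-1))
            mean = (z - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
            z = mean + (torch.sqrt(betas[t]) * torch.randn_like(z) if t > 0 else 0)
        return decoder(z)                                    # flattened weights for the target net

    weights = sample_weights(torch.randn(1, COND))           # condition on a dataset embedding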

    A Comprehensive Survey on Deep Graph Representation Learning

    Graph representation learning aims to effectively encode high-dimensional sparse graph-structured data into low-dimensional dense vectors, a fundamental task that has been widely studied in a range of fields, including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes should remain relatively close, thereby preserving the structural information between the nodes in the graph. However, this is sub-optimal because: (i) traditional methods have limited model capacity, which limits learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; and (iii) representation learning and downstream tasks depend on each other and should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, and a large number of deep graph representation learning techniques, especially graph neural networks, have been proposed in the past decade. In this survey, we provide a comprehensive review of current deep graph representation learning algorithms by proposing a new taxonomy of the existing state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by graph neural network architecture and by the most recent advanced learning paradigms. Moreover, this survey also covers the practical and promising applications of deep graph representation learning. Last but not least, we state new perspectives and suggest challenging directions that deserve further investigation in the future.
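
    As a concrete instance of the graph-neural-network family the survey categorizes, a single graph-convolution layer in the style of Kipf and Welling can be sketched as follows (PyTorch; a generic textbook layer, not tied to any particular method in the survey):

    # One GCN layer: add self-loops, symmetrically normalize the adjacency
    # matrix, and propagate transformed node features to neighbours.
    import torch
    import torch.nn as nn

    class GCNLayer(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.lin = nn.Linear(in_dim, out_dim)

        def forward(self, X, A):
            A_hat = A + torch.eye(A.shape[0])                # self-loops
            d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
            return torch.relu(d_inv_sqrt @ A_hat @ d_inv_sqrt @ self.lin(X))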

    Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch

    Unsupervised contrastive learning methods have recently seen significant improvements, particularly through data augmentation strategies that aim to produce robust and generalizable representations. However, prevailing data augmentation methods, whether hand-designed or based on foundation models, tend to rely heavily on prior knowledge or external data. This dependence often compromises their effectiveness and efficiency. Furthermore, the applicability of most existing data augmentation strategies is limited when transitioning to other research domains, especially science-related data, owing to the paucity of prior knowledge and labeled data available in these domains. To address these challenges, we introduce DiffAug, a novel and efficient Diffusion-based data Augmentation technique. DiffAug aims to ensure that the augmented and original data share a smoothed latent space, which is achieved through diffusion steps. Unlike traditional methods, DiffAug first mines sufficient prior semantic knowledge about the neighborhood; this provides a constraint to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge. Designed as an architecture-agnostic framework, DiffAug provides consistent improvements: it improves image classification and clustering accuracy by 1.6%~4.5%, and when applied to biological data it improves performance by up to 10.1%, with an average improvement of 5.8%. DiffAug shows good performance in both the vision and biological domains.
    Comment: arXiv admin note: text overlap with arXiv:2302.07944 by other author
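
    A hypothetical sketch of the overall recipe (PyTorch; the function names, step counts and the simplified one-step denoising update are assumptions, not DiffAug's implementation): noise an encoded sample for a few forward diffusion steps, denoise it back, and treat the result as a semantics-preserving positive view in a standard InfoNCE loss.

    # Diffusion-based augmentation in latent space plus a contrastive objective.
    import torch
    import torch.nn.functional as F

    def diffusion_augment(z, denoiser, alpha_bars, k=10):
        """Return an augmented latent that stays close to z in the smoothed latent space."""
        noise = torch.randn_like(z)
        z_k = torch.sqrt(alpha_bars[k]) * z + torch.sqrt(1 - alpha_bars[k]) * noise  # forward noising
        for t in reversed(range(k)):
            z_k = z_k - denoiser(z_k, t)        # simplified one-step reverse update
        return z_k

    def info_nce(z1, z2, temperature=0.1):
        z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
        logits = z1 @ z2.T / temperature
        labels = torch.arange(z1.shape[0])      # original/augmented pairs are positives
        return F.cross_entropy(logits, labels)

    # Toy usage with a dummy denoiser standing in for a pretrained noise predictor.
    alpha_bars = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 50), dim=0)
    denoiser = lambda z, t: 0.01 * z
    z = torch.randn(8, 64)
    loss = info_nce(z, diffusion_augment(z, denoiser, alpha_bars))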