441 research outputs found
Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings
Current meta-learning approaches focus on learning functional representations
of relationships between variables, i.e. on estimating conditional expectations
in regression. In many applications, however, we are faced with conditional
distributions which cannot be meaningfully summarized using expectation only
(due to e.g. multimodality). Hence, we consider the problem of conditional
density estimation in the meta-learning setting. We introduce a novel technique
for meta-learning which combines neural representation and noise-contrastive
estimation with the established literature of conditional mean embeddings into
reproducing kernel Hilbert spaces. The method is validated on synthetic and
real-world problems, demonstrating the utility of sharing learned
representations across multiple conditional density estimation tasks
Exponential Family Estimation via Adversarial Dynamics Embedding
We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks. We exploit the primal-dual view of the MLE with a kinetics augmented model to obtain an estimate associated with an adversarial dual sampler. To represent this sampler, we introduce a novel neural architecture, dynamics embedding, that generalizes Hamiltonian Monte-Carlo (HMC). The proposed approach inherits the flexibility of HMC while enabling tractable entropy estimation for the augmented model. By learning both a dual sampler and the primal model simultaneously, and sharing parameters between them, we obviate the requirement to design a separate sampling procedure once the model has been trained, leading to more effective learning. We show that many existing estimators, such as contrastive divergence, pseudo/composite-likelihood, score matching, minimum Stein discrepancy estimator, non-local contrastive objectives, noise-contrastive estimation, and minimum probability flow, are special cases of the proposed approach, each expressed by a different (fixed) dual sampler. An empirical investigation shows that adapting the sampler during MLE can significantly improve on state-of-the-art estimators
Uncertainty Estimation, Explanation and Reduction with Insufficient Data
Human beings have been juggling making smart decisions under uncertainties, where we manage to trade off between swift actions and collecting sufficient evidence. It is naturally expected that a generalized artificial intelligence (GAI) to navigate through uncertainties meanwhile predicting precisely. In this thesis, we aim to propose strategies that underpin machine learning with uncertainties from three perspectives: uncertainty estimation, explanation and reduction. Estimation quantifies the variability in the model inputs and outputs. It can endow us to evaluate the model predictive confidence. Explanation provides a tool to interpret the mechanism of uncertainties and to pinpoint the potentials for uncertainty reduction, which focuses on stabilizing model training, especially when the data is insufficient. We hope that this thesis can motivate related studies on quantifying predictive uncertainties in deep learning. It also aims to raise awareness for other stakeholders in the fields of smart transportation and automated medical diagnosis where data insufficiency induces high uncertainty.
The thesis is dissected into the following sections: Introduction. we justify the necessity to investigate AI uncertainties and clarify the challenges existed in the latest studies, followed by our research objective. Literature review. We break down the the review of the state-of-the-art methods into uncertainty estimation, explanation and reduction. We make comparisons with the related fields encompassing meta learning, anomaly detection, continual learning as well. Uncertainty estimation. We introduce a variational framework, neural process that approximates Gaussian processes to handle uncertainty estimation. Two variants from the neural process families are proposed to enhance neural processes with scalability and continual learning. Uncertainty explanation. We inspect the functional distribution of neural processes to discover the global and local factors that affect the degree of predictive uncertainties. Uncertainty reduction. We validate the proposed uncertainty framework on two scenarios: urban irregular behaviour detection and neurological disorder diagnosis, where the intrinsic data insufficiency undermines the performance of existing deep learning models. Conclusion. We provide promising directions for future works and conclude the thesis
Unsupervised approaches for time-evolving graph embeddings with application to human microbiome
More and more diseases have been found to be strongly correlated with disturbances in the microbiome constitution, e.g., obesity, diabetes, and even some types of cancer. Advances in high-throughput omics technologies have made it possible to directly analyze the human microbiome and its impact on human health and physiology. Microbial composition is usually observed over long periods of time and the interactions between their members are explored. Numerous studies have used microbiome data to accurately differentiate disease states and understand the differences in microbiome profiles between healthy and ill individuals. However, most of them mainly focus on various statistical approaches, omitting microbe-microbe interactions among a large number of microbiome species that, in principle, drive microbiome dynamics. Constructing and analyzing time-evolving graphs is needed to understand how microbial ecosystems respond to a range of distinct perturbations, such as antibiotic exposure, diseases, or other general dynamic properties. This becomes especially challenging due to dozens of complex interactions among microbes and metastable dynamics.
The key to addressing this challenge lies in representing time-evolving graphs constructed from microbiome data as fixed-length, low-dimensional feature vectors that preserve the original dynamics. Therefore, we propose two unsupervised approaches that map the time-evolving graph constructed from microbiome data into a low-dimensional space where the initial dynamic, such as the number of metastable states and their locations, is preserved. The first method relies on the spectral analysis of transfer operators, such as the Perron--Frobenius or Koopman operator, and graph kernels. These components enable us to extract topological information such as complex interactions of species from the time-evolving graph and take into account the dynamic changes in the human microbiome composition. Further, we study how deep learning techniques can contribute to the study of a complex network of microbial species. The method consists of two key components: 1) the Transformer, the state-of-the-art architecture used in the sequential data, that learns both structural patterns of the time-evolving graph and temporal changes of the microbiome system and 2) contrastive learning that allows the model to learn the low-dimensional representation while maintaining metastability in a low-dimensional space.
Finally, this thesis will address an important challenge in microbiome data, specifically identifying which species or interactions of species are responsible for or affected by the changes that the microbiome undergoes from one state (healthy) to another state (diseased or antibiotic exposure). Using interpretability techniques of deep learning models, which, at the outset, have been used as methods to prove the trustworthiness of a deep learning model, we can extract structural information of the time-evolving graph pertaining to particular metastable states
Contrastive Value Learning: Implicit Models for Simple Offline RL
Model-based reinforcement learning (RL) methods are appealing in the offline
setting because they allow an agent to reason about the consequences of actions
without interacting with the environment. Prior methods learn a 1-step dynamics
model, which predicts the next state given the current state and action. These
models do not immediately tell the agent which actions to take, but must be
integrated into a larger RL framework. Can we model the environment dynamics in
a different way, such that the learned model does directly indicate the value
of each action? In this paper, we propose Contrastive Value Learning (CVL),
which learns an implicit, multi-step model of the environment dynamics. This
model can be learned without access to reward functions, but nonetheless can be
used to directly estimate the value of each action, without requiring any TD
learning. Because this model represents the multi-step transitions implicitly,
it avoids having to predict high-dimensional observations and thus scales to
high-dimensional tasks. Our experiments demonstrate that CVL outperforms prior
offline RL methods on complex continuous control benchmarks.Comment: Deep Reinforcement Learning Workshop, NeurIPS 202
Diffusion-based Neural Network Weights Generation
Transfer learning is a topic of significant interest in recent deep learning
research because it enables faster convergence and improved performance on new
tasks. While the performance of transfer learning depends on the similarity of
the source data to the target data, it is costly to train a model on a large
number of datasets. Therefore, pretrained models are generally blindly selected
with the hope that they will achieve good performance on the given task. To
tackle such suboptimality of the pretrained models, we propose an efficient and
adaptive transfer learning scheme through dataset-conditioned pretrained
weights sampling. Specifically, we use a latent diffusion model with a
variational autoencoder that can reconstruct the neural network weights, to
learn the distribution of a set of pretrained weights conditioned on each
dataset for transfer learning on unseen datasets. By learning the distribution
of a neural network on a variety pretrained models, our approach enables
adaptive sampling weights for unseen datasets achieving faster convergence and
reaching competitive performance.Comment: 14 page
A Comprehensive Survey on Deep Graph Representation Learning
Graph representation learning aims to effectively encode high-dimensional
sparse graph-structured data into low-dimensional dense vectors, which is a
fundamental task that has been widely studied in a range of fields, including
machine learning and data mining. Classic graph embedding methods follow the
basic idea that the embedding vectors of interconnected nodes in the graph can
still maintain a relatively close distance, thereby preserving the structural
information between the nodes in the graph. However, this is sub-optimal due
to: (i) traditional methods have limited model capacity which limits the
learning performance; (ii) existing techniques typically rely on unsupervised
learning strategies and fail to couple with the latest learning paradigms;
(iii) representation learning and downstream tasks are dependent on each other
which should be jointly enhanced. With the remarkable success of deep learning,
deep graph representation learning has shown great potential and advantages
over shallow (traditional) methods, there exist a large number of deep graph
representation learning techniques have been proposed in the past decade,
especially graph neural networks. In this survey, we conduct a comprehensive
survey on current deep graph representation learning algorithms by proposing a
new taxonomy of existing state-of-the-art literature. Specifically, we
systematically summarize the essential components of graph representation
learning and categorize existing approaches by the ways of graph neural network
architectures and the most recent advanced learning paradigms. Moreover, this
survey also provides the practical and promising applications of deep graph
representation learning. Last but not least, we state new perspectives and
suggest challenging directions which deserve further investigations in the
future
Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch
Unsupervised contrastive learning methods have recently seen significant
improvements, particularly through data augmentation strategies that aim to
produce robust and generalizable representations. However, prevailing data
augmentation methods, whether hand designed or based on foundation models, tend
to rely heavily on prior knowledge or external data. This dependence often
compromises their effectiveness and efficiency. Furthermore, the applicability
of most existing data augmentation strategies is limited when transitioning to
other research domains, especially science-related data. This limitation stems
from the paucity of prior knowledge and labeled data available in these
domains. To address these challenges, we introduce DiffAug-a novel and
efficient Diffusion-based data Augmentation technique. DiffAug aims to ensure
that the augmented and original data share a smoothed latent space, which is
achieved through diffusion steps. Uniquely, unlike traditional methods, DiffAug
first mines sufficient prior semantic knowledge about the neighborhood. This
provides a constraint to guide the diffusion steps, eliminating the need for
labels, external data/models, or prior knowledge. Designed as an
architecture-agnostic framework, DiffAug provides consistent improvements.
Specifically, it improves image classification and clustering accuracy by
1.6%~4.5%. When applied to biological data, DiffAug improves performance by up
to 10.1%, with an average improvement of 5.8%. DiffAug shows good performance
in both vision and biological domains.Comment: arXiv admin note: text overlap with arXiv:2302.07944 by other author
- …