Exploiting Inferential Structure in Neural Processes
Neural Processes (NPs) are appealing due to their ability to perform fast adaptation based on a context set. This set is encoded by a latent variable, which is often assumed to follow a simple distribution. However, in real-world settings, the context set may be drawn from richer distributions having multiple modes, heavy tails, etc. In this work, we provide a framework that allows NPs’ latent variable to be given a rich prior defined by a graphical model. These distributional assumptions directly translate into an appropriate aggregation strategy for the context set. Moreover, we describe a message-passing procedure that still allows for end-to-end optimization with stochastic gradients. We demonstrate the generality of our framework by using mixture and Student-t assumptions that yield improvements in function modelling and test-time robustness.
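A minimal sketch of how the latent prior dictates the aggregation rule (NumPy, illustrative names; not the authors' code): a Gaussian assumption reduces to plain averaging of the context encodings, while a heavy-tailed Student-t assumption yields an iteratively reweighted mean that down-weights outlying context points.

import numpy as np

def mean_aggregate(r):
    # Standard NP aggregation: a simple Gaussian latent implies averaging the encodings.
    return r.mean(axis=0)

def student_t_aggregate(r, nu=3.0, n_iter=10):
    # Heavy-tailed (Student-t) assumption: an EM-style reweighted mean that is
    # robust to corrupted or outlying context encodings.
    mu = r.mean(axis=0)
    for _ in range(n_iter):
        d2 = ((r - mu) ** 2).sum(axis=1)      # squared distance to current estimate
        w = (nu + r.shape[1]) / (nu + d2)     # Student-t responsibility weights
        mu = (w[:, None] * r).sum(axis=0) / w.sum()
    return mu

rng = np.random.default_rng(0)
r = rng.normal(size=(20, 8))
r[0] += 10.0                                  # one outlying context encoding
print(mean_aggregate(r)[:3])
print(student_t_aggregate(r)[:3])             # far less affected by the outlier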
Predictive Complexity Priors
Specifying a Bayesian prior is notoriously difficult for complex models such as neural networks. Reasoning about parameters is made challenging by the high-dimensionality and over-parameterization of the space. Priors that seem benign and uninformative can have unintuitive and detrimental effects on a model's predictions. For this reason, we propose predictive complexity priors: a functional prior that is defined by comparing the model's predictions to those of a reference model. Although originally defined on the model outputs, we transfer the prior to the model parameters via a change of variables. The traditional Bayesian workflow can then proceed as usual. We apply our predictive complexity prior to high-dimensional regression, reasoning over neural network depth, and sharing of statistical strength for few-shot learning.
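One way to read the construction, as a rough sketch (NumPy, hypothetical model and penalty; the squared predictive distance stands in for the divergence actually used, and the change-of-variables Jacobian is omitted): the prior scores parameters by how far the model's predictions stray from a simpler reference model, and MAP-style training simply adds that score to the data-fit term.

import numpy as np

def f(theta, X):
    # A small tanh network standing in for "the model".
    W1, b1, w2, b2 = theta
    return np.tanh(X @ W1 + b1) @ w2 + b2

def f_ref(X, w_ref):
    # Reference model: a plain linear predictor.
    return X @ w_ref

def predictive_complexity_penalty(theta, X, w_ref, lam=1.0):
    # Functional prior: penalise predictive distance to the reference model.
    return lam * np.mean((f(theta, X) - f_ref(X, w_ref)) ** 2)

def map_objective(theta, X, y, w_ref, lam=1.0):
    nll = np.mean((f(theta, X) - y) ** 2)     # Gaussian likelihood up to constants
    return nll + predictive_complexity_penalty(theta, X, w_ref, lam)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
theta = (0.1 * rng.normal(size=(3, 8)), np.zeros(8), 0.1 * rng.normal(size=8), 0.0)
w_ref = np.linalg.lstsq(X, y, rcond=None)[0]
print(map_objective(theta, X, y, w_ref))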
Bayesian batch active learning as sparse subset approximation
Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems, standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections. We demonstrate the benefits of our approach on several large-scale regression and classification tasks.
Comment: NeurIPS 2019
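The flavour of the batch construction can be sketched as follows (NumPy, illustrative; a greedy stand-in, not the paper's exact sparse-approximation procedure): each candidate point is summarised by a random-projection feature vector, and points are picked so that the chosen batch's summed features track the pool-wide sum, which discourages redundant, correlated queries.

import numpy as np

def select_batch(features, batch_size):
    # features: (n_pool, d_proj) random projections of each point's contribution
    # to the complete-data posterior (e.g. expected log-likelihood features).
    target = features.sum(axis=0)              # what the full pool would contribute
    chosen, residual = [], target.copy()
    for _ in range(batch_size):
        scores = features @ residual           # alignment with the unexplained part
        scores[chosen] = -np.inf               # never pick the same point twice
        i = int(np.argmax(scores))
        chosen.append(i)
        residual = residual - features[i]
    return chosen

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 32))         # stand-in for projected features
print(select_batch(features, batch_size=10))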
On the impact of non-IID data on the performance and fairness of differentially private federated learning
Federated Learning enables distributed data holders to train a shared machine learning model on their collective data. It provides some measure of privacy by not requiring that the data be pooled and centralized, but it has still been shown to be vulnerable to adversarial attacks. Differential Privacy provides rigorous guarantees and sufficient protection against adversarial attacks and has been widely employed in recent years to perform privacy-preserving machine learning. A common trait of many recent methods for federated learning and federated differentially private learning is the assumption of IID data, which in real-world scenarios most certainly does not hold true. In this work, we empirically investigate the effect of node-level non-IID data on federated, differentially private, deep learning. We show that non-IID data has a negative impact on both the performance and fairness of the trained model, and we discuss the trade-off between privacy, utility, and fairness. Our results highlight the limits of common federated learning algorithms in a differentially private setting to provide robust, reliable results across underrepresented groups.
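The abstract does not spell out how node-level non-IID data is generated; a common way to simulate it (a sketch with illustrative names, not necessarily the partitioning used in the paper) is a Dirichlet split of class labels across nodes, where smaller concentration values give more skewed local label distributions.

import numpy as np

def dirichlet_partition(labels, n_nodes, alpha, rng):
    # Smaller alpha -> more skewed (more non-IID) per-node label distributions.
    n_classes = int(labels.max()) + 1
    node_indices = [[] for _ in range(n_nodes)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(n_nodes))
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for node, part in enumerate(np.split(idx, splits)):
            node_indices[node].extend(part.tolist())
    return node_indices

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=5000)
parts = dirichlet_partition(labels, n_nodes=5, alpha=0.1, rng=rng)
print([len(p) for p in parts])                 # highly unbalanced node datasets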
Enhancing VAEs for Collaborative Filtering: Flexible Priors & Gating Mechanisms
Neural network based models for collaborative filtering have started to gain attention recently. One branch of research is based on using deep generative models to model user preferences, where variational autoencoders were shown to produce state-of-the-art results. However, there are some potentially problematic characteristics of the current variational autoencoder for CF. The first is the overly simplistic prior that VAEs incorporate for learning the latent representations of user preference. The other is the model's inability to learn deeper representations with more than one hidden layer per network. Our goal is to incorporate appropriate techniques to mitigate these problems of variational autoencoder CF and further improve the recommendation performance. Our work is the first to apply flexible priors to collaborative filtering; we show that the simple priors of original VAEs may be too restrictive to fully model user preferences, and that setting a more flexible prior gives significant gains. We experiment with the VampPrior, originally proposed for image generation, to examine the effect of flexible priors in CF. We also show that VampPriors coupled with gating mechanisms outperform SOTA results, including the Variational Autoencoder for Collaborative Filtering, by meaningful margins on two popular benchmark datasets (MovieLens & Netflix).
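The VampPrior itself is easy to state: the prior over the latent code is a mixture of the variational posteriors evaluated at a set of learned pseudo-inputs. A minimal sketch (NumPy/SciPy, with a stand-in encoder; the real encoder is the CF model's inference network):

import numpy as np
from scipy.special import logsumexp

def gaussian_logpdf(z, mu, logvar):
    # Diagonal Gaussian log-density, summed over latent dimensions.
    return -0.5 * np.sum(logvar + np.log(2 * np.pi) + (z - mu) ** 2 / np.exp(logvar), axis=-1)

def vamp_prior_logpdf(z, pseudo_inputs, encoder):
    # VampPrior: p(z) = (1/K) * sum_k q(z | u_k) for K learned pseudo-inputs u_k.
    mus, logvars = encoder(pseudo_inputs)               # each of shape (K, d_z)
    comp = gaussian_logpdf(z[None, :], mus, logvars)    # (K,) per-component log-density
    return logsumexp(comp) - np.log(len(pseudo_inputs))

rng = np.random.default_rng(0)
proj = 0.1 * rng.normal(size=(20, 8))
encoder = lambda u: (u @ proj, np.zeros((len(u), 8)))   # toy stand-in for the inference net
pseudo_inputs = rng.normal(size=(50, 20))               # K=50 learned pseudo user-histories
z = rng.normal(size=8)
print(vamp_prior_logpdf(z, pseudo_inputs, encoder))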
Calibrated Learning to Defer with One-vs-All Classifiers
The learning to defer (L2D) framework has the potential to make AI systems safer. For a given input, the system can defer the decision to a human if the human is more likely than the model to take the correct action. We study the calibration of L2D systems, investigating whether the probabilities they output are sound. We find that Mozannar & Sontag's (2020) multiclass framework is not calibrated with respect to expert correctness. Moreover, it is not even guaranteed to produce valid probabilities, because its parameterization is degenerate for this purpose. We propose an L2D system based on one-vs-all classifiers that is able to produce calibrated probabilities of expert correctness. Furthermore, our loss function is also a consistent surrogate for multiclass L2D, like Mozannar & Sontag's (2020). Our experiments verify that not only is our system calibrated, but this benefit comes at no cost to accuracy. Our model's accuracy is always comparable (and often superior) to Mozannar & Sontag's (2020) model's in tasks ranging from hate speech detection to galaxy classification to diagnosis of skin lesions.
Comment: Accepted at the International Conference on Machine Learning (ICML), 2022
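A minimal sketch of the deferral rule that the one-vs-all parameterisation supports (illustrative, not the authors' implementation): because every output gets its own sigmoid, the extra "expert" output can be read directly as the probability that the expert is correct, and the system defers whenever that probability beats the best class probability.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_or_defer(class_logits, expert_logit):
    # One-vs-all outputs: independent sigmoids rather than a softmax, so the
    # expert head is an estimate of P(expert correct | x) in its own right.
    p_classes = sigmoid(class_logits)
    p_expert = sigmoid(expert_logit)
    if p_expert > p_classes.max():
        return "defer", p_expert
    return int(np.argmax(p_classes)), p_expert

print(predict_or_defer(np.array([2.0, -1.0, 0.3]), expert_logit=1.0))   # predicts class 0
print(predict_or_defer(np.array([0.2, -1.0, 0.3]), expert_logit=1.5))   # defers to the expert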
Dropout as a structured shrinkage prior
Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for ResNets, terming the prior automatic depth determination, as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation on regression benchmarks.
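The reparametrization argument at the heart of the equivalence fits in a few lines (a toy sketch, not the paper's full derivation): multiplying a layer's outputs by random noise is identical to scaling the corresponding weight rows by that noise, so the multiplicative noise can be absorbed into a scale-mixture (shrinkage) prior on the weights.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
W = rng.normal(size=(3, 5))

# Multiplicative noise on the layer's outputs ...
z = rng.gamma(shape=2.0, scale=0.5, size=3)    # any positive noise; Bernoulli gives dropout
noisy_out = z * (W @ x)

# ... equals an ordinary, noise-free layer whose weight rows are rescaled by z,
# i.e. the rows of W follow a scale mixture: W_tilde = diag(z) @ W.
W_tilde = z[:, None] * W
print(np.allclose(noisy_out, W_tilde @ x))      # True: the noise lives on the weights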