Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions
To minimize the average of a set of log-convex functions, the stochastic
Newton method iteratively updates its estimate using subsampled versions of the
full objective's gradient and Hessian. We contextualize this optimization
problem as sequential Bayesian inference on a latent state-space model with a
discriminatively-specified observation process. Applying Bayesian filtering
then yields a novel optimization algorithm that considers the entire history of
gradients and Hessians when forming an update. We establish matrix-based
conditions under which the effect of older observations diminishes over time,
in a manner analogous to Polyak's heavy ball momentum. We illustrate various
aspects of our approach with an example and review other relevant innovations
for the stochastic Newton method.
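As a minimal sketch of the subsampled update this abstract starts from (not the filtering-based algorithm the paper proposes), the one-dimensional Python example below takes Newton steps using the gradient and second derivative of a random minibatch. The family f_i(x) = exp((x - c_i)^2 / 2), the centers c_i, and all function names are illustrative assumptions and do not come from the paper.

```python
import math
import random


def grad_and_hess(x, centers):
    """Average gradient and second derivative of f_i(x) = exp((x - c_i)**2 / 2).

    Each f_i is log-convex because log f_i is a convex quadratic.
    The family f_i is a hypothetical example, not the paper's objective.
    """
    g = 0.0
    h = 0.0
    for c in centers:
        d = x - c
        f = math.exp(0.5 * d * d)
        g += d * f                 # f_i'(x)  = (x - c_i) * f_i(x)
        h += (1.0 + d * d) * f     # f_i''(x) = (1 + (x - c_i)**2) * f_i(x)
    n = len(centers)
    return g / n, h / n


def stochastic_newton(x0, centers, batch_size, steps, seed=0):
    """Run plain stochastic Newton steps x <- x - g / h on random minibatches."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        batch = rng.sample(centers, batch_size)  # subsample without replacement
        g, h = grad_and_hess(x, batch)
        x -= g / h  # h > 0 for this family, so the step is well defined
    return x
```

With batch_size equal to len(centers) this reduces to the deterministic Newton method. Smaller batches give noisy iterates, since each step uses only the current minibatch and discards earlier gradients and Hessians.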
Fast Adaptively Weighted Matrix Factorization for Recommendation with Implicit Feedback
Recommendation from implicit feedback is a highly challenging task due to the
lack of reliable observed negative data. A popular and effective approach
for implicit recommendation is to treat unobserved data as negative but
downweight their confidence. Naturally, how to assign confidence weights and
how to handle the large amount of unobserved data are two key problems for
implicit recommendation models. However, existing methods either pursue fast
learning by manually assigning simple confidence weights, which lacks
flexibility and may create empirical bias in evaluating users' preferences, or
adaptively infer personalized confidence weights but suffer from low
efficiency. To achieve both adaptive weight assignment and efficient model
learning, we propose a fast adaptively weighted matrix factorization (FAWMF)
based on a variational auto-encoder. The personalized data confidence weights are
adaptively assigned with a parameterized neural network (function) and the
network can be inferred from the data. Further, to support fast and stable
learning of FAWMF, we develop a new batch-based learning algorithm, fBGD,
which trains on all feedback data yet has complexity linear in the number
of observed data. Extensive experiments on real-world datasets
demonstrate the superiority of the proposed FAWMF and its learning algorithm
fBGD.
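To make the "treat unobserved data as negative but downweight their confidence" idea concrete, here is a minimal Python sketch of a confidence-weighted squared loss with fixed weights. This corresponds to the manually weighted baseline the abstract contrasts against, not to FAWMF's learned weights; the function name, the default weight values, and the data layout are assumptions made for illustration.

```python
def weighted_mf_loss(user_factors, item_factors, observed,
                     w_observed=1.0, w_unobserved=0.1):
    """Confidence-weighted squared loss over every user-item pair.

    observed is a set of (user, item) index pairs with positive feedback.
    Unobserved pairs are treated as negatives (target 0) but down-weighted.
    The constant weights are an illustrative assumption; FAWMF instead
    assigns them adaptively.
    """
    loss = 0.0
    for u, p in enumerate(user_factors):
        for i, q in enumerate(item_factors):
            prediction = sum(a * b for a, b in zip(p, q))  # inner product
            if (u, i) in observed:
                target, weight = 1.0, w_observed
            else:
                target, weight = 0.0, w_unobserved
            loss += weight * (target - prediction) ** 2
    return loss
```

This naive loop visits every user-item pair, so its cost grows with the number of users times the number of items. The abstract describes fBGD as avoiding that cost, with complexity linear in the number of observed data.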
VB-MK-LMF: Fusion of drugs, targets and interactions using Variational Bayesian Multiple Kernel Logistic Matrix Factorization
Background
Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information, are also vital for reaching top performance.
Method
We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions.
Results
VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general-purpose graphics processing units, yielding significantly improved computational time.
Conclusion
In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.
Availability
Data and code are available at http://bioinformatics.mit.bme.hu
Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices.
Fully observed large binary matrices appear in a wide variety of contexts. To model them, probabilistic matrix factorization (PMF) methods are an attractive solution. However, current batch algorithms for PMF can be inefficient because they need to analyze the entire data matrix before producing any parameter updates. We derive an efficient stochastic inference algorithm for PMF models of fully observed binary matrices. Our method exhibits faster convergence rates than more expensive batch approaches and has better predictive performance than scalable alternatives. The proposed method includes new data subsampling strategies which produce large gains over standard uniform subsampling. We also address the task of automatically selecting the size of the minibatches of data used by our method. For this, we derive an algorithm that adjusts this hyper-parameter online.
Efficient Bayesian active learning and matrix modelling
With the advent of the Internet and growth of storage capabilities, large collections of unlabelled data are now available. However, collecting supervised labels can be costly. Active learning addresses this by selecting, sequentially, only the most useful data in light of the information collected so far. The online nature of such algorithms often necessitates efficient computations. Thus, we present a framework for information theoretic Bayesian active learning, named Bayesian Active Learning by Disagreement, that permits efficient and accurate computations of data utility. Using this framework we develop new techniques for active Gaussian process modelling and adaptive quantum tomography. The latter has been shown, in both simulation and laboratory experiments, to yield faster learning rates than any non-adaptive design.
Numerous datasets can be represented as matrices. Bayesian models of matrices are becoming increasingly popular because they can handle noisy or missing elements, and are extensible to different data-types. However, efficient inference is crucial to allow these flexible probabilistic models to scale to large real-world datasets. Binary matrices are a ubiquitous datatype, so we present a stochastic inference algorithm for fast learning in this domain. Preference judgements are a common, implicit source of binary data. We present a hybrid matrix factorization/Gaussian process model for collaborative learning from multiple users' preferences. This model both exploits the structure of the matrix and can incorporate additional covariate information to make accurate predictions.
We then combine matrix modelling with active learning and propose a new algorithm for cold-start learning with ordinal data, such as ratings. This algorithm couples Bayesian Active Learning by Disagreement with a heteroscedastic model to handle varying levels of noise. This ordinal matrix model is also used to analyze psychometric questionnaires; we analyze classical assumptions made in psychometrics and show that active learning methods can reduce questionnaire lengths substantially. This PhD was supported by the Google European Doctoral Fellowship.
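Bayesian Active Learning by Disagreement, as it is commonly stated, scores a candidate input by the mutual information between its label and the model parameters: the entropy of the averaged prediction minus the average entropy of the individual predictions. The Python sketch below shows this for binary labels, assuming Monte Carlo samples from the parameter posterior are available. The function names are hypothetical, and the estimators actually used in the thesis (for example, for Gaussian process models) may differ.

```python
import math


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable, with 0 * log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def bald_score(probabilities):
    """Mutual-information score for one candidate input.

    probabilities holds p(y = 1 | x, theta_k) for K posterior samples theta_k.
    Score = entropy of the averaged prediction minus the average entropy.
    """
    k = len(probabilities)
    mean_p = sum(probabilities) / k
    mean_entropy = sum(binary_entropy(p) for p in probabilities) / k
    return binary_entropy(mean_p) - mean_entropy
```

The score is zero when all posterior samples agree on the prediction and largest when confident samples disagree, so an active learner would query the candidate with the highest score.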