Polya-gamma augmentations for factor models
Bayesian inference for latent factor models, such as principal component and canonical correlation analysis, is easy for Gaussian likelihoods with conjugate priors, using both Gibbs sampling and mean-field variational approximation. For other likelihood potentials one needs either to resort to more complex sampling schemes or to specify dedicated forms of variational lower bounds. Recently, however, it was shown that for specific likelihoods related to the logistic function it is possible to augment the joint density with auxiliary variables following a Pólya-Gamma distribution, leading to closed-form updates for binary and over-dispersed count models. In this paper we describe how Gibbs sampling and mean-field variational approximation can be implemented for various latent factor models in these cases, presenting easy-to-implement and efficient inference schemes.
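The augmentation is easiest to sketch for Bayesian logistic regression; the factor-model updates described in the abstract follow the same conjugate pattern. Below is a minimal numpy sketch, not the paper's implementation: `sample_pg1` draws approximate PG(1, c) variates by truncating the distribution's infinite sum-of-Gammas representation, and the Gibbs sweep alternates ω | β with the resulting closed-form Gaussian update for β | ω. All names are our own.

```python
import numpy as np

def sample_pg1(c, rng, K=100):
    """Approximate PG(1, c) draws via the truncated infinite-sum
    representation: omega = (1/2pi^2) * sum_k g_k / ((k-1/2)^2 + (c/2pi)^2),
    with g_k ~ Gamma(1, 1)."""
    c = np.atleast_1d(np.abs(c))
    k = np.arange(1, K + 1)
    g = rng.gamma(1.0, 1.0, size=(c.shape[0], K))
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def gibbs_logistic(X, y, n_iter=200, rng=None):
    """Gibbs sampler for logistic regression with a N(0, I) prior on beta,
    using Polya-Gamma augmentation: given omega, beta is conditionally
    Gaussian with precision X' diag(omega) X + I and mean V X'(y - 1/2)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    beta = np.zeros(d)
    kappa = y - 0.5
    draws = []
    for _ in range(n_iter):
        omega = sample_pg1(X @ beta, rng)          # omega_i ~ PG(1, x_i' beta)
        V = np.linalg.inv((X.T * omega) @ X + np.eye(d))
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)       # closed-form Gaussian update
        draws.append(beta)
    return np.array(draws)
```

The same two-block structure (auxiliary ω, then a conjugate Gaussian draw) is what makes both Gibbs sampling and mean-field updates closed-form in the augmented model.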
Probabilistic Tensor Decomposition of Neural Population Spiking Activity
The firing of neural populations is coordinated across cells, in time, and across experimental conditions or repeated experimental trials, and so a full understanding of the computational significance of neural responses must be based on a separation of these different contributions to structured activity. Tensor decomposition is an approach to untangling the influence of multiple factors in data that is common in many fields. However, despite some recent interest in neuroscience, wider applicability of the approach is hampered by the lack of a full probabilistic treatment allowing principled inference of a decomposition from non-Gaussian spike-count data. Here, we extend the Pólya-Gamma (PG) augmentation, previously used in sampling-based Bayesian inference, to implement scalable variational inference in non-conjugate spike-count models. Using this new approach, we develop techniques related to automatic relevance determination to infer the most appropriate tensor rank, as well as to incorporate priors based on known brain anatomy such as the segregation of cell response properties by brain area. We apply the model to neural recordings taken under conditions of visual-vestibular sensory integration, revealing how the encoding of self- and visual-motion signals is modulated by the sensory information available to the animal.
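For readers unfamiliar with tensor decomposition itself, a deterministic rank-R CP decomposition fitted by alternating least squares is a useful reference point for the probabilistic treatment described above. The sketch below is our own illustration, not the paper's method; it fits three factor matrices to a 3-way array (e.g. neurons × time × trials):

```python
import numpy as np

def cp_als(X, rank, iters=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by alternating
    least squares. Each update solves a linear least-squares problem;
    the Gram matrix of the Khatri-Rao product is the elementwise
    product of the factor Grams."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((d, rank)) for d in (I, J, K))
    for _ in range(iters):
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def reconstruct(A, B, C):
    """Rebuild the tensor as a sum of `rank` outer products."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

The paper's contribution is, in effect, to replace this least-squares objective with a spike-count likelihood and to infer the factors (and the rank itself) with variational Bayes.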
Efficient Bayesian Inference of Sigmoidal Gaussian Cox Processes
We present an approximate Bayesian inference approach for estimating the intensity of an inhomogeneous Poisson process, where the intensity function is modelled using a Gaussian process (GP) prior via a sigmoid link function. Augmenting the model with a latent marked Poisson process and Pólya-Gamma random variables, we obtain a representation of the likelihood which is conjugate to the GP prior. We estimate the posterior using variational free-form mean-field optimisation together with the framework of sparse GPs. As an alternative approximation, we also suggest a sparse Laplace method for the posterior, for which an efficient expectation-maximisation algorithm is derived to find the posterior's mode. Both algorithms compare well against exact inference obtained by a Markov chain Monte Carlo sampler and against a standard variational Gaussian approach solving the same model, while being one order of magnitude faster. Furthermore, the performance and speed of our method are competitive with those of another recently proposed Poisson process model based on a quadratic link function, while not being limited to GPs with squared exponential kernels and rectangular domains.
Funding: DFG, 318763901, approximate Bayesian estimation and model selection for stochastic differential equations (A06); DFG, 318763901, SFB 1294: Data Assimilation: the seamless fusion of data and models.
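The sigmoid link is what keeps the intensity bounded, λ(t) = λ_max σ(f(t)) ≤ λ_max, and the same bound that enables the augmentation also enables exact simulation by thinning. As an illustration (our own sketch, not the paper's inference code), here is how one would sample events from such a process:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_sigmoidal_cox(f, lam_max, T, rng):
    """Sample an inhomogeneous Poisson process on [0, T] with intensity
    lam(t) = lam_max * sigmoid(f(t)) by thinning: draw candidates from a
    homogeneous process at the bounding rate lam_max, then keep each
    candidate t with probability lam(t) / lam_max = sigmoid(f(t))."""
    n = rng.poisson(lam_max * T)                 # candidate count
    cand = rng.uniform(0.0, T, size=n)           # candidate locations
    keep = rng.uniform(size=n) < sigmoid(f(cand))
    return np.sort(cand[keep])
```

In the paper, `f` would be a draw from the GP prior; here it can be any function mapping times to real values.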
Scalable Bayesian Induction of Word Embeddings
Traditional natural language processing has been shown to rely excessively on human-annotated corpora. However, the recent successes of machine translation and speech recognition, ascribed to the effective use of the increasing availability of web-scale data in the wild, have given momentum to a resurgent interest in modelling natural language with simple, easily scaled statistical models such as the n-gram model. Indeed, words and word combinations provide all the representational machinery one needs for solving many natural language tasks.
The degree of semantic similarity between two words is a function of the similarity of the linguistic contexts in which they appear. Word representations are mathematical objects, often vectors, that capture syntactic and semantic properties of a word. This results in words that are semantic cognates having similar word representations, an important property that we will widely use. We claim that word representations provide a superb framework for unsupervised learning on unlabelled data by compactly representing the distributional properties of words.
The current state-of-the-art word representation adopts the skip-gram model to train shallow neural networks and presents negative sampling, an idea borrowed from Noise Contrastive Estimation, as an efficient method of inducing embeddings. An alternative approach contends that the inherent multi-contextual nature of words entails a more Canonical Correlation Analysis-like approach for best results. In this thesis we develop the first fully Bayesian model to induce word embeddings.
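To make the skip-gram-with-negative-sampling baseline concrete, the sketch below computes the negative-sampling loss and its gradients for a single (centre word, positive context, negative samples) triple. Names and dimensions are illustrative, not those of any particular implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss_and_grads(v, u_pos, U_neg):
    """Negative-sampling objective for one training example:
        loss = -log sigmoid(v . u_pos) - sum_k log sigmoid(-v . u_neg_k)
    v: centre-word embedding (d,), u_pos: context embedding (d,),
    U_neg: K negative-sample embeddings (K, d)."""
    loss = -np.log(sigmoid(v @ u_pos)) - np.log(sigmoid(-U_neg @ v)).sum()
    g_pos = sigmoid(v @ u_pos) - 1.0        # d loss / d (v . u_pos)
    g_neg = sigmoid(U_neg @ v)              # d loss / d (v . u_neg_k)
    grad_v = g_pos * u_pos + U_neg.T @ g_neg
    grad_pos = g_pos * v
    grad_neg = np.outer(g_neg, v)
    return loss, grad_v, grad_pos, grad_neg
```

A Bayesian treatment, as developed in this thesis, would replace the point estimates `v` and `u` with posterior densities; the sketch only shows the classical objective being generalised.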
The prominent contributions of this thesis are:
1. A crystallisation of the best practices from previous literature on word embeddings and matrix factorisation into a single hierarchical Bayesian model.
2. A scalable matrix factorisation technique for structured sparse data.
3. Representation of the latent dimensions as continuous Gaussian densities instead of as point estimates.
We analyse a corpus of 170 million tokens and learn for each word form a vectorial representation based on the 8 surrounding context words, with a negative sampling rate of 2 per token. We would like to stress that while we certainly hope to beat the state-of-the-art, our primary goal is to develop a stochastic and scalable Bayesian model. We evaluate the quality of the word embeddings on word analogy tasks as well as other tasks such as word similarity and chunking. We demonstrate competitive performance on standard benchmarks.
Accentuating the phenotype of YG8sR mice, a model of Friedreich's ataxia, using shRNAs targeting the frataxin gene
Friedreich's ataxia (FRDA) is the most common disabling neurodegenerative ataxia. It is a progressive recessive inherited disease that severely affects the nervous and cardiac systems.
FRDA poses not only a challenge for curative therapy but also for establishing an animal model that reproduces the symptomatology. Depending on the number of GAA repeats, mouse models, such as YG8sR mice carrying between 250 and 300 GAA repeats, exhibit a more or less severe phenotype. Our study aims to accentuate the phenotype of YG8sR mice by using short hairpin RNAs (shRNAs) targeting the frataxin mRNA to reduce the expression of this protein. After an in vitro efficacy test of the shRNAs in HeLa and HEK 293T cells, we were able to choose, among the 4 tested, 2 shRNAs capable of reducing the frataxin level. We selected shRNA6 and shRNA1, which, after transfection of 2 µg of DNA, reduced the frataxin level in cells by 40% and 70%, respectively. When we injected intravenously 1.2×10¹² or 2.4×10¹² copies of AAV-PHP.B encoding these shRNAs, we observed weight loss, impaired motor skills and coordination, and reduced motor force in the YG8sR mice that received shRNA1 at 1.2×10¹². We have therefore developed an improved mouse model (Imp-YG8sR) by further reducing the expression of frataxin with this dose of shRNA1. The more severe phenotype of the Imp-YG8sR mice is closer to that of patients with Friedreich's ataxia than the original YG8sR model used without the shRNAs. Our Imp-YG8sR mouse model will therefore be beneficial for testing gene therapies currently in development.
Allocative Poisson Factorization for Computational Social Science
Social science data often come in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable whose rate parameter is a function of shared model parameters. This thesis examines a specific subset of Poisson factorization models that constrain the rate parameter to be a multilinear function of shared model parameters. This subset of models---hereby referred to as allocative Poisson factorization (APF)---enjoys a significant computational advantage: posterior inference scales linearly with only the number of non-zero counts in the data set. A challenge in constructing and performing inference in APF models is that the multilinear constraint on the rate parameter---which must be non-negative, by the definition of the Poisson distribution---means that the shared model parameters must themselves be non-negative. Constructing models that capture the complex dependency structures inherent to social processes---e.g., networks with overlapping communities of actors or bursty temporal dynamics---without relying on the analytic convenience and tractability of the Gaussian distribution requires novel constructions of non-negative distributions---e.g., gamma and Dirichlet---and innovative posterior inference techniques. This thesis presents the APF analogue to several widely-used models---i.e., CP decomposition (Chapter 3), Tucker decomposition (Chapter 4), and linear dynamical systems (Chapters 5 and 6)---and shows how to perform Bayesian inference in APF models under local differential privacy (Chapter 7).
Most of these chapters introduce novel auxiliary-variable augmentation schemes to facilitate posterior inference using both Markov chain Monte Carlo and variational inference algorithms. While the task of modeling international relations event data is a recurrent theme, the models presented are applicable to a wide range of tasks in many fields.
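The claimed computational advantage can be seen directly in the likelihood: for a rate λ_ij = θ_i·β_j, the total-rate term Σ_ij λ_ij factorizes as (Σ_i θ_i)·(Σ_j β_j), so the log-likelihood only ever touches the non-zero counts. A minimal numpy sketch of this trick, with our own (hypothetical) variable names:

```python
import numpy as np
from scipy.special import gammaln

def pf_loglik_sparse(rows, cols, vals, Theta, Beta):
    """Poisson-factorization log-likelihood touching only non-zero counts.
    rate[i, j] = Theta[i] @ Beta[j]; the total-rate term factorizes as
    Theta.sum(0) @ Beta.sum(0), so the zero entries are never visited."""
    rates_nz = np.einsum('nr,nr->n', Theta[rows], Beta[cols])
    ll = (vals * np.log(rates_nz) - gammaln(vals + 1)).sum()
    ll -= Theta.sum(axis=0) @ Beta.sum(axis=0)   # sum of all rates, O((I+J)R)
    return ll

def pf_loglik_dense(Y, Theta, Beta):
    """Reference implementation that materializes every rate."""
    lam = Theta @ Beta.T
    return (Y * np.log(lam) - gammaln(Y + 1) - lam).sum()
```

The two functions agree exactly, but the sparse version costs O(nnz·R + (I+J)·R) rather than O(I·J·R), which is the scaling the thesis exploits.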
The Bayesian Learning Rule
We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
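As a small illustration of the rule (our own sketch, under the assumption of a Gaussian candidate distribution and a quadratic loss, where the natural-gradient updates are available in closed form), the iteration below converges to the Newton-style solution m = A⁻¹b with precision S = A:

```python
import numpy as np

def blr_quadratic(A, b, steps=200, rho=0.1):
    """Bayesian learning rule with Gaussian candidate q = N(m, S^-1) on the
    quadratic loss l(t) = 0.5 t'At - b't. For this pair, the natural-gradient
    updates take the closed form
        S <- (1 - rho) S + rho * E_q[Hessian]  = (1 - rho) S + rho * A
        m <- m - rho * S^-1 E_q[gradient]      = m - rho * S^-1 (A m - b)
    whose fixed point m = A^-1 b, S = A matches Newton's method's answer."""
    d = len(b)
    m, S = np.zeros(d), np.eye(d)
    for _ in range(steps):
        S = (1.0 - rho) * S + rho * A
        m = m - rho * np.linalg.solve(S, A @ m - b)
    return m, S
```

Swapping in a different candidate family or a stochastic estimate of the expected gradient yields the other algorithms the abstract lists; this quadratic case is just the one where everything is exact.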