Fast matrix computations for functional additive models
It is common in functional data analysis to look at a set of related
functions: a set of learning curves, a set of brain signals, a set of spatial
maps, etc. One way to express relatedness is through an additive model, whereby
each individual function is assumed to be a variation
around some shared mean. Gaussian processes provide an elegant way of
constructing such additive models, but suffer from computational difficulties
arising from the matrix operations that need to be performed. Recently Heersink
& Furrer have shown that functional additive models give rise to covariance
matrices that have a specific form they called quasi-Kronecker (QK), whose
inverses are relatively tractable. We show that under additional assumptions
the two-level additive model leads to a class of matrices we call restricted
quasi-Kronecker, which enjoy many interesting properties. In particular, we
formulate matrix factorisations whose complexity scales only linearly in the
number of functions in the latent field, an enormous improvement over the cubic
scaling of naïve approaches. We describe how to leverage the properties of
rQK matrices for inference in Latent Gaussian Models.
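To illustrate why this kind of structure helps, here is a minimal sketch, not the paper's own factorisation. It assumes the covariance of m functions can be written as I_m ⊗ A + 11ᵀ ⊗ B, where B is the covariance of the shared mean and A that of the independent deviations. That form, the function name rqk_solve, and the symbols A, B and Y are assumptions made for this example. Under that assumption a linear solve splits into a "mean" part and a "deviation" part, each involving only n x n matrices.

```python
import numpy as np

def rqk_solve(A, B, Y):
    """Solve (I_m ⊗ A + 11ᵀ ⊗ B) x = y without forming the (mn x mn) matrix.

    A, B : (n, n) symmetric positive-definite blocks (assumed form).
    Y    : (n, m) array whose i-th column is the right-hand side for function i.
    Returns an (n, m) array whose i-th column is the solution block x_i.
    """
    m = Y.shape[1]
    ybar = Y.mean(axis=1, keepdims=True)           # component shared by all functions
    dev = np.linalg.solve(A, Y - ybar)             # deviations from the mean only see A
    mean_part = np.linalg.solve(A + m * B, ybar)   # the mean component sees A + m B
    return dev + mean_part
```

The work is two n x n solves, one of them with m right-hand sides, so the cost grows linearly in m. A dense solve of the full mn x mn system grows cubically in m.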
Using auditory classification images for the identification of fine acoustic cues used in speech perception
An essential step in understanding the processes underlying the general mechanism of perceptual categorization is to identify which portions of a physical stimulation modulate the behavior of our perceptual system. More specifically, in the context of speech comprehension, it is still a major open challenge to understand which information is used to categorize a speech stimulus as one phoneme or another, the auditory primitives relevant for the categorical perception of speech being still unknown. Here we propose to adapt a method relying on a Generalized Linear Model with smoothness priors, already used in the visual domain for the estimation of so-called classification images, to auditory experiments. This statistical model offers a rigorous framework for dealing with non-Gaussian noise, as is often the case in the auditory modality, and limits the amount of noise in the estimated template by enforcing smoother solutions. By applying this technique to a specific two-alternative forced choice experiment between stimuli "aba" and "ada" in noise with an adaptive SNR, we confirm that the second formantic transition is key for classifying phonemes into /b/ or /d/ in noise, and that its estimation by the auditory system is a relative measurement across spectral bands and in relation to the perceived height of the second formant in the preceding syllable. Through this example, we show how the GLM with smoothness priors approach can be applied to the identification of fine functional acoustic cues in speech perception. Finally, we discuss some assumptions of the model in the specific case of speech perception.
Efficient Deterministic Approximate Bayesian Inference for Gaussian Process models
Gaussian processes are powerful nonparametric distributions over continuous functions that
have become a standard tool in modern probabilistic machine learning. However, the
applicability of Gaussian processes in the large-data regime and in hierarchical probabilistic
models is severely limited by analytic and computational intractabilities. It is, therefore,
important to develop practical approximate inference and learning algorithms that can
address these challenges. To this end, this dissertation provides a comprehensive and unifying
perspective of pseudo-point based deterministic approximate Bayesian learning for a wide
variety of Gaussian process models, which connects previously disparate literature, greatly
extends them and allows new state-of-the-art approximations to emerge.
We start by building a posterior approximation framework based on Power-Expectation
Propagation for Gaussian process regression and classification. This framework relies on a
structured approximate Gaussian process posterior based on a small number of pseudo-points,
which is judiciously chosen to summarise the actual data and enable tractable and efficient
inference and hyperparameter learning. Many existing sparse approximations are recovered
as special cases of this framework, and can now be understood as performing approximate
posterior inference using a common approximate posterior. Critically, extensive empirical
evidence suggests that new approximation methods arising from this unifying perspective
outperform existing approaches in many real-world regression and classification tasks.
We explore the extensions of this framework to Gaussian process state space models,
Gaussian process latent variable models and deep Gaussian processes, which also unify
many recently developed approximation schemes for these models. Several mean-field and
structured approximate posterior families for the hidden variables in these models are studied.
We also discuss several methods for approximate uncertainty propagation in recurrent and
deep architectures based on Gaussian projection, linearisation, and simple Monte Carlo. The
benefits of the unified inference and learning frameworks for these models are illustrated in a
variety of real-world state-space modelling and regression tasks.
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in and retrieved from a pre-attentional store during this task.
Advances in scalable learning and sampling of unnormalised models
We study probabilistic models that are known incompletely, up to an intractable normalising constant. To reap the full benefit of such models, two
tasks must be solved: learning and sampling. These two tasks have been
subject to decades of research, and yet significant challenges still persist.
Traditional approaches often suffer from poor scalability with respect to
dimensionality and model-complexity, generally rendering them inapplicable to models parameterised by deep neural networks. In this thesis, we
contribute a new set of methods for addressing this scalability problem.
We first explore the problem of learning unnormalised models. Our investigation begins with a well-known learning principle, Noise-contrastive
Estimation, whose underlying mechanism is that of density-ratio estimation.
By examining why existing density-ratio estimators scale poorly, we identify a new framework, telescoping density-ratio estimation (TRE), that can
learn ratios between highly dissimilar densities in high-dimensional spaces.
Our experiments demonstrate that TRE not only yields substantial improvements for the learning of deep unnormalised models, but can do the
same for a broader set of tasks including mutual information estimation and
representation learning.
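The telescoping idea can be summarised by the following identity, a sketch in notation assumed here rather than taken from the thesis. Here p_0 and p_m are the two densities of interest and p_1, ..., p_{m-1} are intermediate "bridge" densities chosen so that neighbouring pairs are similar:

```latex
\frac{p_0(x)}{p_m(x)} = \prod_{k=0}^{m-1} \frac{p_k(x)}{p_{k+1}(x)},
\qquad
\log \frac{p_0(x)}{p_m(x)} = \sum_{k=0}^{m-1} \log \frac{p_k(x)}{p_{k+1}(x)}.
```

Each factor is a ratio between two nearby densities, which is typically a much easier estimation problem than the single ratio between p_0 and p_m.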
Subsequently, we explore the problem of sampling unnormalised models.
A large literature on Markov chain Monte Carlo (MCMC) can be leveraged here, and in continuous domains, gradient-based samplers such as
the Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte
Carlo are excellent options. However, there has been substantially less
progress in MCMC for discrete domains. To advance this subfield, we introduce several discrete Metropolis-Hastings samplers that are conceptually
inspired by MALA, and demonstrate their strong empirical performance
across a range of challenging sampling tasks
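For reference, here is a minimal sketch of the continuous-domain MALA step that the discrete samplers take as inspiration. It is not the thesis's discrete samplers. The function name mala_step and its arguments are assumptions for illustration. The target needs to be known only up to its normalising constant, through log_p and its gradient grad_log_p.

```python
import numpy as np

def mala_step(x, log_p, grad_log_p, eps, rng):
    """One MALA transition for a continuous target known up to a constant."""
    def log_q(x_to, x_from):
        # Log density (up to a constant) of the Langevin proposal x_from -> x_to.
        mean = x_from + 0.5 * eps**2 * grad_log_p(x_from)
        return -np.sum((x_to - mean) ** 2) / (2 * eps**2)

    # Gradient-informed Gaussian proposal.
    prop = x + 0.5 * eps**2 * grad_log_p(x) + eps * rng.standard_normal(x.shape)
    # Metropolis-Hastings correction; the normalising constant cancels.
    log_alpha = log_p(prop) - log_p(x) + log_q(x, prop) - log_q(prop, x)
    if np.log(rng.random()) < log_alpha:
        return prop, True
    return x, False
```

Only differences of log_p enter the acceptance ratio, which is why the sampler applies to unnormalised models.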
Scalable Algorithms for the Analysis of Massive Networks
Network analysis aims to unveil non-trivial insights from networked data by studying relationship patterns between the entities of a network. Among these insights, a popular one is to quantify the importance of an entity with respect to the others according to some criteria. Another one is to find the most suitable matching partner for each participant of a network knowing the pairwise preferences of the participants to be matched with each other - known as Maximum Weighted Matching (MWM).
Since the notion of importance is tied to the application under consideration, numerous centrality measures have been introduced. Many of these measures, however, were conceived in a time when computing power was very limited and networks were much smaller compared to today's, and thus scalability to large datasets was not considered. Today, massive networks with millions of edges are ubiquitous, and a complete exact computation of traditional centrality measures is often too time-consuming. This issue is amplified if our objective is to find the group of k vertices that is the most central as a group. Scalable algorithms to identify highly central (groups of) vertices on massive graphs are thus of pivotal importance for large-scale network analysis.
In addition to their size, today's networks often evolve over time, which poses the challenge of efficiently updating results after a change occurs. Hence, efficient dynamic algorithms are essential for modern network analysis pipelines.
In this work, we propose scalable algorithms for identifying important vertices in a network, and for efficiently updating them in evolving networks. In real-world graphs with hundreds of millions of edges, most of our algorithms require seconds to a few minutes to perform these tasks. Further, we extend a state-of-the-art algorithm for MWM to dynamic graphs. Experiments show that our dynamic MWM algorithm handles updates in graphs with billions of edges in milliseconds.
In silico modelling of parasite dynamics
Understanding host-parasite systems is challenging if biologists rely on experimental approaches alone, whereas mathematical models can help uncover further
in-depth knowledge about host infection dynamics. Previous experimental studies have
explored the infrapopulation dynamics of Gyrodactylus turnbulli and G. bullatarudis ectoparasites on their fish host, Poecilia reticulata. However, other important and open
biological questions exist concerning parasite microhabitat preference, host survival, parasite virulence, and the transmission dynamics of different Gyrodactylus strains across different host populations over time. This thesis mathematically investigates these relevant
biological questions to understand the gyrodactylid-fish system’s complexity better using
a sophisticated multi-state Markov model (MSM) and a novel individual-based stochastic simulation model. The infection dynamics of three different gyrodactylid strains are
compared across three different host populations. A modified approximate Bayesian computation (ABC) with sequential Monte Carlo (SMC) and sequential importance sampling
(SIS) is developed for calibrating the novel stochastic model based on existing empirical
data and an auxiliary stochastic model. In addition, an extended local-linear regression
(with L2 regularisation) for ABC post-processing analysis has been proposed. Advanced
statistics and an MSM are used to assess spatial-temporal parasite dynamics. A linear
birth-death process with catastrophic extinction (B-D-C process) is considered the auxiliary model for the complex simulation model to refine the modified ABC’s summary
statistics, with other theoretical justifications and parameter estimation techniques of
the B-D-C process provided. The B-D-C process simulation using τ-leaping also provides additional insights on accelerating the complex simulation model by proposing a
reasonable error threshold based on the trade-off between simulation accuracy and computational speed. The mathematical models can be extended and adapted for other
host-parasite systems, and the modified ABC methodologies can also aid in efficiently
calibrating other multi-parameter models with a high-dimensional set of correlating or
independent summary statistics
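A minimal sketch of fixed-step τ-leaping for a linear birth-death process with catastrophes is given here for illustration only. The parameterisation is an assumption, not taken from the thesis: births occur at rate lam*n, deaths at rate mu*n, and a catastrophe at constant rate rho resets the population to zero. The function name bdc_tau_leap and its arguments are likewise hypothetical.

```python
import numpy as np

def bdc_tau_leap(n0, lam, mu, rho, tau, n_steps, rng):
    """Approximate a birth-death-catastrophe path with fixed-step tau-leaping.

    Assumed rates (hypothetical parameterisation): births lam*n, deaths mu*n,
    and a catastrophe at constant rate rho that sets the population to zero.
    """
    path = np.empty(n_steps + 1, dtype=np.int64)
    path[0] = n = n0
    for t in range(1, n_steps + 1):
        if n > 0:
            # Probability that at least one catastrophe occurs within the leap.
            if rng.random() < 1.0 - np.exp(-rho * tau):
                n = 0
            else:
                # Event counts over the leap, with rates frozen at the current n.
                births = rng.poisson(lam * n * tau)
                deaths = rng.poisson(mu * n * tau)
                n = max(n + births - deaths, 0)  # clip: deaths cannot exceed population
        path[t] = n
    return path
```

A larger tau means fewer steps and faster simulation, but a cruder approximation because rates are held fixed within each leap. This is the accuracy versus speed trade-off mentioned above.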
Auxiliary variable Markov chain Monte Carlo methods
Markov chain Monte Carlo (MCMC) methods are a widely applicable
class of algorithms for estimating integrals in statistical inference problems.
A common approach in MCMC methods is to introduce additional
auxiliary variables into the Markov chain state and perform transitions
in the joint space of target and auxiliary variables. In this thesis we consider
novel methods for using auxiliary variables within MCMC methods
to allow approximate inference in otherwise intractable models and to
improve sampling performance in models exhibiting challenging properties
such as multimodality.
We first consider the pseudo-marginal framework. This extends the
Metropolis–Hastings algorithm to cases where we only have access to
an unbiased estimator of the density of the target distribution. The resulting
chains can sometimes show ‘sticking’ behaviour where long series
of proposed updates are rejected. Further the algorithms can be difficult
to tune and it is not immediately clear how to generalise the approach
to alternative transition operators. We show that if the auxiliary variables
used in the density estimator are included in the chain state it is
possible to use new transition operators such as those based on slice-sampling
algorithms within a pseudo-marginal setting. This auxiliary
pseudo-marginal approach leads to easier to tune methods and is often
able to improve sampling efficiency over existing approaches.
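As a point of reference, the standard pseudo-marginal Metropolis–Hastings scheme can be sketched as below. This is the baseline algorithm, not the auxiliary pseudo-marginal method proposed in the thesis. The names pseudo_marginal_mh and noisy_normal_density, and the toy log-normal noise model, are assumptions for illustration.

```python
import numpy as np

def pseudo_marginal_mh(x0, density_estimator, n_iters, step, rng):
    """Random-walk Metropolis-Hastings driven by an unbiased density estimate.

    density_estimator(x, rng) must return a non-negative, unbiased estimate of
    the (possibly unnormalised) target density at x.
    """
    x, p_hat = x0, density_estimator(x0, rng)
    chain = np.empty(n_iters)
    for i in range(n_iters):
        prop = x + step * rng.standard_normal()
        p_hat_prop = density_estimator(prop, rng)
        # The estimate for the current state is reused, never refreshed;
        # this is what keeps the exact target invariant.
        if rng.random() * p_hat < p_hat_prop:
            x, p_hat = prop, p_hat_prop
        chain[i] = x
    return chain

def noisy_normal_density(x, rng, sigma=0.5):
    """Toy estimator (assumed for illustration): the N(0, 1) density up to a
    constant, times log-normal noise with mean one, so it is unbiased."""
    noise = np.exp(sigma * rng.standard_normal() - 0.5 * sigma**2)
    return np.exp(-0.5 * x**2) * noise
```

Because the stored estimate p_hat is kept until a proposal is accepted, an unusually large estimate can cause a long run of rejections. This is the 'sticking' behaviour described above.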
As a second contribution we consider inference in probabilistic models
defined via a generative process with the probability density of the outputs
of this process only implicitly defined. The approximate Bayesian
computation (ABC) framework allows inference in such models when
conditioning on the values of observed model variables by making the
approximation that generated observed variables are ‘close’ rather than
exactly equal to observed data. Although making the inference problem
more tractable, the approximation error introduced in ABC methods can
be difficult to quantify and standard algorithms tend to perform poorly
when conditioning on high dimensional observations. This often requires
further approximation by reducing the observations to lower
dimensional summary statistics.
We show how including all of the random variables used in generating
model outputs as auxiliary variables in a Markov chain state can
allow the use of more efficient and robust MCMC methods such as slice
sampling and Hamiltonian Monte Carlo (HMC) within an ABC framework.
In some cases this can allow inference when conditioning on
the full set of observed values when standard ABC methods require reduction
to lower dimensional summaries for tractability. Further we
introduce a novel constrained HMC method for performing inference
in a restricted class of differentiable generative models which allows
conditioning the generated observed variables to be arbitrarily close to
observed data while maintaining computational tractability.
As a final topic we consider the use of an auxiliary temperature variable
in MCMC methods to improve exploration of multimodal target densities
and allow estimation of normalising constants. Existing approaches
such as simulated tempering and annealed importance sampling use
temperature variables which take on only a discrete set of values. The
performance of these methods can be sensitive to the number and spacing
of the temperature values used, and the discrete nature of the temperature
variable prevents the use of gradient-based methods such as
HMC to update the temperature alongside the target variables. We introduce
new MCMC methods which instead use a continuous temperature
variable. This both removes the need to tune the choice of discrete
temperature values and allows the temperature variable to be updated
jointly with the target variables within an HMC method.