296 research outputs found
Exact Bayesian curve fitting and signal segmentation.
We consider regression models where the underlying functional relationship between the response and the explanatory variable is modeled as independent linear regressions on disjoint segments. We present an algorithm for perfect simulation from the posterior distribution of such a model, even allowing for an unknown number of segments and an unknown model order for the linear regressions within each segment. The algorithm is simple, can scale well to large data sets, and avoids the problem of diagnosing convergence that is present with Monte Carlo Markov Chain (MCMC) approaches to this problem. We demonstrate our algorithm on standard denoising problems, on a piecewise constant AR model, and on a speech segmentation problem
Simulator adaptation at runtime for component-based simulation software
Component-based simulation software can provide many opportunities to compose and configure simulators, resulting in an algorithm selection problem for the user of this software. This thesis aims to automate the selection and adaptation of simulators at runtime in an application-independent manner. Further, it explores the potential of tailored and approximate simulators - in this thesis concretely developed for the modeling language ML-Rules - supporting the effectiveness of the adaptation scheme.Komponenten-basierte Simulationssoftware kann viele Möglichkeiten zur Komposition und Konfiguration von Simulatoren bieten und damit zu einem Konfigurationsproblem für Nutzer dieser Software führen. Das Ziel dieser Arbeit ist die Entwicklung einer generischen und automatisierten Auswahl- und Adaptionsmethode für Simulatoren. Darüber hinaus wird das Potential von spezifischen und approximativen Simulatoren anhand der Modellierungssprache ML-Rules untersucht, welche die Effektivität des entwickelten Adaptionsmechanismus erhöhen können
Sequential Adaptive Detection for In-Situ Transmission Electron Microscopy (TEM)
We develop new efficient online algorithms for detecting transient sparse
signals in TEM video sequences, by adopting the recently developed framework
for sequential detection jointly with online convex optimization [1]. We cast
the problem as detecting an unknown sparse mean shift of Gaussian observations,
and develop adaptive CUSUM and adaptive SSRS procedures, which are based on
likelihood ratio statistics with post-change mean vector being online maximum
likelihood estimators with . We demonstrate the meritorious performance
of our algorithms for TEM imaging using real data
Recommended from our members
Monte Carlo Methods in Practice and Efficiency Enhancements via Parallel Computation
Monte Carlo methods are crucial when dealing with advanced problems in Bayesian inference. Indeed, common approaches such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) can be endlessly adapted to tackle the most complex problems. What is important then is to construct efficient algorithms, and significant attention in the literature is devoted to developing algorithms that mix well, have low computational complexity and can scale up to large datasets. One of the most commonly used and straightforward approaches is to speed up Monte Carlo algorithms by running them in parallel computing environments. The compute time of Monte Carlo algorithms is random and can vary depending on the current state of the Markov chain. Other computing-infrastructure related factors, such as competing jobs on the same processor, or memory bandwidth, which are prevalent in shared computing architectures such as cloud computing, can also affect this compute time. However, many algorithms running in parallel require the processors to communicate every so often, and for that we must ensure that they are simultaneously ready and any idle wait time is minimised. This can be done by employing a framework known as Anytime Monte Carlo, which imposes a real-time deadline on parallel computations.
The contributions in this thesis include novel applications of the Anytime framework to construct efficient Anytime MCMC and SMC algorithms which make use of parallel computing in order to perform inference for advanced problems. Examples of such problems investigated include models in which the likelihood cannot be evaluated analytically, and changepoint models, which are often used to model the heterogeneity of sequential data, but tricky to infer upon due to the unknown number and locations of the changepoints. This thesis also focuses on the difficult task of performing parameter inference in single-molecule microscopy, a category of models in which the arrival rate of observations is not uniformly distributed and measurement models have complex forms. These issues are exacerbated when molecules have trajectories described by stochastic differential equations.
The original contributions of this thesis are organised in Chapters 4-6. Chapter 4 shows the development of a novel Anytime parallel tempering algorithm and demonstrates the performance enhancements the Anytime framework brings to parallel tempering, an algorithm, which runs multiple interacting MCMC chains in order to more efficiently explore the state space. In Chapter 5, a general Anytime SMC sampler is developed for performing changepoint inference using reversible jump MCMC (RJ-MCMC), an algorithm that takes into account the unknown number of changepoints by including transdimensional MCMC updates. The workings of the algorithm are illustrated on a particularly complex changepoint model, and once again the improvements in performance brought by employing the Anytime framework are demonstrated. Chapter 6 moves away from the Anytime framework, and presents a novel and general SMC approach to performing parameter inference for molecules with stochastic trajectories
Contributions on evolutionary computation for statistical inference
Evolutionary Computation (EC) techniques have been introduced in the 1960s for dealing with complex situations. One possible example is an optimization problems not having an analytical solution or being computationally intractable; in many cases such methods, named Evolutionary Algorithms (EAs), have been successfully implemented. In statistics there are many situations where complex problems arise, in particular concerning optimization. A general example is when the statistician needs to select, inside a prohibitively large discrete set, just one element, which could be a model, a partition, an experiment, or such: this would be the case of model selection, cluster analysis or design of experiment. In other situations there could be an intractable function of data, such as a likelihood, which needs to be maximized, as it happens in model parameter estimation. These kind of problems are naturally well suited for EAs, and in the last 20 years a large number of papers has been concerned with applications of EAs in tackling statistical issues.
The present dissertation is set in this part of literature, as it reports several implementations of EAs in statistics, although being mainly focused on statistical inference problems. Original results are proposed, as well as overviews and surveys on several topics. EAs are employed and analyzed considering various statistical points of view, showing and confirming their efficiency and flexibility.
The first proposal is devoted to parametric estimation problems. When EAs are employed in such analysis a novel form of variability related to their stochastic elements is introduced. We shall analyze both variability due to sampling, associated with selected estimator, and variability due to the EA. This analysis is set in a framework of statistical and computational tradeoff question, crucial in nowadays problems, by introducing cost functions related to both data acquisition and EA iterations. The proposed method will be illustrated by means of model building problem examples.
Subsequent chapter is concerned with EAs employed in Markov Chain Monte Carlo (MCMC) sampling. When sampling from multimodal or highly correlated distribution is concerned, in fact, a possible strategy suggests to run several chains in parallel, in order to improve their mixing. If these chains are allowed to interact with each other then many analogies with EC techniques can be observed, and this has led to research in many fields. The chapter aims at reviewing various methods found in literature which conjugates EC techniques and MCMC sampling, in order to identify specific and common procedures, and unifying them in a framework of EC.
In the last proposal we present a complex time series model and an identification procedure based on Genetic Algorithms (GAs). The model is capable of dealing with seasonality, by Periodic AutoRegressive (PAR) modelling, and structural changes in time, leading to a nonstationary structure. As far as a very large number of parameters and possibilites of change points are concerned, GAs are appropriate for identifying such model. Effectiveness of procedure is shown on both simulated data and real examples, these latter referred to river flow data in hydrology.
The thesis concludes with some final remarks, concerning also future work
Contributions on evolutionary computation for statistical inference
Evolutionary Computation (EC) techniques have been introduced in the 1960s for dealing with complex situations. One possible example is an optimization problems not having an analytical solution or being computationally intractable; in many cases such methods, named Evolutionary Algorithms (EAs), have been successfully implemented. In statistics there are many situations where complex problems arise, in particular concerning optimization. A general example is when the statistician needs to select, inside a prohibitively large discrete set, just one element, which could be a model, a partition, an experiment, or such: this would be the case of model selection, cluster analysis or design of experiment. In other situations there could be an intractable function of data, such as a likelihood, which needs to be maximized, as it happens in model parameter estimation. These kind of problems are naturally well suited for EAs, and in the last 20 years a large number of papers has been concerned with applications of EAs in tackling statistical issues.
The present dissertation is set in this part of literature, as it reports several implementations of EAs in statistics, although being mainly focused on statistical inference problems. Original results are proposed, as well as overviews and surveys on several topics. EAs are employed and analyzed considering various statistical points of view, showing and confirming their efficiency and flexibility.
The first proposal is devoted to parametric estimation problems. When EAs are employed in such analysis a novel form of variability related to their stochastic elements is introduced. We shall analyze both variability due to sampling, associated with selected estimator, and variability due to the EA. This analysis is set in a framework of statistical and computational tradeoff question, crucial in nowadays problems, by introducing cost functions related to both data acquisition and EA iterations. The proposed method will be illustrated by means of model building problem examples.
Subsequent chapter is concerned with EAs employed in Markov Chain Monte Carlo (MCMC) sampling. When sampling from multimodal or highly correlated distribution is concerned, in fact, a possible strategy suggests to run several chains in parallel, in order to improve their mixing. If these chains are allowed to interact with each other then many analogies with EC techniques can be observed, and this has led to research in many fields. The chapter aims at reviewing various methods found in literature which conjugates EC techniques and MCMC sampling, in order to identify specific and common procedures, and unifying them in a framework of EC.
In the last proposal we present a complex time series model and an identification procedure based on Genetic Algorithms (GAs). The model is capable of dealing with seasonality, by Periodic AutoRegressive (PAR) modelling, and structural changes in time, leading to a nonstationary structure. As far as a very large number of parameters and possibilites of change points are concerned, GAs are appropriate for identifying such model. Effectiveness of procedure is shown on both simulated data and real examples, these latter referred to river flow data in hydrology.
The thesis concludes with some final remarks, concerning also future work
Segmentation of the Poisson and negative binomial rate models: a penalized estimator
We consider the segmentation problem of Poisson and negative binomial (i.e.
overdispersed Poisson) rate distributions. In segmentation, an important issue
remains the choice of the number of segments. To this end, we propose a
penalized log-likelihood estimator where the penalty function is constructed in
a non-asymptotic context following the works of L. Birg\'e and P. Massart. The
resulting estimator is proved to satisfy an oracle inequality. The performances
of our criterion is assessed using simulated and real datasets in the RNA-seq
data analysis context
Changepoint detection for data intensive settings
Detecting a point in a data sequence where the behaviour alters abruptly, otherwise known as a changepoint, has been an active area of interest for decades. More recently, with the advent of the data intensive era, the need for automated and computationally efficient changepoint methods has grown. We here introduce several new techniques for doing this which address many of the issues inherent in detecting changes in a streaming setting. In short, these new methods, which may be viewed as non-trivial extensions of existing classical procedures, are intended to be as useful in as wide a set of situations as possible, while retaining important theoretical guarantees and ease of implementation. The first novel contribution concerns two methods for parallelising existing dynamic programming based approaches to changepoint detection in the single variate setting. We demonstrate that these methods can result in near quadratic computational gains, while retaining important theoretical guarantees. Our next area of focus is the multivariate setting. We introduce two new methods for data intensive scenarios with a fixed, but possibly large, number of dimensions. The first of these is an offline method which detects one change at a time using a new test statistic. We demonstrate that this test statistic has competitive power in a variety of possible settings for a given changepoint, while allowing the method to be versatile across a range of possible modelling assumptions. The other method we introduce for multivariate data is also suitable in the streaming setting. In addition, it is able to relax many standard modelling assumptions. We discuss the empirical properties of the procedure, especially insofar as they relate to a desired false alarm error rate
- …