3,285 research outputs found
Parallel hierarchical sampling: a general-purpose class of multiple-chains MCMC algorithms
This paper introduces the Parallel Hierarchical Sampler (PHS), a class of Markov chain Monte Carlo algorithms using several interacting chains having the same target distribution but different mixing properties. Unlike any single-chain MCMC algorithm, upon reaching stationarity one of the PHS chains, which we call the "mother" chain, attains exact Monte Carlo sampling of the target distribution of interest. We empirically show that this translates into a dramatic improvement in the sampler's performance with respect to single-chain MCMC algorithms. Convergence of the PHS joint transition kernel is proved and its relationships with single-chain samplers, Parallel Tempering (PT) and variable augmentation algorithms are discussed. We then provide two illustrative examples comparing the accuracy of PHS with
Structural Drift: The Population Dynamics of Sequential Learning
We introduce a theory of sequential causal inference in which learners in a
chain estimate a structural model from their upstream teacher and then pass
samples from the model to their downstream student. It extends the population
dynamics of genetic drift, recasting Kimura's selectively neutral theory as a
special case of a generalized drift process using structured populations with
memory. We examine the diffusion and fixation properties of several drift
processes and propose applications to learning, inference, and evolution. We
also demonstrate how the organization of drift process space controls fidelity,
facilitates innovations, and leads to information loss in sequential learning
with and without memory.
Comment: 15 pages, 9 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/sdrift.ht
Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies
In motion analysis and understanding it is important to be able to fit a
suitable model or structure to the temporal series of observed data, in order
to describe motion patterns in a compact way, and to discriminate between them.
In an unsupervised context, i.e., no prior model of the moving object(s) is
available, such a structure has to be learned from the data in a bottom-up
fashion. In recent times, volumetric approaches in which the motion is captured
from a number of cameras and a voxel-set representation of the body is built
from the camera views, have gained ground due to attractive features such as
inherent view-invariance and robustness to occlusions. Automatic, unsupervised
segmentation of moving bodies along entire sequences, in a temporally-coherent
and robust way, has the potential to provide a means of constructing a
bottom-up model of the moving body, and track motion cues that may be later
exploited for motion classification. Spectral methods such as locally linear
embedding (LLE) can be useful in this context, as they preserve "protrusions",
i.e., high-curvature regions of the 3D volume, of articulated shapes, while
improving their separation in a lower dimensional space, making them in this
way easier to cluster. In this paper we therefore propose a spectral approach
to unsupervised and temporally-coherent body-protrusion segmentation along time
sequences. Volumetric shapes are clustered in an embedding space, clusters are
propagated in time to ensure coherence, and merged or split to accommodate
changes in the body's topology. Experiments on both synthetic and real
sequences of dense voxel-set data are shown. This supports the ability of the
proposed method to cluster body-parts consistently over time in a totally
unsupervised fashion, its robustness to sampling density and shape quality, and
its potential for bottom-up model construction.
Comment: 31 pages, 26 figures
Composing Deep Learning and Bayesian Nonparametric Methods
Recent progress in Bayesian methods largely focuses on non-conjugate models that make extensive use of black-box functions: continuous functions implemented with neural networks. Using deep neural networks, Bayesian models can reasonably fit big data while at the same time capturing model uncertainty. This thesis targets a more challenging problem: how do we model general random objects, including discrete ones, using random functions? Our conclusion is: many (discrete) random objects are by nature a composition of Poisson processes and random functions. Thus, all discreteness is handled through the Poisson process, while random functions capture the remaining complexity of the object. Hence the title: composing deep learning and Bayesian nonparametric methods.
This conclusion is not a conjecture. In special cases such as latent feature models, we can prove this claim by working on infinite dimensional spaces, and that is where Bayesian nonparametrics comes in. Moreover, we will impose some regularity assumptions on random objects, such as exchangeability. The representations then follow from representation theorems. We will see this twice in this thesis.
One may ask: when a random object is too simple, such as a non-negative random vector in the case of latent feature models, how can we exploit exchangeability? The answer is to aggregate infinitely many random objects, map them all together onto an infinite dimensional space, and then assume exchangeability on that space. We demonstrate two examples of latent feature models by (1) concatenating them as an infinite sequence (Sections 2 and 3) and (2) stacking them as a 2d array (Section 4).
Besides, we will see that Bayesian nonparametric methods are useful to model discrete patterns in time series data. We will showcase two examples: (1) using variance Gamma processes to model change points (Section 5), and (2) using Chinese restaurant processes to model speech with switching speakers (Section 6).
We are also aware that the inference problem can be non-trivial in popular Bayesian nonparametric models. In Section 7, we present a novel solution for online inference in the popular HDP-HMM model.
Discrete scale invariance and complex dimensions
We discuss the concept of discrete scale invariance and how it leads to
complex critical exponents (or dimensions), i.e. to the log-periodic
corrections to scaling. After their initial suggestion as formal solutions of
renormalization group equations in the seventies, complex exponents have been
studied in the eighties in relation to various problems of physics embedded in
hierarchical systems. Only recently has it been realized that discrete scale
invariance and its associated complex exponents may appear ``spontaneously'' in
euclidean systems, i.e. without the need for a pre-existing hierarchy. Examples
are diffusion-limited-aggregation clusters, rupture in heterogeneous systems,
earthquakes, animals (a generalization of percolation) among many other
systems. We review the known mechanisms for the spontaneous generation of
discrete scale invariance and provide an extensive list of situations where
complex exponents have been found. This is done in order to provide a basis for
a better fundamental understanding of discrete scale invariance. The main
motivation to study discrete scale invariance and its signatures is that it
provides new insights into the underlying mechanisms of scale invariance. It may
also be very interesting for prediction purposes.
Comment: significantly extended version (Oct. 27, 1998) with new examples in
several domains of the review paper with the same title published in Physics
Reports 297, 239-270 (1998)
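As a brief illustration of how discrete scale invariance leads to complex exponents and log-periodic corrections, the following is a standard sketch under stated assumptions and is not taken from the paper itself. The symbols $O$ (an observable), $\lambda_0 > 0$ (the preferred scaling ratio), $\mu > 0$, $\alpha$, $C$ and $s_n$ are notation assumed here for illustration, with $x > 0$.

```latex
% Discrete scale invariance: the observable rescales only under a preferred ratio \lambda_0
% (and hence under its integer powers, O(\lambda_0^{n} x) = \mu^{n} O(x)).
O(\lambda_0 x) = \mu\, O(x)

% Trying a power law O(x) = C x^{s} gives \lambda_0^{s} = \mu = \mu\, e^{2\pi i n}, n integer, hence
s_n = \alpha + i\,\frac{2\pi n}{\ln \lambda_0}, \qquad \alpha = \frac{\ln \mu}{\ln \lambda_0}

% The real part of a term with n \neq 0 oscillates periodically in \ln x:
% a log-periodic correction to the pure power law x^{\alpha}.
\operatorname{Re}\, x^{s_n} = x^{\alpha} \cos\!\left( 2\pi n\, \frac{\ln x}{\ln \lambda_0} \right)
```

Under continuous scale invariance every ratio is allowed and only the real exponent $\alpha$ (the $n = 0$ term) survives, which is why the oscillation in $\ln x$ is read as a signature of a preferred scaling ratio.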
Bayesian Inference for Retrospective Population Genetics Models Using Markov Chain Monte Carlo Methods
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. The field of genetics is currently developing rapidly because of recent advances in the technologies by which molecular data can be obtained from living organisms. To extract the most information from such data, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes.
In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given.
Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data.
In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
Genetics, the science of heredity, studies the structure, function, and variation of hereditary material, as well as other factors that affect variation between individuals across living organisms. Current laboratory methods make it possible to collect increasingly precise and extensive molecular-level data from organisms. Handling such data requires statistical models that exploit, as accurately as possible, the available knowledge of the biological processes that gave rise to the collected data.
This thesis develops Bayesian statistical models for certain genetic processes and applies them to example data sets. The main emphasis is on modelling the shared recent history of individuals. In the simplest case, the starting point is a set of present-day individuals whose genetic material is assumed to be known at certain marker loci on the basis of genotype measurements made in the laboratory. The statistical model is used to estimate the probabilities of the different recent histories connecting the individuals, which are described by pedigree structures and by the inheritance paths of the marker genes. The time spans considered are at most tens of generations.
The thesis also uses the recent-history model in a gene-mapping application, whose goal is to locate positions in the genome that affect a particular trait measured or observed in the individuals. Other applications include estimating population structure and estimating the degrees of relatedness between individuals.
Bayesian phylogenetic modelling of lateral gene transfers
PhD Thesis
Phylogenetic trees represent the evolutionary relationships between a set of species.
Inferring these trees from data can be particularly challenging, since the transfer
of genetic material can occur not only from parents to their offspring but also
between organisms via lateral gene transfers (LGTs). Thus, the presence of LGTs
means that genes in a genome can each have different evolutionary histories, represented
by different gene trees.
A few statistical approaches have been introduced to explore non-vertical evolution
through collections of Markov-dependent gene trees. In 2005 Suchard described
a Bayesian hierarchical model for joint inference of gene trees and an underlying
species tree, where a layer in the model linked gene trees to the species tree via a
sequence of unknown lateral gene transfers. In his model LGT was modeled via a
random walk in the tree space derived from the subtree prune and regraft (SPR)
operator on unrooted trees. However, the use of SPR moves to represent LGT in an
unrooted tree is problematic: the transfer of DNA between two organisms
requires that both organisms exist at the same time, a constraint that such moves
ignore, so they can allow unrealistic LGTs.
This thesis describes a related hierarchical Bayesian phylogenetic model for
reconstructing phylogenetic trees which imposes a temporal constraint on LGTs,
namely that they can only occur between species which exist concurrently. This is
achieved by taking into account possible time orderings of divergence events in trees,
without explicitly modelling divergence times. An extended version of the SPR operator
is introduced as a more adequate mechanism to represent the LGT effect in a
tree. The extended SPR operation respects the time ordering. It additionally differs
from regular SPR as it maintains a 1-to-1 correspondence between points on the
species tree and points on each gene tree. Each point on a gene tree represents the
existence of a population containing that gene at some point in time. Hierarchical
phylogenetic models were used in the reconstruction of each gene tree from its
corresponding gene alignment, enabling the pooling of information across genes. In
addition to Suchard's approach, we assume variation in the rate of evolution between
different sites. The species tree is assumed to be fixed.
A Markov Chain Monte Carlo (MCMC) algorithm was developed to fit the model
in a Bayesian framework. A novel MCMC proposal mechanism for jointly proposing
the gene tree topology and branch lengths, LGT distance and LGT history has been
developed as well as a novel graphical tool to represent LGT history, the LGT Biplot.
Our model was applied to simulated and experimental datasets. More specifically we
analysed LGT/reassortment presence in the evolution of 2009 Swine-Origin Influenza
Type A virus. Future improvements of our model and algorithm should include joint
inference of the species tree, improving the computational efficiency of the MCMC
algorithm and better consideration of other factors that can cause discordance of
gene trees and species trees such as gene loss.
Large Scale Stochastic Dynamics
The goal of this workshop was to explore the recent advances in the
mathematical understanding of the macroscopic properties which emerge on large space-time scales from interacting microscopic particle systems. There were 55 participants,
including postdocs and graduate students, working in diverse
intertwining areas of probability and statistical mechanics. During
the meeting, 29 talks of 45 minutes were scheduled and an evening
session was organised with 10 more short talks of 10 minutes, mostly by younger participants.
These talks addressed the following topics: randomness emerging from
deterministic dynamics, hydrodynamic limits, interface growth models, and
slow convergence to equilibrium in kinetically constrained dynamics.