arXiv.org e-Print Archive

Adaptive approximate Bayesian computation for complex models

Author: CC Drovandi
D Wegmann
Franck Jabot
Guillaume Deffuant
MA Beaumont
Maxime Lenormand
MGB Blum
P Glynn
P Marjoram
P Moral Del
SA Sisson
T Toni
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Approximate Bayesian computation (ABC) is a family of computational techniques in Bayesian statistics. These techniques allow a model to be fitted to data without relying on the computation of the model likelihood. Instead, they require the model to be simulated a large number of times. A number of refinements to the original rejection-based ABC scheme have been proposed, including the sequential improvement of posterior distributions. This technique decreases the number of model simulations required, but it still presents several shortcomings, which are particularly problematic for complex models that are costly to simulate. We here provide a new algorithm to perform adaptive approximate Bayesian computation, which is shown to perform better on both a toy example and a complex social model. Comment: 14 pages, 5 figures
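To make the baseline concrete, the following is a minimal sketch of the original rejection-based ABC scheme that the abstract refers to, not the adaptive algorithm proposed in the paper. The toy model (a normal distribution with unknown mean), the uniform prior, the tolerance `epsilon`, and all function names are illustrative assumptions.

```python
import random

def abc_rejection(observed, simulate, prior_sample, distance, epsilon, n_draws, rng):
    """Plain rejection ABC: keep the prior draws whose simulated summary
    lies within `epsilon` of the observed summary."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a candidate parameter from the prior
        simulated = simulate(theta, rng)   # run the model once; no likelihood is evaluated
        if distance(simulated, observed) <= epsilon:
            accepted.append(theta)
    return accepted

# Hypothetical toy problem: infer the mean of a unit-variance normal
# from the sample mean of 20 observations.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)

def simulate(theta, rng):
    n = 20
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

def distance(a, b):
    return abs(a - b)

rng = random.Random(0)
observed_mean = 2.0  # assumed summary statistic of the observed data
posterior = abc_rejection(observed_mean, simulate, prior_sample, distance,
                          epsilon=0.2, n_draws=5000, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

Most of the 5000 simulations are rejected, because only draws that land near the observed summary survive. This is the cost that sequential and adaptive variants aim to reduce.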

HAL Descartes

Modeling measurement error in tumor characterization studies

Author: Laird Peter W
Marjoram Paul
Rakovski Cyril
Siegmund Kimberly D
Weisenberger Daniel J
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Etiologic studies of cancer increasingly use molecular features such as gene expression, DNA methylation and sequence mutation to subclassify the cancer type. In large population-based studies, the tumor tissues available for study are archival specimens that provide variable amounts of amplifiable DNA for molecular analysis. As molecular features measured from small amounts of tumor DNA are inherently noisy, we propose a novel approach to improve statistical efficiency when comparing groups of samples. We illustrate the phenomenon using the MethyLight technology, applying our proposed analysis to compare MLH1 DNA methylation levels in males and females studied in the Colon Cancer Family Registry. Results: We introduce two methods for computing empirical weights to model heteroscedasticity that is caused by sampling variable quantities of DNA for molecular analysis. In a simulation study, we show that using these weights in a linear regression model is more powerful for identifying differentially methylated loci than standard regression analysis. The increase in power depends on the underlying relationship between variation in outcome measure and input DNA quantity in the study samples. Conclusions: Tumor characteristics measured from small amounts of tumor DNA are inherently noisy. We propose a statistical analysis that accounts for the measurement error due to sampling variation of the molecular feature and show how it can improve the power to detect differential characteristics between patient groups.
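The general idea of down-weighting noisy measurements in a linear regression can be sketched as follows. This is a generic weighted least squares fit, not the paper's two weighting methods, which the abstract does not specify. Using the input DNA quantity directly as the weight, along with the data values and variable names, is a hypothetical choice made only for illustration.

```python
def weighted_linear_fit(x, y, weights):
    """Fit y = intercept + slope * x by weighted least squares (closed form)."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical data: group indicator (0 or 1), a methylation measurement,
# and the amount of input DNA per sample. The two low-input samples are outliers.
group = [0, 0, 0, 1, 1, 1]
methylation = [10.0, 12.0, 30.0, 20.0, 22.0, 5.0]
dna_input = [5.0, 5.0, 0.5, 5.0, 5.0, 0.5]

# Assumption: weight each sample in proportion to its input DNA quantity,
# so that noisier low-input samples contribute less to the fit.
intercept_w, slope_w = weighted_linear_fit(group, methylation, dna_input)

# Standard (unweighted) regression for comparison.
intercept_u, slope_u = weighted_linear_fit(group, methylation, [1.0] * len(group))
```

With a binary group indicator, the fitted slope equals the difference between the weighted group means, so the two low-input outliers shift the weighted estimate far less than the unweighted one.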

Springer - Publisher Connector

Chapman University Digital Commons

Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks

Author: Andrieu C.
Beaumont M. A.
Besag J.
Caimo A.
Carter C.
Del Moral P.
Frank O.
Friel N.
Geyer C. J.
Green P. J.
Grelaud A.
Hamze F.
Higdon D. M.
Koskinen J. H.
Marjoram P.
Murray I.
Møller J.
Neal R.
Pritchard J. K.
Propp J. G.
Richard G. Everitt
Robert C. P.
Sisson S. A.
Snijders T. A. B.
Tierney L.
Wasserman S.
Publication date: 01/01/2012

Undirected graphical models are widely used in statistics, physics and machine vision. However, Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of this has focussed on the important practical case where the data consists of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches, particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) for avoiding the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution is found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior due to the use of MCMC to simulate from the latent graphical model, in lieu of being able to do this exactly in general. The supplementary appendix also describes the nature of the resulting approximation. Comment: 26 pages, 2 figures, accepted in Journal of Computational and Graphical Statistics (http://www.amstat.org/publications/jcgs.cfm)

arXiv.org e-Print Archive

Central Archive at the University of Reading

CiteSeerX

Warwick Research Archives Portal Repository

ABCtoolbox: a versatile toolkit for approximate Bayesian computations

Author: C Leuenberger
CCK Boyce
Christoph Leuenberger
D Wegmann
Daniel Wegmann
G Hamilton
G Heckel
G Laval
G Weiss
JK Pritchard
JM Cornuet
JS Lopes
K Thornton
L Excoffier
Laurent Excoffier
M Beaumont
M Chadeau-Hyam
M Currat
M Schweizer
M Tenehaus
MA Beaumont
O Ratmann
P Bortot
P Marjoram
RR Hudson
S Braaker
S Fink
S Tavaré
SA Sisson
Samuel Neuenschwander
T Toni
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations. RESULTS: Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms including rejection sampling, MCMC without likelihood, a Particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can also interact with most simulation and summary statistics computation programs. The usability of the ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates and to find that males show smaller population sizes but much higher levels of migration than females. CONCLUSION: ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results

Springer - Publisher Connector

Serveur académique lausannois

eScholarship - University of California

Bern Open Repository and Information System (BORIS)

Using DNA Methylation Patterns to Infer Tumor Ancestry

Author: ARA Anderson
Darryl Shibata
H Enderling
JL Tsao
KD Siegmund
Kimberly D. Siegmund
MA Beaumont
MR Lacey
P Gerlee
P Nicolas
Paul Marjoram
PC Nowell
RG Abbott
Xiaoyu Zhang
Y Yatabe
You Jin Hong
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Exactly how human tumors grow is uncertain because serial observations are impractical. One approach to reconstruct the histories of individual human cancers is to analyze the current genomic variation between its cells. The greater the variations, on average, the greater the time since the last clonal evolution cycle ("a molecular clock hypothesis"). Here we analyze passenger DNA methylation patterns from opposite sides of 12 primary human colorectal cancers (CRCs) to evaluate whether the variation (pairwise distances between epialleles) is consistent with a single clonal expansion after transformation. Methodology/Principal Findings: Data from 12 primary CRCs are compared to epigenomic data simulated under a single clonal expansion for a variety of possible growth scenarios. We find that for many different growth rates, a single clonal expansion can explain the population variation in 11 out of 12 CRCs. In eight CRCs, the cells from different glands are all equally distantly related, and cells sampled from the same tumor half appear no more closely related than cells sampled from opposite tumor halves. In these tumors, growth appears consistent with a single "symmetric" clonal expansion. In three CRCs, the variation in epigenetic distances was different between sides, but this asymmetry could be explained by a single clonal expansion with one region of a tumor having undergone more cell division than the other. The variation in one CRC was complex and inconsistent with a simple single clonal expansion.
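The summary measure mentioned in the abstract, pairwise distances between epialleles, can be illustrated with a short sketch. It assumes that each epiallele is encoded as a string of 0s and 1s (unmethylated or methylated at each CpG site) and that distance is the Hamming distance. The encoding, the example patterns, and the function names are hypothetical and not taken from the paper.

```python
from itertools import combinations

def hamming(a, b):
    """Number of sites at which two equal-length methylation patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of sites")
    return sum(ca != cb for ca, cb in zip(a, b))

def mean_pairwise_distance(patterns):
    """Average Hamming distance over all unordered pairs of patterns."""
    pairs = list(combinations(patterns, 2))
    if not pairs:
        raise ValueError("need at least two patterns")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

def mean_between_distance(side_a, side_b):
    """Average Hamming distance between patterns drawn from two different sides."""
    total = sum(hamming(a, b) for a in side_a for b in side_b)
    return total / (len(side_a) * len(side_b))

# Hypothetical epialleles sampled from opposite sides of one tumor.
left = ["00110", "00111", "10110"]
right = ["00110", "01110", "00100"]

within_left = mean_pairwise_distance(left)
within_right = mean_pairwise_distance(right)
between = mean_between_distance(left, right)
```

Comparing the within-side and between-side averages is one simple way to ask whether cells from the same tumor half look more closely related than cells from opposite halves.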

CiteSeerX

Public Library of Science (PLOS)

Many private mutations originate from the first few divisions of a human colorectal adenoma.

Author: Curtis C
Kang H
Marjoram P
Press MF
Salomon MP
Shibata D
Siegmund K
Sottoriva A
Toy M
Zhao J
Publication venue: Wiley
Publication date: 01/11/2015

Intratumoural mutational heterogeneity (ITH) or the presence of different private mutations in different parts of the same tumour is commonly observed in human tumours. The mechanisms generating such ITH are uncertain. Here we find that ITH can be remarkably well structured by measuring point mutations, chromosome copy numbers, and DNA passenger methylation from opposite sides and individual glands of a 6 cm human colorectal adenoma. ITH was present between tumour sides and individual glands, but the private mutations were side-specific and subdivided the adenoma into two major subclones. Furthermore, ITH disappeared within individual glands because the glands were clonal populations composed of cells with identical mutant genotypes. Despite mutation clonality, the glands were relatively old, diverse populations when their individual cells were compared for passenger methylation and by FISH. These observations can be organized into an expanding star-like ancestral tree with co-clonal expansion, where many private mutations and multiple related clones arise during the first few divisions. As a consequence, most detectable mutational ITH in the final tumour originates from the first few divisions. Much of the early history of a tumour, especially the first few divisions, may be embedded within the detectable ITH of tumour genomes

Institute of Cancer Research Repository

Genetic Variation in Native Americans, Inferred from Latino SNP and Resequencing Data

Author: Browning
C. Eng
C. Gignoux
Choudhry
Conrad
Crawford
Frazer
G. K. Chen
Harding
Hey
Hudson
J. D. Wall
Jakobsson
Kaessmann
Li
Mao
Meltzer
Nettle
Novembre
P. Marjoram
Plagnol
Price
R. Jiang
Reich
S. Huntsman
Salari
Sankararaman
Tang
Vigilant
Wang
Watterson
Publication venue: Oxford University Press
Publication date: 01/08/2011

Analyses of genetic polymorphism data have the potential to be highly informative about the demographic history of Native American populations, but due to a combination of historical and political factors, there are essentially no autosomal sequence polymorphism data from any Native American group. However, there are many resequencing studies involving Latinos, whose genomes contain segments inherited from their Native American ancestors. In this study, we introduce a new method for estimating local ancestry across the genomes of admixed individuals and show how this method, along with dense genotyping and targeted resequencing, can be used to assay genetic variation in ancestral Native American groups. We analyze roughly 6 Mb of resequencing data from 22 Mexican Americans to provide the first large-scale view of sequence level variation in Native Americans. We observe low levels of diversity and high levels of linkage disequilibrium in the Native American–derived sequences, consistent with a recent severe population bottleneck associated with the initial peopling of the Americas. Using two different computational approaches, one novel, we estimate that this bottleneck occurred roughly 12.5 Kya; when uncertainty in the estimation process is taken into account, our results are consistent with archeological estimates for the colonization of the Americas

eScholarship - University of California

Microevolution of Helicobacter pylori during prolonged infection of single hosts and within families

Author: A Gelman
A Mena
A Tomitani
B Bjorkholm
B Linz
Barica Kusecek
BF Voight
Christelle Bahlawane
D Falush
D Kersulyte
Daniel Falush
DE Berg
DJ Wilson
EA Lin
EC Holmes
EE Smith
EJ Javaux
EJ Kuipers
EP Rocha
FU Battistuzzi
GI Peterson
Giovanna Morelli
GM Pupo
H Ochman
Harmit S. Malik
HD Holland
J Kang
J Parkhill
J Raymond
JF Tomb
JK Pritchard
JM Kang
K Thornton
KA Jolley
L Feng
M Achtman
M Eppinger
MA Beaumont
Mark Achtman
MM Mwangi
NA Moran
NJ Butterfield
NS Taylor
P Marjoram
P Roumagnac
PK Ingvarsson
PP Sheridan
RJ Meinersmann
S Chattopadhyay
S Kryazhimskiy
S Kulick
S Schwarz
S Sreevatsan
S Suerbaum
S Talarico
S Tavare
Sandra Schwarz
Sebastian Suerbaum
SJ Weissman
SR Harris
SR Leopold
SY Ho
T Wirth
U Nübel
X Didelot
Xavier Didelot
Y Moodley
Z Lin
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2010

Our understanding of basic evolutionary processes in bacteria is still very limited. For example, multiple recent dating estimates are based on a universal inter-species molecular clock rate, but that rate was calibrated using estimates of geological dates that are no longer accepted. We therefore estimated the short-term rates of mutation and recombination in Helicobacter pylori by sequencing an average of 39,300 bp in 78 gene fragments from 97 isolates. These isolates included 34 pairs of sequential samples, which were sampled at intervals of 0.25 to 10.2 years. They also included single isolates from 29 individuals (average age: 45 years) from 10 families. The accumulation of sequence diversity increased with time of separation in a clock-like manner in the sequential isolates. We used Approximate Bayesian Computation to estimate the rates of mutation, recombination, mean length of recombination tracts, and average diversity in those tracts. The estimates indicate that the short-term mutation rate is 1.4×10⁻⁶ (serial isolates) to 4.5×10⁻⁶ (family isolates) per nucleotide per year and that three times as many substitutions are introduced by recombination as by mutation. The long-term mutation rate over millennia is 5–17-fold lower, partly due to the removal of non-synonymous mutations due to purifying selection. Comparisons with the recent literature show that short-term mutation rates vary dramatically in different bacterial species and can span a range of several orders of magnitude.

Public Library of Science (PLOS)