34,126 research outputs found

Approximate Common Intervals in Multiple Genome Comparison

Author: Château Annie
Riou Pierre
Rivals Eric
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/11/2011

We consider the problem of inferring approximate common intervals of multiple genomes. Genomes are modelled as sequences of homologous gene family identifiers, and approximate common intervals represent conserved regions that may show rearrangements, repetitions, or insertions/deletions. This problem is already known, but existing approaches are not incremental and are limited to special cases. We adopt a simple, classical graph-based approach, where the vertices of the graph represent the exact common intervals of the sequences (i.e., regions containing the same gene set), and where edges link vertices that differ by less than $\delta$ elements (with $\delta$ being a parameter). With this model, approximate gene clusters are maximal cliques of the graph: computing them can exploit known and well-designed algorithms. As a proof of concept, we applied the method to several datasets of bacterial genomes and compared two maximal clique algorithms, a static and a dynamic one. While being quite flexible, this approach opens the way to a combinatorial characterization of genomic rearrangements in terms of graph substructures.
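
The graph model in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "differ by less than δ elements" means the symmetric difference of two gene sets has fewer than `delta` members, and it uses the classical Bron-Kerbosch algorithm with pivoting as a stand-in for the unnamed static maximal clique algorithm. The names `build_graph` and `maximal_cliques` are hypothetical.

```python
from itertools import combinations


def build_graph(gene_sets, delta):
    """Adjacency sets over exact common intervals (given as gene sets).

    Assumption: two intervals are linked when the symmetric difference
    of their gene sets has fewer than `delta` elements.
    """
    adj = {i: set() for i in range(len(gene_sets))}
    for i, j in combinations(range(len(gene_sets)), 2):
        if len(gene_sets[i] ^ gene_sets[j]) < delta:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended: maximal
            return
        # Pivot on the vertex with the most neighbours among candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Toy data: four exact common intervals described by their gene sets.
intervals = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4}, {7, 8, 9}]
clusters = maximal_cliques(build_graph(intervals, delta=2))
print(clusters)
```

Each returned clique lists the indices of intervals that are pairwise within the tolerance, which is what the abstract calls an approximate gene cluster.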

Crossref

HAL Descartes

Hal-Diderot

Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing

Author: Shen Jeremy J.
Zhang Nancy R.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012

We propose a flexible change-point model for inhomogeneous Poisson processes, which arise naturally from next-generation DNA sequencing, and derive score and generalized likelihood statistics for shifts in intensity functions. We construct a modified Bayesian information criterion (mBIC) to guide model selection, and point-wise approximate Bayesian confidence intervals for assessing the confidence in the segmentation. The model is applied to DNA copy number profiling with sequencing data and evaluated on simulated spike-in and real data sets. Comment: Published at http://dx.doi.org/10.1214/11-AOAS517 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
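
As a rough illustration of a generalized likelihood ratio statistic for a shift in Poisson intensity, the Python sketch below scans binned read counts for a single change point. It is a deliberately simplified assumption, not the model of the paper: each segment is given one constant rate, and the inhomogeneous intensity, the mBIC, and the confidence intervals are not covered. The function names are hypothetical.

```python
import math


def poisson_loglik(total, n_bins):
    """Maximized Poisson log-likelihood of n_bins bins sharing one rate.

    Terms that do not depend on the rate (the log-factorials) are
    dropped, since they cancel in the likelihood ratio.
    """
    if total == 0:
        return 0.0
    rate = total / n_bins
    return total * math.log(rate) - total


def glr_single_change(counts):
    """Return (k, statistic) for the best split counts[:k] | counts[k:]."""
    n = len(counts)
    total = sum(counts)
    null = poisson_loglik(total, n)  # one rate for the whole sequence
    best_k, best_stat = None, 0.0
    left = 0
    for k in range(1, n):
        left += counts[k - 1]
        alt = poisson_loglik(left, k) + poisson_loglik(total - left, n - k)
        stat = 2.0 * (alt - null)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Toy binned counts: the rate visibly jumps after the fourth bin.
counts = [2, 3, 2, 3, 10, 12, 11, 9]
k, stat = glr_single_change(counts)
print(k, round(stat, 2))
```

A large statistic at split `k` suggests that the bins before and after `k` are better described by two different rates than by one.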

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Estimating the relative rate of recombination to mutation in bacteria from single-locus variants using composite likelihood methods

Author: Biggs Patrick
Fearnhead Paul
French Nigel
Holland Barbara
Yu Shoukai
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/03/2015

A number of studies have suggested using comparisons between DNA sequences of closely related bacterial isolates to estimate the relative rate of recombination to mutation for that bacterial species. We consider such an approach which uses single-locus variants: pairs of isolates whose DNA differs at a single gene locus. One way of deriving point estimates for the relative rate of recombination to mutation from such data is to use composite likelihood methods. We extend recent work in this area so as to be able to construct confidence intervals for our estimates, without needing to resort to computationally intensive bootstrap procedures, and to develop a test for whether the relative rate varies across loci. Both our test and method for constructing confidence intervals are obtained by modeling the dependence structure in the data, and then applying asymptotic theory regarding the distribution of estimators obtained using a composite likelihood. We applied these methods to multi-locus sequence typing (MLST) data from eight bacteria, finding strong evidence for considerable rate variation in three of these: Bacillus cereus, Enterococcus faecium and Klebsiella pneumoniae. Comment: Published at http://dx.doi.org/10.1214/14-AOAS795 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

Lancaster E-Prints

Detecting simultaneous variant intervals in aligned sequences

Author: Siegmund David
Yakir Benjamin
Zhang Nancy R.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011

Given a set of aligned sequences of independent noisy observations, we are concerned with detecting intervals where the mean values of the observations change simultaneously in a subset of the sequences. The intervals of changed means are typically short relative to the length of the sequences, the subset where the change occurs, the "carriers," can be relatively small, and the sizes of the changes can vary from one sequence to another. This problem is motivated by the scientific problem of detecting inherited copy number variants in aligned DNA samples. We suggest a statistic based on the assumption that for any given interval of changed means there is a given fraction of samples that carry the change. We derive an analytic approximation for the false positive error probability of a scan, which is shown by simulations to be reasonably accurate. We show that the new method usually improves on methods that analyze a single sample at a time and on our earlier multi-sample method, which is most efficient when the carriers form a large fraction of the set of sequences. The proposed procedure is also shown to be robust with respect to the assumed fraction of carriers of the changes. Comment: Published at http://dx.doi.org/10.1214/10-AOAS400 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
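
The idea of a scan statistic that assumes a fixed fraction of carriers can be sketched in Python as follows. The exact form used here, a sum over samples of log(1 - p0 + p0·exp(U²/2)) with U the standardized interval sum, is an assumption made for illustration and is not stated in the abstract. The sketch further assumes unit-variance observations with mean zero outside changed intervals; `mixture_scan`, `p0`, and `max_len` are hypothetical names.

```python
import math


def mixture_scan(data, p0, max_len):
    """Scan all intervals [s, t) of length <= max_len over aligned rows.

    data    : list of equal-length sequences (one per sample)
    p0      : assumed fraction of samples carrying the change
    Returns (best statistic, (s, t)).
    """
    n = len(data[0])
    # Prefix sums give each interval sum in constant time.
    prefix = []
    for row in data:
        acc = [0.0]
        for value in row:
            acc.append(acc[-1] + value)
        prefix.append(acc)

    best_stat, best_interval = float("-inf"), None
    for s in range(n):
        for t in range(s + 1, min(n, s + max_len) + 1):
            z = 0.0
            for acc in prefix:
                u = (acc[t] - acc[s]) / math.sqrt(t - s)  # standardized sum
                z += math.log(1.0 - p0 + p0 * math.exp(u * u / 2.0))
            if z > best_stat:
                best_stat, best_interval = z, (s, t)
    return best_stat, best_interval


# Toy data: two of three samples carry a shift on positions 3..5.
carrier = [0, 0, 0, 3, 3, 3, 0, 0]
samples = [carrier, list(carrier), [0] * 8]
stat, interval = mixture_scan(samples, p0=0.5, max_len=4)
print(interval)
```

Non-carriers contribute almost nothing to the sum when their standardized value is near zero, so a small set of carriers can still dominate the statistic on the right interval.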

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarlyCommons@Penn

Microevolution of Helicobacter pylori during prolonged infection of single hosts and within families

Author: A Gelman
A Mena
A Tomitani
B Bjorkholm
B Linz
Barica Kusecek
BF Voight
Christelle Bahlawane
D Falush
D Kersulyte
Daniel Falush
DE Berg
DJ Wilson
EA Lin
EC Holmes
EE Smith
EJ Javaux
EJ Kuipers
EP Rocha
FU Battistuzzi
GI Peterson
Giovanna Morelli
GM Pupo
H Ochman
Harmit S. Malik
HD Holland
J Kang
J Parkhill
J Raymond
JF Tomb
JK Pritchard
JM Kang
K Thornton
KA Jolley
L Feng
M Achtman
M Eppinger
MA Beaumont
Mark Achtman
MM Mwangi
NA Moran
NJ Butterfield
NS Taylor
P Marjoram
P Roumagnac
PK Ingvarsson
PP Sheridan
RJ Meinersmann
S Chattopadhyay
S Kryazhimskiy
S Kulick
S Schwarz
S Sreevatsan
S Suerbaum
S Talarico
S Tavare
Sandra Schwarz
Sebastian Suerbaum
SJ Weissman
SR Harris
SR Leopold
SY Ho
T Wirth
U Nübel
X Didelot
Xavier Didelot
Y Moodley
Z Lin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2010

Our understanding of basic evolutionary processes in bacteria is still very limited. For example, multiple recent dating estimates are based on a universal inter-species molecular clock rate, but that rate was calibrated using estimates of geological dates that are no longer accepted. We therefore estimated the short-term rates of mutation and recombination in Helicobacter pylori by sequencing an average of 39,300 bp in 78 gene fragments from 97 isolates. These isolates included 34 pairs of sequential samples, which were sampled at intervals of 0.25 to 10.2 years. They also included single isolates from 29 individuals (average age: 45 years) from 10 families. The accumulation of sequence diversity increased with time of separation in a clock-like manner in the sequential isolates. We used Approximate Bayesian Computation to estimate the rates of mutation, recombination, mean length of recombination tracts, and average diversity in those tracts. The estimates indicate that the short-term mutation rate is 1.4×10⁻⁶ (serial isolates) to 4.5×10⁻⁶ (family isolates) per nucleotide per year and that three times as many substitutions are introduced by recombination as by mutation. The long-term mutation rate over millennia is 5–17-fold lower, partly due to the removal of non-synonymous mutations due to purifying selection. Comparisons with the recent literature show that short-term mutation rates vary dramatically in different bacterial species and can span a range of several orders of magnitude.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Oxford University Research Archive

Open Repository and Bibliography - Luxembourg

Genome-wide inference of ancestral recombination graphs

Author: Gronau Ilan
Hubisz Melissa J.
Rasmussen Matthew D.
Siepel Adam
Publication date: 01/01/2013

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. Preliminary results also indicate that our methods can be used to gain insight into complex features of human population structure, even with a noninformative prior distribution. Comment: 88 pages, 7 main figures, 22 supplementary figures. This version contains a substantially expanded genomic data analysis

arXiv.org e-Print Archive

CiteSeerX

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

FigShare

Genomic Selective Constraints in Murid Noncoding DNA

Author: Altschul
Bejerano
Boissinot
Bray
Britten
Casane
Chamary
Cooper
Daniel J. Gaffney
Deininger
Dermitzakis
Eisenberg
Eyre-Walker
Fairbrother
Frazer
Gaffney
Gibbs
Hanawalt
Hubbard
Jaeger
Kamal
Keightley
Kimura
Kondrashov
Lander
Li
Margulies
Meunier
Mi
Mikkelsen
Nagylaki
Nelson
Parmley
Peter D. Keightley
Seoighe
Siepel
Sironi
Sorek
Tamura
Thomas
Thompson
Urrutia
Vinogradov
Waterston
Webster
Yelin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2006

Recent work has suggested that there are many more selectively constrained, functional noncoding than coding sites in mammalian genomes. However, little is known about how selective constraint varies amongst different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5' end of murid genes. Our analysis also reveals that the mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. We find that, among others, developmental and neuronal genes are associated with the greatest numbers of putatively functional noncoding sites compared with genes involved in electron transport and a variety of metabolic processes. Combining our estimates of the total number of constrained coding and noncoding bases, we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence and that the total genomic deleterious point mutation rate is 0.91 per diploid genome, per generation. This estimated rate is over twice as large as a previous estimate in murids.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer