Search CORE

11 research outputs found

Single-crossover dynamics: finite versus infinite populations

Author: C. Burke
D. McHale
E. Baake
E. Popa
Ellen Baake
G. Ringwood
H. Geiringer
Inke Herms
J. Bennett
J.G. Kemeny
J.R. Norris
K. Dawson
M. Aigner
M. Baake
N. Barton
P. Haccou
P. Pfaffelhuber
R. Bürger
R. Durrett
S. Asmussen
S.N. Ethier
W. Ewens
Y. Lyubich
Publication date: 01/01/2007

Populations evolving under the joint influence of recombination and resampling (traditionally known as genetic drift) are investigated. First, we summarise and adapt a deterministic approach, valid for infinite populations, which assumes continuous time and single crossover events. The corresponding nonlinear system of differential equations permits a closed solution, both in terms of the type frequencies and via linkage disequilibria of all orders. To include stochastic effects, we then consider the corresponding finite-population model, the Moran model with single crossovers, and examine it both analytically and by means of simulations. Particular emphasis is placed on the connection with the deterministic solution. If there is only recombination and every pair of recombined offspring replaces their pair of parents (i.e., there is no resampling), then the expected type frequencies in the finite population, of arbitrary size, equal the type frequencies in the infinite population. If resampling is included, the stochastic process converges, in the infinite-population limit, to the deterministic dynamics, which turns out to be a good approximation even for populations of moderate size.
Comment: 21 pages, 4 figures
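The recombination-only statement above can be illustrated with a minimal Python sketch for the simplest case: two loci separated by a single crossover site (an assumption; the paper treats general multi-locus single-crossover dynamics). The deterministic side uses the two-locus closed form p_t = e^(-rt) p_0 + (1 - e^(-rt)) p^(1) x p^(2), where p^(1) and p^(2) are the single-locus marginals, which recombination leaves unchanged. The finite side is a hypothetical recombination-only Moran-type model in which each unordered pair of the N individuals recombines at rate r/N and the two offspring replace their parents. Under that assumed scaling, and because the marginal counts never change, the expected frequencies satisfy dE[p]/dt = r (p^(1) x p^(2) - E[p]), the same linear equation as in the infinite population. All function names are illustrative.

```python
import math
import random
from collections import Counter


def deterministic_two_locus(p0, r, t):
    """Closed-form two-locus solution of dp/dt = r * (p1 * p2 - p).

    p0 maps a type (a, b) to its initial frequency; r is the crossover rate.
    The marginals p1, p2 are conserved, so p relaxes exponentially to p1 * p2.
    """
    p1, p2 = Counter(), Counter()
    for (a, b), f in p0.items():
        p1[a] += f
        p2[b] += f
    decay = math.exp(-r * t)
    return {
        (a, b): decay * p0.get((a, b), 0.0) + (1.0 - decay) * p1[a] * p2[b]
        for a in p1
        for b in p2
    }


def moran_recombination_only(pop, r, t, rng):
    """One run of a recombination-only Moran-type model (no resampling).

    Assumption: each unordered pair of the N >= 2 individuals recombines at
    rate r / N, and the two recombinant offspring replace their two parents.
    """
    pop = list(pop)
    n = len(pop)
    total_rate = (r / n) * n * (n - 1) / 2.0  # rate per pair times number of pairs
    clock = rng.expovariate(total_rate)
    while clock <= t:
        i, j = rng.sample(range(n), 2)
        (a_i, b_i), (a_j, b_j) = pop[i], pop[j]
        # Single crossover between the loci: the offspring exchange their second locus.
        pop[i], pop[j] = (a_i, b_j), (a_j, b_i)
        clock += rng.expovariate(total_rate)
    return pop


def mean_type_frequencies(pop, r, t, runs, seed=1):
    """Monte Carlo estimate of the expected type frequencies at time t."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(runs):
        final = moran_recombination_only(pop, r, t, rng)
        for x, count in Counter(final).items():
            totals[x] += count / len(final)
    return {x: v / runs for x, v in totals.items()}


if __name__ == "__main__":
    pop = [(0, 0)] * 5 + [(1, 1)] * 5  # N = 10, maximal initial linkage disequilibrium
    p0 = {(0, 0): 0.5, (1, 1): 0.5}
    det = deterministic_two_locus(p0, r=1.0, t=1.0)
    est = mean_type_frequencies(pop, r=1.0, t=1.0, runs=4000)
    for x in sorted(det):
        print(x, round(det[x], 4), round(est.get(x, 0.0), 4))
```

Already with N = 10 the averaged simulated frequencies should agree with the deterministic values up to Monte Carlo noise, in line with the exact statement for arbitrary population size. Adding resampling would break this exact agreement, which is why the abstract phrases that case as a limit.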

arXiv.org e-Print Archive

Crossref

Publications at Bielefeld University

Probabilistic arithmetic automata: applications of a stochastic computational framework in biological sequence analysis

Author: Herms Inke
Publication venue: Bielefeld University
Publication date: 01/01/2009

Herms I. Probabilistic arithmetic automata: applications of a stochastic computational framework in biological sequence analysis. Bielefeld (Germany): Bielefeld University; 2009. The immense amount of biological sequence data available today requires efficient and sensitive analysis in order to enable, for example, the identification of unknown proteins or the assessment of similarity between DNA sequences. Furthermore, the short sequence reads produced by modern high-throughput sequencing technologies such as 454 or Solexa/Illumina pose new challenges to computational sequence analysis. Viewing biological sequences, such as DNA and proteins, as strings allows their investigation under a generative random string model. That is, one can define a probabilistic null model that generates random strings as representatives of a class of sequences, and from these deduce general statistical properties. In this thesis, we give a thorough derivation of a probabilistic model called the probabilistic arithmetic automaton (PAA). This model describes sequences of operations whose operands depend on chance and provides the computational framework to calculate the exact distribution of the value resulting from those operations. For instance, the PAA framework can be used to compute the expected molecular mass of a peptide resulting from the cleavage reaction of a protease. Moreover, we show that the framework is general enough to cover quite different applications arising in the computational analysis of biological sequences. To this end, we consider three distinct levels of biosequences, namely 1) amino acid sequences, 2) long DNA sequences and genomes, and 3) short nucleotide sequence reads.
In the first application, protein identification by means of mass spectrometry and database search, we compute characteristic statistics of so-called peptide mass fingerprints to obtain a reasonable, database-independent significance value for the identification of an unknown protein. Going one step further than recent approaches, we additionally incorporate post-translational modifications and incomplete enzymatic digestion, which alter the measured molecular masses and hence may influence the search results. The second application arises in the context of DNA similarity search. We use the PAA framework to investigate the quality of filtration criteria employed to select candidate sequences from a comprehensive nucleotide sequence database. The PAA we propose encompasses recent models and provides additional statistics. This allows us to investigate different definitions of optimality not previously discussed. Searching for similar DNA sequences, which provides the basis for comparative genomics in general, was enabled by the growing amount of nucleotide sequences stored in sequence databases. This development was accelerated by high-throughput sequencing strategies such as 454 sequencing, which allow faster sequencing at reduced cost. However, these technologies yield relatively short reads, which pose new challenges to genome assembly tools. By means of the PAA approach, we compute the length distribution of sequence reads resulting from 454 sequencing. Moreover, we discuss how to adjust the machine settings to obtain the longest possible reads on average. The designed PAA is used for this evaluation. Besides the PAA framework and its applications, we present a biologically motivated random string model adapted to protein sequences, referred to as the SSE model. It captures properties of local segments forming protein secondary structures.
In order to evaluate the model's capability, we compare four random string models by means of penalized model selection criteria. We show that among these models, the SSE model yields the most plausible description of the considered protein sequences, outperforming the widely used i.i.d. and first-order Markov models.
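The thesis describes a PAA as a model of a sequence of operations whose operands depend on chance, together with a way to compute the exact distribution of the result. A minimal sketch of that computation, assuming the common formulation in which a Markov chain moves between states, each visited state emits a random operand, and a state-specific operation folds that operand into an accumulated value: dynamic programming over (state, value) pairs gives the exact value distribution after n steps. The function and argument names are hypothetical and are not the API of any published implementation.

```python
from collections import defaultdict
from fractions import Fraction


def paa_value_distribution(n, start_state, start_value, trans, emit, ops):
    """Exact distribution of the accumulated value after n steps of a PAA-style model.

    trans[q] -> {next_state: probability}           (Markov chain over states)
    emit[q]  -> {emission: probability}             (random operand drawn in state q)
    ops[q]   -> function(value, emission) -> value  (operation attached to state q)
    The start state emits nothing; every later state emits once and applies its op.
    """
    dist = {(start_state, start_value): Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)  # Fraction() == 0
        for (q, v), p in dist.items():
            for q2, p_trans in trans[q].items():
                for e, p_emit in emit[q2].items():
                    nxt[(q2, ops[q2](v, e))] += p * p_trans * p_emit
        dist = nxt
    # Marginalise over the final state to obtain the value distribution.
    values = defaultdict(Fraction)
    for (_, v), p in dist.items():
        values[v] += p
    return dict(values)
```

For example, a single state that emits 0 or 1 with probability 1/2 each, combined with addition as its operation, yields the Binomial(3, 1/2) distribution {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8} after three steps. Exact rational arithmetic is used here only to keep the illustration free of rounding; floating-point probabilities work the same way.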

Publications at Bielefeld University

Probabilistic Arithmetic Automata and Their Applications

Author: Herms Inke
Kaltenbach Hans-Michael
Marschall T.
Rahmann Sven
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Marschall T, Herms I, Kaltenbach H-M, Rahmann S. Probabilistic Arithmetic Automata and Their Applications. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012;9(6):1737-1750. We present a comprehensive review of probabilistic arithmetic automata (PAAs), a general model to describe chains of operations whose operands depend on chance, along with two algorithms to numerically compute the distribution of the results of such probabilistic calculations. PAAs provide a unifying framework to approach many problems arising in computational biology and elsewhere. We present five different applications, namely 1) pattern matching statistics on random texts, including the computation of the distribution of occurrence counts, waiting times, and clump sizes under hidden Markov background models; 2) exact analysis of window-based pattern matching algorithms; 3) sensitivity of filtration seeds used to detect candidate sequence alignments; 4) length and mass statistics of peptide fragments resulting from enzymatic cleavage reactions; and 5) read length statistics of 454 and IonTorrent sequencing reads. The diversity of these applications indicates the flexibility and unifying character of the presented framework. While the construction of a PAA depends on the particular application, we single out a frequently applicable construction method: we introduce deterministic arithmetic automata (DAAs) to model deterministic calculations on sequences, and demonstrate how to construct a PAA from a given DAA and a finite-memory random text model. This procedure is used for all five discussed applications and greatly simplifies the construction of PAAs. Implementations are available as part of the MoSDi package. Its application programming interface facilitates the rapid development of new applications based on the PAA framework.
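The construction recipe in this abstract (a deterministic arithmetic automaton combined with a finite-memory random text model) can be illustrated with pattern occurrence counting, the first listed application. In this toy sketch, which is an illustration under stated assumptions and not the MoSDi implementation, the DAA is a KMP-style pattern-matching automaton whose value is incremented whenever the pattern has just been read, and the text model is i.i.d., the simplest finite-memory model. Propagating probabilities over (automaton state, count) pairs then yields the exact occurrence-count distribution, with overlapping matches included. All names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def pattern_dfa(pattern, alphabet):
    """Deterministic automaton for (overlapping) occurrences of one pattern.

    State k means: the longest suffix of the text read so far that is also a
    prefix of the pattern has length k. State len(pattern) signals a match.
    """
    m = len(pattern)
    delta = {}
    for state in range(m + 1):
        for c in alphabet:
            s = pattern[:state] + c
            k = min(len(s), m)
            # Brute-force fallback search; fine for short illustrative patterns.
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            delta[(state, c)] = k
    return delta


def occurrence_count_distribution(pattern, char_probs, n):
    """Exact distribution of the number of pattern occurrences in an i.i.d. text of length n.

    DAA part: the automaton above plus the value operation "count += 1 on entering
    the match state". Random text part: characters drawn independently from char_probs.
    """
    delta = pattern_dfa(pattern, list(char_probs))
    m = len(pattern)
    dist = {(0, 0): Fraction(1)}  # (automaton state, occurrence count) -> probability
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for (state, count), p in dist.items():
            for c, pc in char_probs.items():
                s2 = delta[(state, c)]
                nxt[(s2, count + int(s2 == m))] += p * pc
        dist = nxt
    counts = defaultdict(Fraction)
    for (_, count), p in dist.items():
        counts[count] += p
    return dict(counts)
```

For the pattern AA in a uniform random text of length 3 over {A, B}, this gives {0: 5/8, 1: 1/4, 2: 1/8}; the text AAA alone accounts for the two overlapping occurrences. Swapping the i.i.d. model for a Markov or hidden Markov text model would add the text model's own state to the pair being tracked.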

CWI's Institutional Repository

Publications at Bielefeld University

Computing Alignment Seed Sensitivity with Probabilistic Arithmetic Automata

Author: Crandall Keith A.
Herms Inke
Lagergren Jens
Rahmann Sven
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Herms I, Rahmann S. Computing Alignment Seed Sensitivity with Probabilistic Arithmetic Automata. In: Crandall KA, Lagergren J, eds. Algorithms in Bioinformatics: 8th International Workshop, WABI 2008, Karlsruhe, Germany, September 15-19, 2008. Proceedings. Lecture Notes in Computer Science, 5251. Berlin u.a.: Springer; 2008: 318-329

Publications at Bielefeld University

Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling

Author: Hartmann Alexander K.
Herms Inke
Rahmann Sven
Wolfsheimer Stefan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Wolfsheimer S, Herms I, Rahmann S, Hartmann AK. Accurate statistics for local sequence alignment with position-dependent scoring by rare-event sampling. BMC Bioinformatics. 2011;12(1):47. Background: Molecular database search tools need statistical models to assess the significance of the resulting hits. In the classical approach, one asks how probable it is to observe a certain score by pure chance. Asymptotic theories for such questions are available for two random i.i.d. sequences. Some effort has been made to include effects of finite sequence lengths and to account for specific compositions of the sequences. In many applications, such as a large-scale database homology search for transmembrane proteins, these models are not the most appropriate ones. Search sensitivity and specificity benefit from position-dependent scoring schemes or the use of Hidden Markov Models. Additionally, one may wish to go beyond the assumption that the sequences are i.i.d. Despite their practical importance, the statistical properties of these settings have not yet been well investigated. Results: In this paper, we discuss an efficient and general method to compute the score distribution to any desired accuracy. The general approach may be applied to different sequence models and various similarity measures that satisfy a few weak assumptions. We have access to the low-probability region ("tail") of the distribution, where scores are larger than expected by pure chance and which is therefore relevant for practical applications. Our method uses recent ideas from rare-event simulations, combining Markov chain Monte Carlo simulations with importance sampling and generalized ensembles. We present results for the score statistics of fixed and random queries against random sequences. In a second step, we extend the approach to a model of transmembrane proteins, which can hardly be described as i.i.d. sequences.
For this case, we compare the statistical properties of a fixed query model as well as a hidden Markov sequence model in connection with a position-based scoring scheme against the classical approach. Conclusions: The results illustrate that the sensitivity and specificity strongly depend on the underlying scoring and sequence model. A specific ROC analysis for the case of transmembrane proteins supports our observation.
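The core idea of biasing a Markov chain towards the tail and then undoing the bias can be shown on a toy case. The sketch below is a simplified single-temperature variant, not the authors' combination of importance sampling with generalized ensembles: a Metropolis chain over pairs of i.i.d. uniform sequences has stationary weight proportional to P(x, y) exp(S(x, y)/T), where S is the Smith-Waterman local alignment score. For T > 0 this oversamples high scores; multiplying the score histogram by exp(-S/T) and renormalising recovers an estimate of the unbiased score distribution over the scores the chain visits. The scoring parameters (match +1, mismatch -1, linear gap -1), the temperature, and all names are illustrative assumptions.

```python
import math
import random
from collections import Counter
from itertools import product


def local_alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def biased_score_histogram(length, alphabet, temperature, steps, rng):
    """Metropolis chain with stationary weight P(x, y) * exp(S(x, y) / temperature).

    P is the i.i.d. uniform background over `alphabet`; temperature > 0 favours high scores.
    """
    seqs = [[rng.choice(alphabet) for _ in range(length)] for _ in range(2)]
    score = local_alignment_score(*seqs)
    hist = Counter()
    for _ in range(steps):
        k, pos = rng.randrange(2), rng.randrange(length)
        old = seqs[k][pos]
        # Proposal: redraw one letter from the background, so the acceptance
        # probability reduces to min(1, exp(delta_S / temperature)).
        seqs[k][pos] = rng.choice(alphabet)
        new_score = local_alignment_score(*seqs)
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            score = new_score
        else:
            seqs[k][pos] = old  # reject: restore the previous letter
        hist[score] += 1
    return hist


def reweight(hist, temperature):
    """Undo the exp(S / T) bias and renormalise over the visited scores."""
    weights = {s: h * math.exp(-s / temperature) for s, h in hist.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def exact_score_distribution(length, alphabet):
    """Brute-force reference distribution; only feasible for tiny sizes."""
    total = len(alphabet) ** (2 * length)
    dist = Counter()
    for x in product(alphabet, repeat=length):
        for y in product(alphabet, repeat=length):
            dist[local_alignment_score(x, y)] += 1 / total
    return dict(dist)
```

For two length-4 sequences over {A, B}, the exact enumeration provides a reference against which the reweighted estimate can be checked. A single temperature only resolves the scores it visits often enough; reaching probabilities far out in the tail requires combining several temperatures or a flat-histogram ensemble, which is the role of the generalized-ensemble methods mentioned in the abstract.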

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

Probabilistic Arithmetic Automata and Their Applications

Author: Hans-Michael Kaltenbach
Inke Herms
Sven Rahmann
Tobias Marschall
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Recommended from our members

Dense genotyping of immune-related disease regions identifies nine new risk loci for primary sclerosing cholangitis.

Author: Achkar Jean-Paul
Albrecht Mario
Alexander Graeme
Alvaro Domenico
Anderson Carl A
Andreassen Ole A
Annese Vito
Bergquist Annika
Björnsson Einar
Boberg Kirsten Muri
Bowlus Christopher L
Brand Stephan
Braun Felix
Chapman Roger W
Chazouillères Olivier
Cho Judy
Cleynen Isabelle
Croucher Peter JP
Dale Anders M
Dalekos Georgios
Doncheva Nadezhda T
Dorfman Ruslan
Duerr Richard H
Durie Peter R
Eksteen Bertus
Ellinghaus David
Ellinghaus Eva
Floreani Annarosa
Folseraas Trine
Franke Andre
Färkkilä Martti
Gotthardt Daniel Nils
Gutierrez-Achury Javier
Herms Stefan
Hirschfield Gideon M
Hov Johannes Roksund
Hveem Kristian
International IBD Genetics Consortium
International PSC Study Group
Invernizzi Pietro
Juran Brian D
Karlsen Tom H
König Inke R
Lazaridis Konstantinos N
Leppa Virpi
Liu Jimmy Z
Manns Michael P
Marschall Hanns-Ulrich
Mason Andrew L
Mayr Gabriele
Melum Espen
Milkiewicz Piotr
Mitrovic Mitja
Müller Tobias
Næss Sigrid
Nöthen Markus M
Onengut-Gumuscu Suna
Padyukov Leonid
Pares Albert
Ponsioen Cyriel Y
Ricaño-Ponce Isis
Rich Stephen S
Rioux John D
Rushbrook Simon M
Rust Christian
Saarela Janna
Sandford Richard N
Sans Miquel
Schork Andrew J
Schramm Christoph
Schreiber Stefan
Schrumpf Erik
Shah Tejas
Silverberg Mark S
Srivastava Brijesh
Sterneck Martina
Teufel Andreas
Thompson Wesley K
Thomsen Ingo
UK-PSCSC Consortium
van Heel David
Vatn Morten H
Vermeire Severine
Weersma Rinse K
Weismüller Tobias J
Wijmenga Cisca
Winkelmann Juliane
Publication venue: eScholarship, University of California
Publication date: 01/06/2013

Primary sclerosing cholangitis (PSC) is a severe liver disease of unknown etiology leading to fibrotic destruction of the bile ducts and ultimately to the need for liver transplantation. We compared 3,789 PSC cases of European ancestry to 25,079 population controls across 130,422 SNPs genotyped using the Immunochip. We identified 12 genome-wide significant associations outside the human leukocyte antigen (HLA) complex, 9 of which were new, increasing the number of known PSC risk loci to 16. Despite comorbidity with inflammatory bowel disease (IBD) in 72% of the cases, 6 of the 12 loci showed significantly stronger association with PSC than with IBD, suggesting overlapping yet distinct genetic architectures for these two diseases. We incorporated association statistics from 7 diseases clinically occurring with PSC in the analysis and found suggestive evidence for 33 additional pleiotropic PSC risk loci. Together with network analyses, these findings add to the genetic risk map of PSC and expand on the relationship between PSC and other immune-mediated diseases.

eScholarship - University of California


Dense genotyping of immune-related disease regions identifies nine new risk loci for primary sclerosing cholangitis.

Author: Achkar Jean-Paul
Albrecht Mario
Alexander Graeme
Alvaro Domenico
Anderson Carl A
Andreassen Ole A
Annese Vito
Bergquist Annika
Björnsson Einar
Boberg Kirsten Muri
Bowlus Christopher L
Brand Stephan
Braun Felix
Chapman Roger W
Chazouillères Olivier
Cho Judy
Cleynen Isabelle
Croucher Peter JP
Dale Anders M
Dalekos Georgios
Doncheva Nadezhda T
Dorfman Ruslan
Duerr Richard H
Durie Peter R
Eksteen Bertus
Ellinghaus David
Ellinghaus Eva
Floreani Annarosa
Folseraas Trine
Franke Andre
Färkkilä Martti
Gotthardt Daniel Nils
Gutierrez-Achury Javier
Herms Stefan
Hirschfield Gideon M
Hov Johannes Roksund
Hveem Kristian
International IBD Genetics Consortium
International PSC Study Group
Invernizzi Pietro
Juran Brian D
Karlsen Tom H
König Inke R
Lazaridis Konstantinos N
Leppa Virpi
Liu Jimmy Z
Manns Michael P
Marschall Hanns-Ulrich
Mason Andrew L
Mayr Gabriele
Melum Espen
Milkiewicz Piotr
Mitrovic Mitja
Müller Tobias
Næss Sigrid
Nöthen Markus M
Onengut-Gumuscu Suna
Padyukov Leonid
Pares Albert
Ponsioen Cyriel Y
Ricaño-Ponce Isis
Rich Stephen S
Rioux John D
Rushbrook Simon M
Rust Christian
Saarela Janna
Sandford Richard
Sans Miquel
Schork Andrew J
Schramm Christoph
Schreiber Stefan
Schrumpf Erik
Shah Tejas
Silverberg Mark S
Srivastava Brijesh
Sterneck Martina
Teufel Andreas
Thompson Wesley K
Thomsen Ingo
UK-PSCSC Consortium
van Heel David
Vatn Morten H
Vermeire Severine
Weersma Rinse K
Weismüller Tobias J
Wijmenga Cisca
Winkelmann Juliane
Publication venue: Nat Genet
Publication date: 01/06/2013

Apollo (Cambridge)