
6 research outputs found

Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

Author: A Bairoch
A Denise
AC Camproux
Anne-Claude Camproux
AP Godbole
B Prum
C Gautier
DA Benson
DL Antzoulakos
E Rocha
G Churchill
G Nuel
G Reinert
GD Stormo
Gregory Nuel
J Becq
J Do
J Fu
J Kleffe
J Martin
J Van Helden
JAD Aston
JC Fu
JE Hopcroft
JM Claverie
Juliette Martin
JW Fickett
K Liolios
L Regad
Leslie Regad
M Crochemore
M Reignier
M Thomas-Chollier
MC Frith
ME Lladser
MX Geske
MY Leung
N Hulo
P Nicodème
P Nicolas
P Pevzner
P Ribeca
R Cowan
S Karlin
S Sourice
T Erhardsson
V Boeva
V Stefanov
VT Stefanov
YM Chang
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations for both low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models.
Results: The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distributions in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms, and their relative interest in comparison with existing ones, were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists of concatenating the sequences in the data set into a single sequence.
Conclusions: Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures, for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single-sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model converges slowly toward the stationary distribution. We conclude with a discussion of our method and its potential improvements.
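For orientation, the quantity the abstract describes (the exact distribution of the total number of pattern occurrences over a set of independent sequences) can be sketched with a naive baseline. This is not any of the paper's three algorithms: it computes each sequence's count distribution separately and then convolves them, which is the step Algorithm 1 is said to avoid. The sketch assumes a single word as the pattern, overlapping occurrences counted, and a homogeneous order-1 Markov model. All function names and the parameters `init` and `trans` are hypothetical.

```python
from collections import defaultdict


def kmp_automaton(word, alphabet):
    """DFA for a single word: state = length of the longest matched prefix."""
    m = len(word)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for a in alphabet:
            s = word[:state] + a
            k = min(m, len(s))
            # Longest suffix of s that is also a prefix of word.
            while k > 0 and s[-k:] != word[:k]:
                k -= 1
            delta[state][a] = k
    return delta


def count_distribution(word, length, alphabet, init, trans):
    """P(N = c) for the occurrence count N in one sequence of given length.

    init[a] is the first-letter probability, trans[a][b] the order-1
    transition probability (both are illustrative inputs).
    """
    if length == 0:
        return [1.0]
    delta, m = kmp_automaton(word, alphabet), len(word)
    # Embedded chain: (automaton state, last letter, count so far).
    dist = defaultdict(float)
    for a in alphabet:
        s = delta[0][a]
        dist[(s, a, int(s == m))] += init[a]
    for _ in range(length - 1):
        nxt = defaultdict(float)
        for (s, a, c), p in dist.items():
            for b in alphabet:
                t = delta[s][b]
                nxt[(t, b, c + int(t == m))] += p * trans[a][b]
        dist = nxt
    out = defaultdict(float)
    for (_, _, c), p in dist.items():
        out[c] += p
    return [out[c] for c in range(max(out) + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent counts."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            r[i + j] += x * y
    return r


def total_count_distribution(word, lengths, alphabet, init, trans):
    """Exact distribution of the total count over independent sequences."""
    total = [1.0]
    for n in lengths:
        total = convolve(total, count_distribution(word, n, alphabet, init, trans))
    return total
```

Because each sequence restarts from `init`, no occurrence can straddle two sequences, which is the edge effect that the concatenation approximation introduces.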

HAL Evry

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Simultaneous Occurrences of Runs in Independent Markov Chains

Author: C Rouveirol
G Nuel
G Reinert
P Hupe
S. Robin
V. T. Stefanov
VT Stefanov
Publication venue: Springer Science and Business Media LLC

Crossref

Waiting for ABRACADABRA. Occurrence of Words and Leading Numbers

Author: D Williams
R Chen
S Robin
S Yen
VT Stefanov
W Feller
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

In this article we introduce the readers to the concept of "leading number", as proposed by D. Conway in the 1970s. The leading number associated with a word w is a binary vector that describes some special aspects of the structure of w. We shall see that it conveys the essential information needed to analyse the time of occurrence of w in a random sequence of letters. The time of occurrence of words, a classical topic of applied probability, presents several aspects of interest. In particular, it gives rise to some apparently paradoxical conclusions. Furthermore, it is related to the notion of fair games and leads to interesting mathematical problems.
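The abstract does not state a formula, so the following is only a minimal sketch of the classical result usually associated with leading numbers, assuming independent letters drawn uniformly from an alphabet of size q: the expected waiting time for w is the sum of q^k over every k for which the length-k prefix of w equals its length-k suffix. The bit ordering used here (position k marks a prefix/suffix match of length k) is one convention among several, and the function names are hypothetical.

```python
def leading_number(word):
    """Binary vector whose k-th entry (1-based) is 1 when the
    length-k prefix of word equals its length-k suffix."""
    n = len(word)
    return [1 if word[:k] == word[n - k:] else 0 for k in range(1, n + 1)]


def expected_waiting_time(word, alphabet_size):
    """Expected number of i.i.d. uniform letters until word first appears."""
    bits = leading_number(word)
    return sum(alphabet_size ** k for k, bit in enumerate(bits, start=1) if bit)


# ABRACADABRA overlaps itself at lengths 1 ("A"), 4 ("ABRA") and 11.
print(leading_number("ABRACADABRA"))
print(expected_waiting_time("ABRACADABRA", 26) == 26**11 + 26**4 + 26)

# A fair coin illustrates the apparent paradox: HH takes longer than HT
# on average, although both words are equally likely at any fixed position.
print(expected_waiting_time("HH", 2), expected_waiting_time("HT", 2))
```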

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Some Sufficient Conditions for Stochastic Comparisons Between Hitting Times for Skip-free Markov Chains

Author: A Di Crescenzo
A Irle
A Müller
E De Santis
Emilio De Santis
F Ferreira
Fabio Spizzichino
G Blom
JA Fill
JC Fu
K Zhou
M Shaked
P Brémaud
P Diaconis
R Chen
R Szekli
S Robin
SR Li
VT Stefanov
Publication venue: Springer Science and Business Media LLC

Crossref

Blocks of chromosomes identical by descent in a population: Models and predictions

Author: AG Clark
F Ball
Frédéric Hospital
H Bickeböller
IR Franklin
J Ødegård
JBS Haldane
K Walters
KP Donnelly
M Kardos
Mathieu Tiret
NH Chapman
OC Martin
P Stam
PF Palamara
Qinghua Shi
RA Fisher
S Browning
S Carmi
S Karlin
SR Browning
U Knief
VT Stefanov
WG Hill
Publication venue: Public Library of Science (PLoS)

Crossref

Markov binomial distribution of order k and its applications

Author: AM Mood
AN Philippou
AS Demir
C Asano
CA Charalambides
DE Barton
EA Pekoz
FK Hwang
FS Makri
GJ Chang
JC Fu
K Inoue
K. K. Kamalja
KD Ling
KK Kamalja
KS Kotwal
M Ebneshahrashoob
MB Rajarshi
MV Koutras
S Aki
SA Mingoti
VT Stefanov
W Feller
Publication venue: Springer Science and Business Media LLC

Crossref