1,223 research outputs found

    Revisiting Waiting Times in DNA evolution

    Transcription factors are short stretches of DNA (or $k$-mers) mainly located in promoter sequences that enhance or repress gene expression. With respect to an initial distribution of letters on the DNA alphabet, Behrens and Vingron consider a random sequence of length $n$ that does not contain a given $k$-mer or word of size $k$. Under an evolution model of the DNA, they compute the probability $\mathfrak{p}_n$ that this $k$-mer appears after a unit time of 20 years. They prove that the waiting time for the first appearance of the $k$-mer is well approximated by $T_n = 1/\mathfrak{p}_n$. Their work relies on the simplifying assumption that the $k$-mer is not self-overlapping. They observe in particular that the waiting time is mostly driven by the initial distribution of letters. Behrens et al. use an automaton-based approach that relaxes the assumption on word overlaps. Their numerical evaluations confirm the validity of the Behrens and Vingron approach for non-self-overlapping words, but provide corrections of up to 44% for highly self-overlapping words such as $\mathtt{AAAAA}$. We devised an approach to the problem based on clump analysis and generating functions; this approach yields a proof of the quasi-linear behaviour of $\mathfrak{p}_n$ for a large range of values of $n$, an important result for DNA evolution. We present here this clump analysis, first by language decomposition, and next by an automaton construction; finally, we describe an equivalent approach by construction of Markov automata. Comment: 19 pages, 3 Figures, 2 Tables
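
The distinction between self-overlapping and non-self-overlapping words in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper: the function names `overlap_lengths` and `is_self_overlapping` are hypothetical, and the check used (a proper prefix of the word equals a suffix of the same length) is the usual definition of self-overlap, assumed here.

```python
def overlap_lengths(word: str) -> list[int]:
    """Lengths m with 0 < m < len(word) such that the length-m prefix
    of `word` equals its length-m suffix (hypothetical helper)."""
    k = len(word)
    return [m for m in range(1, k) if word[:m] == word[k - m:]]


def is_self_overlapping(word: str) -> bool:
    """A word is self-overlapping if some proper prefix is also a suffix."""
    return bool(overlap_lengths(word))


if __name__ == "__main__":
    for w in ("AAAAA", "ACGAC", "ACGTG"):
        print(w, overlap_lengths(w), is_self_overlapping(w))
```

Under this definition, `AAAAA` overlaps itself at every possible shift, which matches its description in the abstract as highly self-overlapping.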

    A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances

    Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances (Boden et al., 2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower misclassification rate when used with Support Vector Machines (SVMs) (Onodera and Shibuya, 2013). We confirm these two results by independent experiments, and propose in this article to use a coverage criterion (Benson and Mak, 2008, Martin, 2013, Martin and Noé, 2014) to measure the seed efficiency in both cases in order to design better seed patterns. We first show how this coverage criterion can be directly measured by a full automaton-based approach. We then illustrate how this criterion performs when compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification/the true distance. Finally, for alignment-free distances, we propose an extension that adopts the coverage criterion, show how it performs, and indicate how it can be efficiently computed. Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017
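
The three criteria compared in the abstract above can be sketched on a toy example. This is an illustration under stated assumptions, not the automaton-based method of the paper: an alignment is encoded as a string of `1` (match) and `0` (mismatch), a seed as a string of `1` (must match) and `0` (don't care), and coverage is taken to be the number of alignment positions lying under a must-match position of at least one seed hit. The function names are hypothetical.

```python
def seed_hits(alignment: str, seed: str) -> list[int]:
    """Start positions where every must-match seed position ('1')
    falls on a match ('1') of the alignment."""
    care = [j for j, c in enumerate(seed) if c == "1"]
    last_start = len(alignment) - len(seed)
    return [
        i
        for i in range(last_start + 1)
        if all(alignment[i + j] == "1" for j in care)
    ]


def single_hit(alignment: str, seed: str) -> bool:
    """Single-hit criterion: does the seed hit at least once?"""
    return bool(seed_hits(alignment, seed))


def multiple_hit(alignment: str, seed: str) -> int:
    """Multiple-hit criterion: number of seed hits."""
    return len(seed_hits(alignment, seed))


def coverage(alignment: str, seed: str) -> int:
    """Coverage (as assumed here): number of distinct alignment
    positions under a must-match position of at least one hit."""
    care = [j for j, c in enumerate(seed) if c == "1"]
    covered = set()
    for i in seed_hits(alignment, seed):
        covered.update(i + j for j in care)
    return len(covered)


if __name__ == "__main__":
    aln = "1110111"
    for seed in ("101", "11"):
        print(seed, single_hit(aln, seed), multiple_hit(aln, seed), coverage(aln, seed))
```

In this toy case both seeds satisfy the single-hit criterion, while the hit count and the coverage differ between them, which is the kind of distinction the coverage criterion is meant to capture.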

    Computing with cells: membrane systems - some complexity issues.

    Membrane computing is a branch of natural computing which abstracts computing models from the structure and the functioning of the living cell. The main ingredients of membrane systems, called P systems, are (i) the membrane structure, which consists of a hierarchical arrangement of membranes which delimit compartments where (ii) multisets of symbols, called objects, evolve according to (iii) sets of rules which are localised and associated with compartments. By using the rules in a nondeterministic/deterministic maximally parallel manner, transitions between the system configurations can be obtained. A sequence of transitions is a computation of how the system is evolving. Various ways of controlling the transfer of objects from one membrane to another and applying the rules, as well as possibilities to dissolve, divide or create membranes, have been studied. Membrane systems have great potential for implementing massively concurrent systems in an efficient way that would allow us to solve currently intractable problems once future biotechnology gives way to a practical bio-realization. In this paper we survey some interesting and fundamental complexity issues such as universality vs. nonuniversality, determinism vs. nondeterminism, membrane and alphabet size hierarchies, characterizations of context-sensitive languages and other language classes, and various notions of parallelism.
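
The maximally parallel rule application described in the abstract above can be sketched for a deliberately restricted case. The assumptions are: a single compartment, non-cooperative rules (one symbol on the left-hand side), and at most one rule per symbol, so that the step is deterministic. The function name and the rule encoding are hypothetical and do not come from the surveyed paper.

```python
from collections import Counter


def maximally_parallel_step(objects: Counter, rules: dict[str, str]) -> Counter:
    """One transition of a single-compartment system.

    `objects` is a multiset of symbols; `rules` maps a symbol to the
    string of symbols it is rewritten into. Every object that has a
    rule is rewritten in the same step (maximal parallelism); objects
    without a rule are carried over unchanged.
    """
    nxt = Counter()
    for symbol, count in objects.items():
        if symbol in rules:
            for product in rules[symbol]:
                nxt[product] += count
        else:
            nxt[symbol] += count
    return nxt


if __name__ == "__main__":
    rules = {"a": "bb", "b": "c"}  # a -> bb, b -> c (illustrative rules)
    config = Counter({"a": 2, "b": 1})
    for _ in range(2):
        config = maximally_parallel_step(config, rules)
        print(dict(config))
```

With several applicable rules per symbol, or cooperative rules competing for the same objects, the step becomes nondeterministic and this sketch no longer applies.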