Revisiting Waiting Times in DNA evolution
Transcription factors are short stretches of DNA (or k-mers) mainly located
in promoter sequences that enhance or repress gene expression. With respect to
an initial distribution of letters on the DNA alphabet, Behrens and Vingron
consider a random sequence of length n that does not contain a given k-mer
(a word of size k). Under an evolution model of the DNA, they compute the
probability that this k-mer appears after a unit time of 20
years. They prove that the waiting time for the first appearance of the k-mer
is well approximated by the inverse of this probability. Their work relies on the
simplifying assumption that the k-mer is not self-overlapping. They observe
in particular that the waiting time is mostly driven by the initial
distribution of letters.
Behrens et al. use an automaton-based approach that relaxes the assumption
on word overlaps. Their numerical evaluations confirm the validity of
the Behrens and Vingron approach for non-self-overlapping words, but provide up to
44% corrections for highly self-overlapping words. We
devised an approach to the problem by clump analysis and generating functions;
this approach proves a quasi-linear behaviour of the appearance probability for a
large range of sequence lengths, an important result for DNA evolution. We present
here this clump analysis, first by language decomposition, and next by an
automaton construction; finally, we describe an equivalent approach by
construction of Markov automata.
Comment: 19 pages, 3 Figures, 2 Tables
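The role of self-overlap in an automaton-based computation can be illustrated on a much simpler problem than the one in the abstract. The sketch below is not the evolution model of Behrens and Vingron, nor the clump analysis of the paper: it only computes the probability that a static random sequence with independent letters contains a given word, using a word-recognising automaton. The function names `word_automaton` and `prob_contains`, and the uniform letter distribution used by default, are assumptions made for this example.

```python
def word_automaton(word, alphabet="ACGT"):
    """Transition table of a word-recognising automaton.

    State s (0 <= s < len(word)) means: the longest prefix of `word` that is
    a suffix of the text read so far has length s. State len(word) (word
    found) is treated as absorbing by the caller.
    """
    k = len(word)
    delta = []
    for s in range(k):
        row = {}
        for c in alphabet:
            text = word[:s] + c
            # Longest prefix of `word` that is also a suffix of `text`.
            t = min(k, len(text))
            while t > 0 and text[-t:] != word[:t]:
                t -= 1
            row[c] = t
        delta.append(row)
    return delta


def prob_contains(word, n, probs=None):
    """Probability that an i.i.d. random sequence of length n contains `word`."""
    alphabet = "ACGT"
    if probs is None:
        # Assumption for this sketch: uniform letter distribution.
        probs = {c: 0.25 for c in alphabet}
    k = len(word)
    delta = word_automaton(word, alphabet)
    dist = [0.0] * (k + 1)  # dist[s] = probability of being in state s
    dist[0] = 1.0
    for _ in range(n):
        new = [0.0] * (k + 1)
        new[k] = dist[k]  # once the word has been seen, stay in state k
        for s in range(k):
            for c, p in probs.items():
                new[delta[s][c]] += dist[s] * p
        dist = new
    return dist[k]


if __name__ == "__main__":
    # Same length, same letter probabilities, different overlap structure.
    for w in ("AA", "AC"):
        print(w, prob_contains(w, 3))
```

For words of equal length under the uniform distribution, the self-overlapping word `AA` is less likely to occur in a short sequence than the non-overlapping word `AC`, because occurrences of `AA` tend to cluster. This is the kind of effect that an analysis restricted to non-self-overlapping words cannot capture.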
A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances
Spaced seeds have been recently shown to not only detect more alignments, but
also to give a more accurate measure of phylogenetic distances (Boden et al.,
2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower
misclassification rate when used with Support Vector Machines (SVMs) (Onodera
and Shibuya, 2013). We confirm these two results by independent experiments,
and propose in this article to use a coverage criterion (Benson and Mak, 2008,
Martin, 2013, Martin and Noé, 2014) to measure the seed efficiency in both
cases in order to design better seed patterns. We show first how this coverage
criterion can be directly measured by a full automaton-based approach. We then
illustrate how this criterion performs when compared with two other criteria
frequently used, namely the single-hit and multiple-hit criteria, through
correlation coefficients with the correct classification/the true distance.
Finally, for alignment-free distances, we propose an extension by adopting the
coverage criterion, show how it performs, and indicate how it can be
efficiently computed.
Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017
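The difference between counting seed hits and measuring coverage can be sketched on a toy alignment. The example below is a brute-force illustration, not the automaton-based computation described in the abstract. It assumes that an alignment is encoded as a string of `1` (match) and `0` (mismatch), that a seed is a pattern of `1` (must match) and `0` (don't care), and that coverage counts the alignment positions covered by a must-match position of at least one hit. The function names `seed_hits` and `coverage` are hypothetical.

```python
def seed_hits(seed, alignment):
    """Start positions where every must-match ('1') seed position
    faces a match ('1') in the alignment."""
    span = len(seed)
    return [
        i
        for i in range(len(alignment) - span + 1)
        if all(alignment[i + j] == "1" for j, s in enumerate(seed) if s == "1")
    ]


def coverage(seed, alignment):
    """Number of alignment positions covered by a must-match position
    of at least one seed hit (assumed definition for this sketch)."""
    covered = set()
    for i in seed_hits(seed, alignment):
        covered.update(i + j for j, s in enumerate(seed) if s == "1")
    return len(covered)


if __name__ == "__main__":
    alignment = "1111011"
    for seed in ("1101", "111"):
        hits = seed_hits(seed, alignment)
        print(seed, "hits:", hits, "coverage:", coverage(seed, alignment))
```

On this alignment both seeds have weight 3 and both produce two hits, so the single-hit and multiple-hit criteria cannot tell them apart, while the spaced seed `1101` covers five positions and the contiguous seed `111` covers four.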
Computing with cells: membrane systems - some complexity issues.
Membrane computing is a branch of natural computing which abstracts computing models from the structure and the functioning of the living cell. The main ingredients of membrane systems, called P systems, are (i) the membrane structure, which consists of a hierarchical arrangement of membranes which delimit compartments where (ii) multisets of symbols, called objects, evolve according to (iii) sets of rules which are localised and associated with compartments. By using the rules in a nondeterministic/deterministic maximally parallel manner, transitions between the system configurations can be obtained. A sequence of transitions is a computation, which describes how the system evolves. Various ways of controlling the transfer of objects from one membrane to another and applying the rules, as well as possibilities to dissolve, divide or create membranes, have been studied. Membrane systems have a great potential for implementing massively concurrent systems in an efficient way that would allow us to solve currently intractable problems once future biotechnology gives way to a practical bio-realization. In this paper we survey some interesting and fundamental complexity issues such as universality vs. nonuniversality, determinism vs. nondeterminism, membrane and alphabet size hierarchies, characterizations of context-sensitive languages and other language classes, and various notions of parallelism.
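A maximally parallel transition can be sketched for the simplest case of a single compartment. The code below is a minimal illustration under stated assumptions, not a general P system simulator: there is one membrane only, rules are pairs of non-empty multisets (consumed objects, produced objects), and the nondeterministic choice among competing rules is replaced by a fixed rule order so that the step is deterministic. The name `max_parallel_step` is hypothetical.

```python
from collections import Counter


def max_parallel_step(objects, rules):
    """One maximally parallel step in a single compartment.

    `rules` is a list of (lhs, rhs) pairs of Counters with non-empty lhs.
    Assumption for this sketch: rules are tried in list order and each one
    is applied as many times as the remaining objects allow. The resulting
    choice is maximal: no rule can still be applied to the leftover objects.
    """
    remaining = Counter(objects)
    produced = Counter()
    for lhs, rhs in rules:
        # How many copies of this rule fit into the remaining objects.
        times = min(remaining[sym] // need for sym, need in lhs.items())
        for sym, need in lhs.items():
            remaining[sym] -= need * times
        for sym, out in rhs.items():
            produced[sym] += out * times
    # Products only become available in the next step.
    return remaining + produced


if __name__ == "__main__":
    rules = [
        (Counter("ab"), Counter("c")),   # a b -> c
        (Counter("a"), Counter("bb")),   # a   -> b b
    ]
    config = Counter("aaab")
    config = max_parallel_step(config, rules)
    print(dict(config))
```

Starting from three copies of `a` and one `b`, the first rule fires once and the second rule fires twice in the same step. The objects produced by a rule are not consumed within that step, which is why the new `b` objects do not trigger the first rule again until the next transition.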