Languages of lossless seeds
Several algorithms for similarity search employ seeding techniques to quickly
discard very dissimilar regions. In this paper, we study theoretical properties
of lossless seeds, i.e., spaced seeds having full sensitivity. We prove that
lossless seeds coincide with languages of certain sofic subshifts, hence they
can be recognized by finite automata. Moreover, we show that these subshifts
are fully given by the number of allowed errors k and the seed margin l. We
also show that for a fixed k, optimal seeds must asymptotically satisfy l ~
m^(k/(k+1)).
Comment: In Proceedings AFL 2014, arXiv:1405.527
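To make the notion of full sensitivity concrete, here is a minimal brute-force sketch in Python. It is not the automaton construction from the paper: it simply enumerates every placement of k mismatches in a read of length m and checks that the seed still hits somewhere. The function name `is_lossless` and the seed notation (`#` for a must-match position, `-` for a don't-care position) are assumptions made for this illustration.

```python
from itertools import combinations


def is_lossless(seed: str, m: int, k: int) -> bool:
    """Brute-force check that `seed` detects every read of length m
    with k mismatches (assumes 0 <= k <= m).

    Assumed notation: '#' = position must match, '-' = don't care.
    """
    if len(seed) > m:
        return False  # the seed cannot be placed at all
    care = [j for j, c in enumerate(seed) if c == "#"]
    offsets = range(m - len(seed) + 1)
    # Checking exactly k mismatches suffices: fewer mismatches only
    # make a hit easier to find.
    for errors in combinations(range(m), k):
        err = set(errors)
        hit = any(all(i + j not in err for j in care) for i in offsets)
        if not hit:
            return False  # this mismatch pattern escapes the seed
    return True
```

The enumeration is exponential in k, so this is only practical for small parameters; it is meant as an executable definition rather than an efficient test.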
A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances
Spaced seeds have been recently shown to not only detect more alignments, but
also to give a more accurate measure of phylogenetic distances (Boden et al.,
2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower
misclassification rate when used with Support Vector Machines (SVMs) (Onodera
and Shibuya, 2013). We confirm these two results by independent experiments,
and propose in this article to use a coverage criterion (Benson and Mak, 2008,
Martin, 2013, Martin and Noé, 2014) to measure the seed efficiency in both
cases in order to design better seed patterns. We show first how this coverage
criterion can be directly measured by a full automaton-based approach. We then
illustrate how this criterion performs when compared with two other criteria
frequently used, namely the single-hit and multiple-hit criteria, through
correlation coefficients with the correct classification/the true distance.
Finally, for alignment-free distances, we propose an extension by adopting the
coverage criterion, show how it performs, and indicate how it can be
efficiently computed.
Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017
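The three criteria compared in the abstract can be illustrated on a single alignment with a small Python sketch. It assumes one common reading of the coverage criterion, namely the number of alignment positions covered by a must-match position of at least one seed hit, and it uses a direct scan rather than the automaton-based computation the article describes. The names `seed_hits` and `hit_criteria`, the seed notation (`#` must match, `-` don't care), and the alignment encoding (`1` match, `0` mismatch) are all assumptions for illustration.

```python
def seed_hits(seed: str, alignment: str) -> list[int]:
    """Return every offset at which the seed hits the alignment.

    Assumed encoding: seed over {'#', '-'}, alignment over {'1', '0'}.
    """
    care = [j for j, c in enumerate(seed) if c == "#"]
    last = len(alignment) - len(seed)
    return [i for i in range(last + 1)
            if all(alignment[i + j] == "1" for j in care)]


def hit_criteria(seed: str, alignment: str) -> dict:
    """Compute single-hit, multiple-hit and coverage values for one alignment."""
    care = [j for j, c in enumerate(seed) if c == "#"]
    hits = seed_hits(seed, alignment)
    # Positions touched by a must-match symbol of at least one hit.
    covered = {i + j for i in hits for j in care}
    return {
        "single_hit": bool(hits),    # is there at least one hit?
        "multiple_hit": len(hits),   # how many hits?
        "coverage": len(covered),    # how many positions are covered?
    }
```

Under these assumptions, overlapping hits raise the multiple-hit count more than the coverage value, since a position covered twice is counted once.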
SAN models of a benchmark on dynamic reliability
This report provides a detailed description of the Stochastic Activity Network (SAN) models appearing in [1] and concerning a benchmark on dynamic reliability taken from the literature.