Multiple seeds sensitivity using a single seed with threshold
Spaced seeds are a fundamental tool for similarity search in biosequences. The best sensitivity/selectivity trade-offs are obtained using many seeds simultaneously: this is known as the multiple seed approach. Unfortunately, spaced seeds use a large amount of memory, and the available RAM is a practical limit on the number of seeds one can use simultaneously. Inspired by some recent results on lossless seeds, we revisit the approach of using a single spaced seed and considering two regions homologous if the seed hits in at least t sufficiently close positions. We show that by choosing the locations of the don't care symbols in the seed using quadratic residues modulo a prime number, we derive single seeds that, when used with a threshold t > 1, have competitive sensitivity/selectivity trade-offs, indeed close to the best multiple seeds known in the literature. In addition, the threshold t can be adjusted a posteriori to modify sensitivity and selectivity, thus enabling a more accurate search in the specific instance at hand. The seeds we propose are also robust and allow flexible usage.
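The threshold rule can be illustrated with a small sketch. It builds a seed whose care positions come from quadratic residues modulo a prime and counts how many times it hits an alignment, written as a 0/1 match string. The exact placement rule is an ASSUMPTION made here for illustration: care positions are taken to be 0 plus the nonzero quadratic residues mod p, trimmed after the last care position. The construction used in the paper may differ. The function names are hypothetical, and the whole alignment is treated as one window, so "sufficiently close" is not modeled separately.

```python
def quadratic_residues(p):
    """Sorted nonzero quadratic residues modulo the prime p."""
    return sorted({(x * x) % p for x in range(1, p)})


def qr_seed(p):
    """Hypothetical QR-based seed: '1' = care position, '#' = don't care.

    ASSUMPTION: care positions are 0 plus the nonzero quadratic residues mod p,
    and the seed is trimmed after its last care position.
    """
    care = {0} | set(quadratic_residues(p))
    return "".join("1" if i in care else "#" for i in range(max(care) + 1))


def seed_hits(seed, alignment):
    """Offsets where every care position of `seed` falls on a match.

    `alignment` is a 0/1 string: '1' = match, '0' = mismatch.
    """
    care = [j for j, c in enumerate(seed) if c == "1"]
    return [i for i in range(len(alignment) - len(seed) + 1)
            if all(alignment[i + j] == "1" for j in care)]


def is_homologous(seed, alignment, t):
    """Threshold rule: report a match only if the seed hits at least t times.

    Simplification: the whole alignment is one window, so "sufficiently
    close" hits are not modeled separately.
    """
    return len(seed_hits(seed, alignment)) >= t


seed = qr_seed(7)
print(seed)                                      # 111#1
print(seed_hits(seed, "1111111"))                # [0, 1, 2]
print(is_homologous(seed, "1110111", t=2))       # False
```

With t = 1 the alignment "1110111" would already be reported, because it contains one hit. Raising t to 2 rejects it. This is the a posteriori sensitivity/selectivity knob described above.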
Space efficient merging of de Bruijn graphs and Wheeler graphs
The merging of succinct data structures is a well established technique for the space efficient construction of large succinct indexes. In the first part of the paper we propose a new algorithm for merging succinct representations of de Bruijn graphs. Our algorithm has the same asymptotic cost as the state-of-the-art algorithm for the same problem, but it uses less than half of its working space. An important novel feature of our algorithm, not found in any of the existing tools, is that it can compute the Variable Order succinct representation of the union graph within the same asymptotic time/space bounds.
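The following sketch shows what merging means at the level of edge lists. It is a naive illustration, not the space efficient algorithm described above. Each graph is assumed to be a plain Python list of k-mers, where k-mer X + c is an edge from node X labeled c. Edges are assumed sorted by the colexicographic order of X and then by c, as in BOSS-style succinct representations. The real structures are compact arrays, and the padding symbols and flag bits they carry are omitted here. The function names are hypothetical.

```python
import heapq


def boss_key(kmer):
    """Sort key for an edge X + c: colex order of the (k-1)-mer X, then label c."""
    return (kmer[:-1][::-1], kmer[-1])


def merge_kmer_lists(a, b):
    """Merge two boss_key-sorted k-mer lists into the sorted edge list of the union.

    Naive illustration: it materializes full k-mers, unlike a space efficient merge.
    """
    merged = []
    for kmer in heapq.merge(a, b, key=boss_key):
        if not merged or merged[-1] != kmer:  # an edge present in both graphs is kept once
            merged.append(kmer)
    return merged


g1 = sorted(["ACG", "TCG", "AAT"], key=boss_key)
g2 = sorted(["CCG", "ACG", "GAT"], key=boss_key)
print(merge_kmer_lists(g1, g2))  # ['AAT', 'GAT', 'ACG', 'CCG', 'TCG']
```

A single linear merge suffices because both inputs already share the same order. A space efficient algorithm has to compute the same interleaving without materializing the k-mers.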
In the second part of the paper we consider the more general problem of merging succinct representations of Wheeler graphs, a recently introduced graph family that includes as special cases de Bruijn graphs and many other known succinct indexes based on the BWT or one of its variants. We show that merging Wheeler graphs is in general a much more difficult problem, and we provide a space efficient algorithm for the slightly simplified problem of determining whether the union graph has an ordering that satisfies the Wheeler conditions.
Comment: 24 pages, 10 figures. arXiv admin note: text overlap with arXiv:1902.0288
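The Wheeler conditions can be checked directly for a proposed node ordering. This sketch uses the standard definition. First, nodes with in-degree 0 come first. Second, for edges (u, v, a) and (u', v', a'), a < a' implies v < v'. Third, a = a' together with u < u' implies v ≤ v'. The sketch only verifies a given ordering, in time quadratic in the number of edges. It does not search for an ordering and does not merge anything. The edge-list representation and the function name are illustrative assumptions.

```python
def is_wheeler_order(order, edges):
    """Check whether `order` (list of nodes) satisfies the Wheeler conditions.

    `edges` is a list of (u, v, label) triples with comparable labels.
    Quadratic in the number of edges: a checker, not a search or merge algorithm.
    """
    rank = {node: r for r, node in enumerate(order)}
    indegree = {node: 0 for node in order}
    for _, v, _ in edges:
        indegree[v] += 1
    # Condition 1: nodes with in-degree 0 precede all other nodes.
    seen_positive = False
    for node in order:
        if indegree[node] > 0:
            seen_positive = True
        elif seen_positive:
            return False
    # Conditions 2 and 3, checked on every ordered pair of edges.
    for u1, v1, a1 in edges:
        for u2, v2, a2 in edges:
            if a1 < a2 and not rank[v1] < rank[v2]:
                return False
            if a1 == a2 and rank[u1] < rank[u2] and not rank[v1] <= rank[v2]:
                return False
    return True


path = [(0, 1, "a"), (1, 2, "b")]
print(is_wheeler_order([0, 1, 2], path))  # True
print(is_wheeler_order([0, 2, 1], path))  # False
```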
External memory BWT and LCP computation for sequence collections with applications
We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter and tries to make the best use of it by splitting the input collection into subcollections small enough that their BWTs can be computed in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external memory, computing the LCP values in the process. We show that our algorithm performs O(n · maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm outperforms the current best algorithm for collections of sequences of different lengths and when the average LCP of the collection is relatively small compared to the length of the sequences.
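The following in-memory sketch computes the BWT and LCP array of a small collection by naive suffix sorting, to make these two arrays concrete. It assumes the common multi-string convention: each sequence ends with its own terminator, written `$` here. Terminators are smaller than every other symbol and are ordered by sequence index. Distinct terminators never match each other. The sketch also assumes every input symbol compares greater than `$` (true for uppercase DNA letters). It is deliberately simple and far from linear time, and it is not the external memory merging algorithm described above.

```python
def lcp_len(x, y):
    """Length of the longest common prefix of strings x and y."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n


def bwt_lcp_collection(seqs):
    """Naive BWT and LCP array of a collection with distinct terminators $_0 < $_1 < ..."""
    # Every suffix s[p:] of every sequence, including the empty suffix (p == len(s))
    # that consists of the terminator alone.
    suffixes = [(s[p:], i, p) for i, s in enumerate(seqs) for p in range(len(s) + 1)]
    # Sort by suffix text followed by '$'; equal texts are ordered by sequence index,
    # which models the distinct terminators.
    suffixes.sort(key=lambda x: (x[0] + "$", x[1]))
    bwt, lcp, prev = [], [], None
    for text, i, p in suffixes:
        # Preceding symbol; the suffix starting at 0 is preceded by its own terminator.
        bwt.append(seqs[i][p - 1] if p > 0 else "$")
        # Distinct terminators never match, so the LCP only compares the texts.
        lcp.append(0 if prev is None else lcp_len(prev, text))
        prev = text
    return "".join(bwt), lcp


bwt, lcp = bwt_lcp_collection(["AC", "C"])
print(bwt, lcp)  # CC$A$ [0, 0, 0, 0, 1]
```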
In the second part of the paper, we show that our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP arrays, provide simple, scan-based, external memory algorithms for three well known problems in bioinformatics: the computation of all-pairs suffix-prefix overlaps, the computation of maximal repeats, and the construction of succinct de Bruijn graphs.
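A brute-force reference for the first of these problems helps pin down what is being computed. For every ordered pair of distinct sequences (a, b), it reports the longest suffix of a that equals a prefix of b, restricted to lengths of at least `min_len`. This is a quadratic, in-memory sketch that is handy for testing. It is not the scan-based method described above. The function name and the `min_len` parameter are hypothetical.

```python
def all_pairs_overlaps(seqs, min_len=1):
    """Map (i, j) -> length of the longest suffix of seqs[i] equal to a prefix of seqs[j].

    Pairs with no overlap of length >= min_len are omitted.
    """
    lo = max(min_len, 1)  # length 0 would compare the whole string with ""
    result = {}
    for i, a in enumerate(seqs):
        for j, b in enumerate(seqs):
            if i == j:
                continue
            # Try the longest candidate first and stop at the first match.
            for length in range(min(len(a), len(b)), lo - 1, -1):
                if a[-length:] == b[:length]:
                    result[(i, j)] = length
                    break
    return result


print(all_pairs_overlaps(["ACGT", "GTAA", "AAC"]))
# {(0, 1): 2, (1, 0): 1, (1, 2): 2, (2, 0): 2}
```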
A Bayesian Network Approach for the Interpretation of Cyber Attacks to Power Systems
The focus of this paper is on the analysis of the cyber security resilience of digital infrastructures deployed by power grids, an issue internationally recognized as a priority since several recent cyber attacks have targeted energy systems, and in particular the power service. In response to the regulatory framework, this paper presents an analysis approach based on the Bayesian Networks formalism and on real world threat scenarios. Our approach supports analyses oriented to the planning of security measures and monitoring, and to the forecasting of adversarial behaviours.
Analysis and Intelligent Detection of Attack Processes on Smart Grids
We propose a methodology based on Bayesian Networks as a tool to support the security analysis of Smart Grids, in particular for predicting intrusions and hostile activities.
A Quantifier Elimination For The Theory Of p-Adic Numbers
This paper presents a detailed analysis of a quantifier elimination algorithm for the first-order theory of p-adic numbers based on a p-adic analogue of cylindrical algebraic decomposition. It is believed that such a method should lead to an elementary upper bound for the theory.
Better Spaced Seeds Using Quadratic Residues
Spaced seeds are used in approximate pattern matching algorithms to quickly discard regions where a match is not likely to occur. We propose a family of lossless spaced seeds based on quadratic residues modulo a prime number. Our seeds work with a threshold t > 1, in the sense that two regions are considered similar only if the seed hits at least t times within the regions. We prove that, for any number of errors, our seeds have an exponentially smaller probability of producing false positive matches than any traditional seed using a threshold t = 1. To establish our result we introduce a formal notion of selectivity that generalizes the concept of seed weight, and we relate it to the minimum coverage and to a new structural property defined in terms of seed rotations. This groundwork will be useful for further analysis of seeds with threshold, and we use it to provide improved bounds for approximate matching with 2 or 3 errors. Our results show that a single seed with a threshold t > 1 should be considered as a possible alternative to single or multiple seeds with t = 1.
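A brute-force check makes losslessness with a threshold concrete. A seed is lossless for alignments of length m with k mismatches and threshold t if every placement of the k mismatches still leaves at least t hits. The sketch enumerates all placements, so it is exponential in k and only usable for small parameters. It does not implement the paper's selectivity analysis or proofs. The '1'/'#' seed notation and the function name are assumptions made for illustration.

```python
from itertools import combinations


def is_lossless(seed, m, k, t):
    """Brute-force check: does `seed` hit at least t times in EVERY length-m
    alignment containing exactly k mismatches?

    `seed` uses '1' for care positions and '#' for don't-care positions.
    Enumerates all C(m, k) mismatch placements, so only small m and k are feasible.
    """
    care = [j for j, c in enumerate(seed) if c == "1"]
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        hits = sum(1 for i in range(m - len(seed) + 1)
                   if all(i + j not in bad for j in care))
        if hits < t:
            return False  # this mismatch placement defeats the seed
    return True


print(is_lossless("11", 5, 1, 2))     # True
print(is_lossless("11", 5, 1, 3))     # False
print(is_lossless("111#1", 7, 1, 1))  # False
```

Adding mismatches can only remove hits, so checking exactly k mismatches also covers alignments with fewer. In the last call, a single mismatch at position 2 lands on a care position of all three placements of "111#1", which leaves no hits.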