38 research outputs found

    Multiple seeds sensitivity using a single seed with threshold

    Get PDF
    Spaced seeds are a fundamental tool for similarity search in biosequences. The best sensitivity/selectivity trade-offs are obtained using many seeds simultaneously: This is known as the multiple seed approach. Unfortunately, spaced seeds use a large amount of memory and the available RAM is a practical limit to the number of seeds one can use simultaneously. Inspired by some recent results on lossless seeds, we revisit the approach of using a single spaced seed and considering two regions homologous if the seed hits in at least t sufficiently close positions. We show that by choosing the locations of the don't care symbols in the seed using quadratic residues modulo a prime number, we derive single seeds that when used with a threshold t > 1 have competitive sensitivity/selectivity trade-offs, indeed close to the best multiple seeds known in the literature. In addition, the choice of the threshold t can be adjusted to modify sensitivity and selectivity a posteriori, thus enabling a more accurate search in the specific instance at issue. The seeds we propose also exhibit robustness and allow flexibility in usage

    Space efficient merging of de Bruijn graphs and Wheeler graphs

    Full text link
    The merging of succinct data structures is a well established technique for the space efficient construction of large succinct indexes. In the first part of the paper we propose a new algorithm for merging succinct representations of de Bruijn graphs. Our algorithm has the same asymptotic cost of the state of the art algorithm for the same problem but it uses less than half of its working space. A novel important feature of our algorithm, not found in any of the existing tools, is that it can compute the Variable Order succinct representation of the union graph within the same asymptotic time/space bounds. In the second part of the paper we consider the more general problem of merging succinct representations of Wheeler graphs, a recently introduced graph family which includes as special cases de Bruijn graphs and many other known succinct indexes based on the BWT or one of its variants. We show that Wheeler graphs merging is in general a much more difficult problem, and we provide a space efficient algorithm for the slightly simplified problem of determining whether the union graph has an ordering that satisfies the Wheeler conditions.Comment: 24 pages, 10 figures. arXiv admin note: text overlap with arXiv:1902.0288

    External memory BWT and LCP computation for sequence collections with applications

    Get PDF
    We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external memory and in the process it also computes the LCP values. We show that our algorithm performs O(n maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm outperforms the current best algorithm for collections of sequences with different lengths and when the average LCP of the collection is relatively small compared to the length of the sequences. In the second part of the paper, we show that our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP arrays, provide simple, scan based, external memory algorithms for three well known problems in bioinformatics: the computation of the all pairs suffix-prefix overlaps, the computation of maximal repeats, and the construction of succinct de Bruijn graphs

    A Bayesian Network Approach for the Interpretation of Cyber Attacks to Power Systems

    Get PDF
    The focus of this paper is on the analysis of the cyber security resilience of digital infrastructures deployed by power grids, internationally recognized as a priority since several recent cyber attacks targeted energy systems and in particular the power service. In response to the regulatory framework, this paper presents an analysis approach based on the Bayesian Networks formalism and on real world threat scenarios. Our approach enables analyses oriented to planning of security measures and monitoring, and to forecasting of adversarial behaviours

    Analisi e rilevamento intelligente di processi di attacco alle Smart-Grid

    Get PDF
    Proponiamo una metodologia basata sulle Reti Bayesiane come strumento di supporto all’analisi della sicurezza di Smart Grid, ed in particolare per la previsione di intrusioni e attività ostili

    A Quantifier Elimination For The Theory Of p-Adic Numbers

    No full text
    This paper presents a detailed analysis of a quantifier elimination algorithm for the first order theory of p-adic numbers based on a p-adic analogue of the cylindric algebraic decomposition. It is believed that such method should lead to an elementary upper bound for the theory

    Better Spaced Seeds Using Quadratic Residues

    No full text
    Spaced seeds are used in approximate pattern matching algorithms to quickly discard regions where a match is not likely to occur. We propose a family of lossless spaced seeds based on Quadratic Residues modulo a prime number. Our seeds work with a threshold t>1 in the sense that two regions are considered similar only if the seed hits t times within the regions. We prove that, for any number of errors, our seeds have an exponentially smaller probability of producing false positive matches than any traditional seed using a threshold t=1. To establish our result we introduce a formal notion of selectivity that generalizes the concept of seed weight, and we relate it to the minimum coverage and to a new structural property defined in terms on seed rotations. This groundwork will be useful for further analysis on seeds with threshold and we use it to provide improved bounds for approximate matching with 2 or 3 errors. Our results show that the use of a single seed with a threshold t>1 should be considered as a possible alternative to single or multiple seeds with t=1
    corecore