9 research outputs found

    Even faster elastic-degenerate string matching via fast matrix multiplication

    Get PDF
    An elastic-degenerate (ED) string is a sequence of n sets of strings of total length N, which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention in the combinatorial pattern matching community, and an O(nm1.5 √(log m) + N)-time algorithm is known [Aoyama et al., CPM 2018]. The standard assumption in the prior work on this question is that N is substantially larger than both n and m, and thus we would like to have a linear dependency on the former. Under this assumption, the natural open problem is whether we can decrease the 1.5 exponent in the time complexity, similarly as in the related (but, to the best of our knowledge, not equivalent) word break problem [Backurs and Indyk, FOCS 2016].Our starting point is a conditional lower bound for the EDSM problem. We use the popular combinatorial Boolean matrix multiplication (BMM) conjecture stating that there is no truly subcubic combinatorial algorithm for BMM [Abboud and Williams, FOCS 2014]. By designing an appropriate reduction we show that a combinatorial algorithm solving the EDSM problem in O(nm1.5−∊ + N) time, for any ∊ > 0, refutes this conjecture. Of course, the notion of combinatorial algorithms is not clearly defined, so our reduction should be understood as an indication that decreasing the exponent requires fast matrix multiplication.Two standard tools used in algorithms on strings are string periodicity and fast Fourier transform. Our main technical contribution is that we successfully combine these tools with fast matrix multiplication to design a non-combinatorial O(nm1.381 + N)-time algorithm for EDSM. To the best of our knowledge, we are the first to do so.</p

    Comparing Degenerate Strings

    Get PDF
    Uncertain sequences are compact representations of sets of similar strings. They highlight common segments by collapsing them, and explicitly represent varying segments by listing all possible options. A generalized degenerate string (GD string) is a type of uncertain sequence. Formally, a GD string S is a sequence of n sets of strings of total size N, where the ith set contains strings of the same length ki but this length can vary between different sets. We denote by W the sum of these lengths k0, k1,... , kn-1. Our main result is an (N + M)-time algorithm for deciding whether two GD strings of total sizes N and M, respectively, over an integer alphabet, have a non-empty intersection. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in linear space. We then apply our string comparison tool to devise a simple algorithm for computing all palindromes in S in (min{W, n2}N)-time. We complement this upper bound by showing a similar conditional lower bound for computing maximal palindromes in S. We also show that a result, which is essentially the same as our string comparison linear-time algorithm, can be obtained by employing an automata-based approach

    Beyond the BEST Theorem: Fast assessment of Eulerian Trails

    Get PDF
    Given a directed multigraph G= (V, E), with | V| = n nodes and | E| = m edges, and an integer z, we are asked to assess whether the number # ET(G) of node-distinct Eulerian trails of G is at least z; two trails are called node-distinct if their node sequences are different. This problem has been formalized by Bernardini et al. [ALENEX 2020] as it is the core computational problem in several string processing applications. It can be solved in O(nω) arithmetic operations by applying the well-known BEST theorem, where ω&lt; 2.373 denotes the matrix multiplication exponent. The algorithmic challenge is: Can we solve this problem faster for certain values of m and z? Namely, we want to design a combinatorial algorithm for assessing whether # ET(G) ≄ z, which does not resort to the BEST theorem and has a predictably bounded cost as a function of m and z. We address this challenge here by providing a combinatorial algorithm requiring O(m· min { z, # ET(G) }) time

    StringSanitizationTKDD

    No full text
    This repository contains the source code of the approach in 'Combinatorial Algorithms for String Sanitization' that appears in ACM Transactions on Knowledge Discovery from Data (TKDD)

    A Relational Extension of the Notion of Motifs: Application to the Protein Common 3D Substructures Searching Problem

    No full text
    The geometrical configurations of atoms in protein structures can be viewed as approximate relations among them. Then, finding similar common substructures within a set of protein structures belongs to a new class of problems that generalizes that of finding repeated motifs. The novelty lies in the addition of constraints on the motifs in terms of relations that must hold between pairs of positions of the motifs. We will hence denote them as relational motifs. For this class of problems, we present an algorithm that is a suitable extension of the KMR paradigm and, in particular, of the KMRC as it uses a degenerate alphabet. Our algorithm contains several improvements that become especially useful when-as it is required for relational motifs-the inference is made by partially overlapping shorter motifs, rather than concatenating them. The efficiency, correctness and completeness of the algorithm is ensured by several non-trivial properties that are proven in this paper. The algorithm has been applied in the important field of protein common 3D substructure searching. The methods implemented have been tested on several examples of protein families such as serine proteases, globins and cytochromes P450 additionally. The detected motifs have been compared to those found by multiple structural alignments method

    Abstracts of papers presented at the 15th conference of the weed science society of Israel

    No full text
    corecore