33 research outputs found

    Repetition Detection in a Dynamic String

    A string UU for a non-empty string U is called a square. Squares have been well-studied both from a combinatorial and an algorithmic perspective. In this paper, we are the first to consider the problem of maintaining a representation of the squares in a dynamic string S of length at most n. We present an algorithm that updates this representation in n^o(1) time. This representation allows us to report a longest square-substring of S in O(1) time and all square-substrings of S in O(output) time. We achieve this by introducing a novel tool - maintaining prefix-suffix matches of two dynamic strings. We extend the above result to address the problem of maintaining a representation of all runs (maximal repetitions) of the string. Runs are known to capture the periodic structure of a string, and, as an application, we show that our representation of runs allows us to efficiently answer periodicity queries for substrings of a dynamic string. These queries have proven useful in static pattern matching problems and our techniques have the potential of offering solutions to these problems in a dynamic text setting
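To make the definition of a square concrete, here is a minimal brute-force sketch for a static string. It is not the dynamic algorithm of the paper, which maintains its representation under updates. The function names `longest_square` and `all_squares` are hypothetical and chosen only for illustration.

```python
def longest_square(s: str) -> str:
    """Return a longest substring of s of the form UU (U non-empty), or "" if none.

    Brute-force static baseline (assumption: illustration only, not the paper's method).
    """
    n = len(s)
    # Try half-lengths from largest to smallest, so the first hit is a longest square.
    for half in range(n // 2, 0, -1):
        for i in range(n - 2 * half + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                return s[i:i + 2 * half]
    return ""


def all_squares(s: str) -> list[str]:
    """Return the distinct square substrings of s in sorted order."""
    found = set()
    for half in range(1, len(s) // 2 + 1):
        for i in range(len(s) - 2 * half + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                found.add(s[i:i + 2 * half])
    return sorted(found)
```

For example, in "abaabaab" the substring "abaaba" is a square with U = "aba". Recomputing from scratch like this after every edit is exactly the cost that a dynamic representation is meant to avoid.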

    Longest common substring made fully dynamic

    Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to this problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in Õ(n^(2/3)) time, after Õ(n)-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, they presented an Õ(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in Õ(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings. We show that the techniques we develop can be applied to obtain fully dynamic algorithms for all of these variants. The only previously known sublinear-time dynamic algorithms for problems on strings were for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018]. As an intermediate problem we consider computing the solution for a string with a given set of k edits, which leads us, in particular, to answering internal queries on a string. The input to such a query is specified by a substring (or substrings) of a given string. Data structures for answering internal string queries that were proposed by Kociumaka et al. [SODA 2015] and by Gagie et al. [CCCG 2013] are used, along with new ones, based on ingredients such as the suffix tree, heavy-path decomposition, orthogonal range queries, difference covers, and string periodicity.
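As a point of reference for the LCS problem stated above, the following is a minimal sketch of the textbook dynamic-programming solution. It runs in O(|S|·|T|) time, so it is neither the O(n)-time suffix-tree solution nor the sublinear-per-edit algorithm described in the abstract. The function name `longest_common_substring` is an assumption made for illustration.

```python
def longest_common_substring(s: str, t: str) -> str:
    """Return a longest string occurring as a contiguous substring of both s and t.

    Textbook quadratic DP (assumption: a naive baseline, not the paper's method).
    prev[j] holds the length of the longest common suffix of s[:i-1] and t[:j].
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common suffix ending just before these positions.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]
```

A naive way to handle an edit is to rerun this function on the modified strings, which costs quadratic time per edit. The fully dynamic setting asks for something much faster than that.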

    Dynamic and Internal Longest Common Substring

    Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in Õ(n^(2/3)) time, after Õ(n)-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, the authors presented an Õ(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in Õ(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings; specifically, computing the longest palindrome and the Lyndon factorization of a string after a single edit operation. We develop dynamic sublinear-time algorithms for both of these problems as well. We also consider internal LCS queries, that is, queries in which we are to return an LCS of a pair of substrings of S and T. We show that answering such queries is hard in general and propose efficient data structures for several restricted cases.
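The Lyndon factorization mentioned above splits a string into a non-increasing sequence of Lyndon words (strings that are strictly smaller than all of their proper suffixes). A minimal static sketch using Duval's classical linear-time algorithm follows. It is a swapped-in static technique, not the dynamic algorithm of the abstract, and the function name `lyndon_factorization` is hypothetical.

```python
def lyndon_factorization(s: str) -> list[str]:
    """Return the Lyndon factorization of s using Duval's algorithm.

    Static baseline (assumption: illustration only; it recomputes from scratch).
    """
    n = len(s)
    i = 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Grow the longest prefix of s[i:] that is a power of a Lyndon word
        # possibly followed by a proper prefix of that word.
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i          # the whole prefix s[i:j+1] is a single Lyndon word
            else:
                k += 1         # still repeating the current period
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors
```

For "banana" this yields the factors "b", "an", "an", "a", which concatenate back to the input and are lexicographically non-increasing.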

    Topics in combinatorial pattern matching


    28th Annual Symposium on Combinatorial Pattern Matching : CPM 2017, July 4-6, 2017, Warsaw, Poland

    Peer reviewed

    Dynamic Longest Common Substring in Polylogarithmic Time

    The longest common substring problem consists in finding a longest string that appears as a (contiguous) substring of two input strings. We consider the dynamic variant of this problem, in which we are to maintain two dynamic strings S and T, each of length at most n, that undergo substitutions of letters, in order to be able to return a longest common substring after each substitution. Recently, Amir et al. [ESA 2019] presented a solution for this problem that needs only Õ(n^(2/3)) time per update. This brought the challenge of determining whether there exists a faster solution with polylogarithmic update time, or (as is the case for other dynamic problems), we should expect a polynomial (conditional) lower bound. We answer this question by designing a significantly faster algorithm that processes each substitution in amortized log^O(1) n time with high probability. Our solution relies on exploiting the local consistency of the parsing of a collection of dynamic strings due to Gawrychowski et al. [SODA 2018], and on maintaining two dynamic trees with labeled bicolored leaves, so that after each update we can report a pair of nodes, one from each tree, of maximum combined weight, which have at least one common leaf-descendant of each color. We complement this with a lower bound of Ω(log n / log log n) for the update time of any polynomial-size data structure that maintains the LCS of two dynamic strings, even allowing amortization and randomization.

    Approximate String Matching With Dynamic Programming and Suffix Trees

    The importance and contribution of string matching algorithms to modern society cannot be overstated. From basic search algorithms such as spell checking and data querying, to advanced algorithms such as DNA sequencing, trend analysis and signal processing, string matching algorithms form the foundation of many aspects of computing that have been pivotal in technological advancement. In general, string matching algorithms can be divided into the categories of exact string matching and approximate string matching. We study each area and examine some of the well-known algorithms. We probe into one of the most intriguing data structures in string algorithms, the suffix tree. The lowest common ancestor extension of the suffix tree is the key to many advanced string matching algorithms. With these tools, we are able to solve string problems that were, until recently, thought intractable by many. Another interesting and relatively new data structure in string algorithms is the suffix array, which has seen significant breakthroughs in its linear-time construction in recent years. Primarily, this thesis focuses on approximate string matching using dynamic programming and hybrid dynamic programming with suffix trees. We study both approaches in detail and see how the merger of exact string matching and approximate string matching algorithms can yield synergistic results in our experiments.
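To illustrate approximate string matching with dynamic programming, here is a minimal sketch of the standard column-by-column edit-distance recurrence in which a match may start at any text position. It is a generic textbook formulation, not necessarily the variant used in the thesis, and the name `approximate_matches` and the threshold parameter `k` are assumptions.

```python
def approximate_matches(pattern: str, text: str, k: int) -> list[int]:
    """Return end positions j (1-based, i.e. text[:j]) at which some substring
    of text ending there is within edit distance k of pattern.

    Generic DP sketch (assumption: illustration only, O(len(pattern) * len(text)) time).
    """
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring of text
    # ending at the current text position. col[0] stays 0 because a match
    # may start anywhere in the text.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != ch),  # match or substitution
                col[i] + 1,                          # extra text character
                col[i - 1] + 1,                      # missing pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

With `k = 0` this degenerates to exact matching. Larger values of `k` tolerate insertions, deletions, and substitutions.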

    ANÁLISE E APLICAÇÃO DE ESTRUTURAS DE SUFIXOS NA RESOLUÇÃO DO STRING MATCHING (Analysis and Application of Suffix Structures for Solving String Matching)

    String matching is the problem that seeks to answer the following question: "Is it possible to find a given pattern within a text?" It is widely studied in computer science and in computational biology, because variants of it arise in search tools and in the processing of DNA sequences. Algorithms that answer the basic question optimally already exist, but these solutions are not equally efficient on extensions and variations of the problem. For this reason, much research has studied data structures built on the suffixes of the text, in order to obtain solutions capable of handling complex variations of string matching. This work presents an in-depth study and analysis of the efficiency of two of these structures: the suffix tree and the suffix automaton. Classical algorithms are also covered and compared with these structures throughout the work. The analyses rely on statistical criteria, running times, and algorithmic complexity to obtain a higher degree of confidence in the results.
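As an illustration of how a suffix structure answers the basic string-matching question, the following is a minimal sketch of the standard online suffix automaton construction, followed by a membership query. It is a generic textbook version, not the implementation evaluated in the work above, and the class name `SuffixAutomaton` and method `contains` are assumptions.

```python
class SuffixAutomaton:
    """Suffix automaton of a text (assumption: textbook sketch for illustration)."""

    def __init__(self, text: str):
        self.next = [{}]      # outgoing transitions of each state
        self.link = [-1]      # suffix links
        self.length = [0]     # length of the longest string in each state
        last = 0
        for ch in text:
            cur = self._new_state(self.length[last] + 1)
            p = last
            # Add the new transition along the suffix-link chain.
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split state q by cloning it with a shorter length.
                    clone = self._new_state(self.length[p] + 1)
                    self.next[clone] = dict(self.next[q])
                    self.link[clone] = self.link[q]
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def _new_state(self, length: int) -> int:
        self.next.append({})
        self.link.append(-1)
        self.length.append(length)
        return len(self.length) - 1

    def contains(self, pattern: str) -> bool:
        """Return True if pattern occurs in the text; takes O(len(pattern)) steps."""
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True
```

Once the automaton is built, each query walks at most one transition per pattern character, independent of the text length. This is the property that makes suffix structures attractive when many patterns are searched in the same text.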
