Longest common substring made fully dynamic

Author: Amir A. (Amihood)
Charalampopoulos P. (Panagiotis)
Pissis S. (Solon)
Radoszewski J. (Jakub)
Publication date: 16/07/2018

Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to this problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in Õ(n^{2/3}) time, after Õ(n)-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, they presented an Õ(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in Õ(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings. We show that the techniques we develop can be applied to obtain fully dynamic algorithms for all of these variants. The only previously known sublinear-time dynamic algorithms for problems on strings were for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018]. As an intermediate problem we consider computing the solution for a string with a given set of k edits, which leads us, in particular, to answering internal queries on a string. The input to such a query is specified by a substring (or substrings) of a given string. Data structures for answering internal string queries that were proposed by Kociumaka et al. [SODA 2015] and by Gagie et al. [CCCG 2013] are used, along with new ones, based on ingredients such as the suffix tree, heavy-path decomposition, orthogonal range queries, difference covers, and string periodicity.
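For readers unfamiliar with the static problem defined in the abstract above, the following is a minimal sketch of a quadratic-time dynamic-programming baseline. It is not the O(n)-time suffix-tree solution and not the paper's Õ(n^{2/3})-per-edit algorithm; the function name `longest_common_substring` is hypothetical and chosen only for illustration.

```python
def longest_common_substring(s: str, t: str) -> str:
    """Return one longest substring common to s and t.

    Quadratic-time baseline for illustration only (an assumption of this
    sketch, not the method described in the abstract).
    """
    best_len = 0
    best_end = 0  # end index (exclusive) in s of the best match so far
    # prev[j] holds the length of the longest common suffix of
    # s[:i-1] and t[:j] from the previous row of the DP table.
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return s[best_end - best_len:best_end]


print(longest_common_substring("banana", "ananas"))  # anana
```

In the fully dynamic setting, rerunning a routine like this after every edit would cost time quadratic in n per edit, which is the kind of recomputation a sublinear-time-per-edit result avoids.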

arXiv.org e-Print Archive

CWI's Institutional Repository

Dagstuhl Research Online Publication Server

Longest property-preserved common factor

Author: D Belazzougui
D Gusfield
H Bannai
J-P Duval
L Chi
M Dumitran
M Farach
M Federico
M Lothaire
P Peterlongo
S Inenaga
SR Chowdhury
SV Thankachan
SW Bae
T Kociumaka
T Starikovskaya
WI Chang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

In this paper we introduce a new family of string processing problems. We are given two or more strings and we are asked to compute a factor common to all strings that preserves a specific property and has maximal length. Here we consider two fundamental string properties: square-free factors and periodic factors under two different settings, one per property. In the first setting, we are given a string x and we are asked to construct a data structure over x answering the following type of on-line queries: given string y, find a longest square-free factor common to x and y. In the second setting, we are given k strings and an integer 1 < k’ ≤ k and we are asked to find a longest periodic factor common to at least k’ strings. We present linear-time solutions for both settings. We anticipate that our paradigm can be extended to other string properties.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

Crossref

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

King's Research Portal

Longest Unbordered Factor in Quasilinear Time

Author: Kociumaka Tomasz
Kundu Ritu
Mohamed Manal
Pissis Solon P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th International Symposium on Algorithms and Computation (ISAAC 2018)
Publication date: 01/01/2018

A border u of a word w is a proper factor of w occurring both as a prefix and as a suffix. The maximal unbordered factor of w is the longest factor of w which does not have a border. Here we present an algorithm that computes the Longest Unbordered Factor Array of w for general alphabets in O(n log n) time with high probability (or in O(n log n log^2 log n) time deterministically), where n is the length of w. This array specifies the length of the maximal unbordered factor starting at each position of w. This is a major improvement on the running time of the currently best worst-case algorithm working in O(n^{1.5}) time for integer alphabets [Gawrychowski et al., 2015].
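To make the definitions in the abstract above concrete, here is a brute-force sketch that computes the Longest Unbordered Factor Array in quadratic time. It relies on the standard prefix function (border array): a prefix of length l of a word is unbordered exactly when its longest border has length 0. This is not the quasilinear algorithm of the paper; the function names are hypothetical.

```python
def prefix_function(w: str) -> list[int]:
    """pi[k] = length of the longest border of w[:k+1] (0 if unbordered)."""
    pi = [0] * len(w)
    for k in range(1, len(w)):
        j = pi[k - 1]
        # Fall back through shorter borders until one can be extended.
        while j > 0 and w[k] != w[j]:
            j = pi[j - 1]
        if w[k] == w[j]:
            j += 1
        pi[k] = j
    return pi


def longest_unbordered_factor_array(w: str) -> list[int]:
    """luf[i] = length of the longest unbordered factor starting at i.

    Quadratic-time illustration (an assumption of this sketch): one
    prefix-function computation per starting position.
    """
    luf = []
    for i in range(len(w)):
        pi = prefix_function(w[i:])
        # The factor w[i:i+l] is unbordered iff pi[l-1] == 0;
        # pi[0] is always 0, so the maximum is well defined.
        luf.append(max(k + 1 for k, b in enumerate(pi) if b == 0))
    return luf


print(longest_unbordered_factor_array("abab"))  # [2, 2, 2, 1]
```

For "abab", the factor "ab" starting at position 0 is unbordered, while "aba" and "abab" have the borders "a" and "ab" respectively, so the first entry is 2.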

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Longest Property-Preserved Common Factor

Author: Ayad Lorraine
Bernardini Giulia
Grossi Roberto
Iliopoulos Costas
Pisanti Nadia
Pissis Solon
Rosone Giovanna
Publication venue: HAL CCSD
Publication date: 01/01/2018

In this paper we introduce a new family of string processing problems. We are given two or more strings and we are asked to compute a factor common to all strings that preserves a specific property and has maximal length. Here we consider three fundamental string properties: square-free factors, periodic factors, and palindromic factors under three different settings, one per property. In the first setting, we are given a string x and we are asked to construct a data structure over x answering the following type of on-line queries: given string y, find a longest square-free factor common to x and y. In the second setting, we are given k strings and an integer 1 < k' ≤ k and we are asked to find a longest periodic factor common to at least k' strings. In the third setting, we are given two strings and we are asked to find a longest palindromic factor common to the two strings. We present linear-time solutions for all settings. We anticipate that our paradigm can be extended to other string properties or settings.
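As an illustration of the third setting described above (a longest palindromic factor common to two strings), the sketch below enumerates the palindromic factors of each string by expanding around every center and intersects the two sets. It is a naive baseline that can take cubic time in the worst case, not the linear-time solution the abstract reports; the function names are hypothetical.

```python
def palindromic_factors(s: str) -> set[str]:
    """Collect every distinct non-empty palindromic factor of s."""
    found = set()
    n = len(s)
    # 2n - 1 centers: each character (odd length) and each gap (even length).
    for center in range(2 * n - 1):
        lo = center // 2
        hi = lo + center % 2
        while lo >= 0 and hi < n and s[lo] == s[hi]:
            found.add(s[lo:hi + 1])
            lo -= 1
            hi += 1
    return found


def longest_common_palindromic_factor(x: str, y: str) -> str:
    """Return one longest palindromic factor common to x and y ('' if none)."""
    common = palindromic_factors(x) & palindromic_factors(y)
    return max(common, key=len, default="")


print(longest_common_palindromic_factor("racecar", "xaceca"))  # aceca
```

When several common palindromic factors share the maximal length, this sketch returns an arbitrary one of them.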

INRIA a CCSD electronic archive server

Efficient Data Structures for Text Processing Applications

Author: Abedin Paniz
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/12/2021

This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index to quickly answer various queries on this data. Basic queries such as counting/reporting a given pattern's occurrences as substrings of the original text are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as the suffix trees, suffix arrays, FM-index, etc. In this work, we revisit the following problems: (1) the Heaviest Induced Ancestors problem, (2) the Range Longest Common Prefix problem, (3) the Range Shortest Unique Substrings problem, and (4) the Non-Overlapping Indexing problem. For the first problem, we present two new space-time trade-offs that improve the space, query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both cache-aware and cache-oblivious models.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)