49 research outputs found

    Efficient String Matching on Coded Texts

    Get PDF
    The so-called "four Russians technique" is often used to speed up algorithms by encoding several data items in a single memory cell. Given a sequence of n symbols over a constant-size alphabet, one can encode the sequence into O(n/λ) memory cells in O(log λ) time using n/log λ processors. This paper presents an efficient CRCW-PRAM string-matching algorithm for coded texts that takes O(log log(m/λ)) time while performing only O(n/λ) operations, an improvement by a factor of λ = O(log n) on the number of operations used in previous algorithms. Using this string-matching algorithm, one can test whether a string is square-free and find all palindromes in a string in O(log log n) time using n/log log n processors.
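
    A minimal sequential Python sketch of the packing step the abstract relies on (several symbols per memory cell); the function name, the bit layout, and the choice of λ below are illustrative assumptions, not the paper's parallel CRCW-PRAM construction.

        # Pack a text over a small alphabet into integers, lam symbols per cell,
        # so that a single cell comparison checks lam symbols at once.
        def pack(text, alphabet, lam):
            bits = max(1, (len(alphabet) - 1).bit_length())   # bits per symbol
            index = {c: i for i, c in enumerate(alphabet)}
            cells = []
            for start in range(0, len(text), lam):
                cell = 0
                for c in text[start:start + lam]:
                    cell = (cell << bits) | index[c]
                cells.append(cell)
            return cells

        # Two equally aligned coded texts are now compared cell by cell,
        # lam symbols per comparison, touching O(n/lam) cells instead of n positions.
        a = pack("acgtacgtacgt", "acgt", lam=4)
        b = pack("acgtacgtacga", "acgt", lam=4)
        print([x == y for x, y in zip(a, b)])   # [True, True, False]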

    Efficient Computation of Maximal Anti-Exponent in Palindrome-Free Strings

    Get PDF
    A palindrome is a string x = a_1 · · · a_n that is equal to its reversal a_n · · · a_1. We consider gapped palindromes, which are strings of the form uvu^R, where u and v are strings, |v| ≥ 2, and u^R is the reversal of u. Replicating the standard notion of string exponent, we define the anti-exponent of a gapped palindrome uvu^R as the quotient of |uvu^R| by |uv|. To compute the maximal anti-exponent of factors in a palindrome-free string efficiently, we apply techniques based on the suffix automaton and the reversed Lempel-Ziv factorisation. Our algorithm runs in O(n) time on a fixed-size alphabet, or O(n log σ) time on a large alphabet, which dramatically outperforms the naive cubic-time solution.
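
    To make the quantity being maximised concrete, here is a naive Python baseline computed directly from the definition above; it is cubic and serves only as a contrast to the paper's O(n) / O(n log σ) algorithm, and the function name and example string are assumptions.

        # Largest |u v u^R| / |u v| over factors of s of the form u v u^R with |v| >= 2.
        def max_anti_exponent(s):
            best = 0.0
            n = len(s)
            for i in range(n):
                for j in range(i + 4, n + 1):          # factor w = s[i:j], |w| >= 4
                    w = s[i:j]
                    limit = (len(w) - 2) // 2          # |u| <= (|w| - |v|) / 2 with |v| >= 2
                    u_len = 0
                    for k in range(1, limit + 1):      # longest u with w = u v reverse(u)
                        if w[:k] == w[-k:][::-1]:
                            u_len = k
                    if u_len > 0:
                        best = max(best, len(w) / (len(w) - u_len))
            return best

        print(max_anti_exponent("abxyzba"))   # u = "ab", v = "xyz": 7/5 = 1.4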

    Fast Parallel Lyndon Factorization With Applications

    Get PDF

    Pattern Matching with Variables: Fast Algorithms and New Hardness Results

    Get PDF
    A pattern (i.e., a string of variables and terminals) maps to a word if the word is obtained by uniformly replacing the variables by terminal words; deciding this is NP-complete. We present efficient algorithms that solve this problem for restricted classes of patterns. (The computational model we use is the standard unit-cost RAM with logarithmic word size, and all logarithms appearing in our time complexity evaluations are in base 2.) Furthermore, we show that it is NP-complete to decide, for a given number k and a word w, whether w can be factorised into k distinct factors; this shows that the injective version (i.e., different variables are replaced by different words) of the above matching problem is NP-complete even for very restricted cases.
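
    A brute-force Python sketch of the (non-injective) matching problem defined above: it searches over substitutions of variables (written as uppercase letters) by non-empty terminal words, so it is exponential in general, consistent with NP-completeness; the naming convention and the non-erasing restriction are assumptions made for this example.

        # Does some substitution of the variables turn `pattern` into `w`?
        def matches(pattern, w, subst=None, i=0, j=0):
            subst = {} if subst is None else subst
            if i == len(pattern):
                return j == len(w)
            sym = pattern[i]
            if sym.islower():                     # terminal: must match literally
                return j < len(w) and w[j] == sym and matches(pattern, w, subst, i + 1, j + 1)
            if sym in subst:                      # variable already bound: reuse its word
                val = subst[sym]
                return w.startswith(val, j) and matches(pattern, w, subst, i + 1, j + len(val))
            for end in range(j + 1, len(w) + 1):  # try every non-empty binding
                subst[sym] = w[j:end]
                if matches(pattern, w, subst, i + 1, end):
                    return True
                del subst[sym]
            return False

        print(matches("XaX", "abaab"))   # True: X -> "ab" gives "ab" + "a" + "ab"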

    An Optimal O(log log n) Time Parallel Algorithm for Detecting all Squares in a String

    Full text link

    Parallel and scalable combinatorial string algorithms on distributed memory systems

    Get PDF
    Methods for processing and analyzing DNA and genomic data are built upon combinatorial graph and string algorithms. The advent of high-throughput DNA sequencing is enabling the generation of billions of reads per experiment. Classical, sequential algorithms can no longer cope with these growing data sizes, which for the last ten years have greatly outpaced advances in processor speeds. Processing and analyzing state-of-the-art genomic data sets require the design of scalable and efficient parallel algorithms and the use of large computing clusters. Suffix arrays and suffix trees are fundamental string data structures that lie at the foundation of many string algorithms, with important applications in text processing, information retrieval, and computational biology. Consequently, the parallel construction of these indices is an actively studied problem. However, prior approaches lack good worst-case run-time guarantees and exhibit poor scaling and overall performance. In this work, we present distributed-memory parallel algorithms for indexing large datasets, including algorithms for the distributed construction of suffix arrays, LCP arrays, and suffix trees. We formulate a generalized version of the All-Nearest-Smaller-Values problem, provide an optimal distributed solution, and apply it to the distributed construction of suffix trees, yielding a work-optimal parallel algorithm. Our algorithms for distributed suffix array and suffix tree construction improve on the state of the art by simultaneously improving worst-case run-time bounds and achieving superior practical performance. Next, we introduce a novel distributed string index, the Distributed Enhanced Suffix Array (DESA), which builds on the suffix and LCP arrays together with additional distributed data structures. The DESA is designed to allow efficient pattern search queries in distributed memory while requiring at most O(n/p) memory per process. We present efficient distributed-memory parallel algorithms for querying this distributed index, as well as for its efficient construction. Finally, we present our work on distributed-memory algorithms for clustering de Bruijn graphs and its application to a grand-challenge metagenomic dataset. (Ph.D. thesis)
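
    The All-Nearest-Smaller-Values primitive mentioned above is shown here only in its textbook sequential form, a stack-based O(n) scan in Python; the thesis's generalisation and its distributed-memory solution are not reproduced, and the function name is an assumption.

        # For each position i, the nearest position to its left holding a smaller value.
        def nearest_smaller_to_left(a):
            result = [-1] * len(a)      # -1: no smaller value exists to the left
            stack = []                  # indices whose values are strictly increasing
            for i, x in enumerate(a):
                while stack and a[stack[-1]] >= x:
                    stack.pop()
                if stack:
                    result[i] = stack[-1]
                stack.append(i)
            return result

        print(nearest_smaller_to_left([2, 1, 4, 4, 0, 3]))   # [-1, -1, 1, 1, -1, 4]

    Applied to an LCP array, nearest-smaller values mark where lcp-intervals open and close, which is one reason this primitive shows up in suffix tree construction.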

    Discovery of Unconventional Patterns for Sequence Analysis: Theory and Algorithms

    Get PDF
    The biology community is collecting a large amount of raw data, such as genome sequences of organisms, microarray data, and interaction data such as gene-protein and protein-protein interactions. This amount is rapidly increasing, and the process of understanding the data is lagging behind the process of acquiring it. An inevitable first step towards making sense of the data is to study their regularities, focusing on the non-random structures that appear surprisingly often in the input sequences: patterns. In this thesis we discuss three incarnations of the pattern-discovery task, exploring three types of patterns that can model different regularities of the input dataset. While mask patterns have been designed to model short repeated biological sequences showing high conservation of their content at some specific positions, permutation patterns have been designed to detect repeated patterns whose parts maintain their physical adjacency but not their ordering across occurrences. Transposons, instead, model mobile sequences in the input dataset, which can be discovered by comparing different copies of the same input string and detecting large insertions and deletions in their alignment.
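
    As an illustration only, the sketch below takes one simple reading of a mask pattern, a fixed-length template with conserved positions and don't-care positions written as '.', and scans a string for its occurrences in Python; this naming and semantics are assumptions, not the thesis's formal definitions.

        # Naive scan for occurrences of a template with conserved and don't-care positions.
        def occurrences(mask, text):
            m = len(mask)
            return [i for i in range(len(text) - m + 1)
                    if all(c == '.' or c == text[i + k] for k, c in enumerate(mask))]

        print(occurrences("a.g", "aagacgatg"))   # [0, 3, 6]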