Search CORE

142 research outputs found

Subsequence Automata with Default Transitions

Author: Bille Philip
Gørtz Inge Li
Skjoldjensen Frederik Rye
Publication date: 01/01/2016

Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The \emph{subsequence automaton} of $S$ (often called the \emph{directed acyclic subsequence graph}) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $O(n\sigma)$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with \emph{default transitions}, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the \emph{delay}, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter $k$, $1 < k \leq \sigma$, we present a subsequence automaton with default transitions of size $O(nk\log_{k}\sigma)$ and delay $O(\log_k \sigma)$. Hence, with $k = 2$ we obtain an automaton of size $O(n \log \sigma)$ and delay $O(\log \sigma)$. On the other extreme, with $k = \sigma$, we obtain an automaton of size $O(n \sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest.

Comment: Corrected typo
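For orientation, the standard construction mentioned in the abstract (without default transitions) can be sketched as follows. This is a minimal illustration, not the hierarchical construction of the paper; the function names `build_subsequence_automaton` and `accepts` are hypothetical, and the automaton is stored as one dictionary per state, which gives the $O(n\sigma)$ size directly.

```python
def build_subsequence_automaton(s):
    """Build the plain subsequence automaton of s (no default transitions).

    State i means "the first i characters of s have been passed".
    delta[i][c] is the state reached from state i on character c, i.e. one
    past the first occurrence of c at index >= i. Every state is accepting.
    """
    n = len(s)
    delta = [dict() for _ in range(n + 1)]
    nxt = {}  # nxt[c] = target state for c when scanning from index i
    for i in range(n - 1, -1, -1):
        nxt[s[i]] = i + 1
        delta[i] = dict(nxt)  # copy: up to sigma transitions per state
    return delta


def accepts(delta, p):
    """Return True if p is a subsequence of the string the automaton was built from."""
    state = 0
    for c in p:
        if c not in delta[state]:
            return False  # no transition: c does not occur in the remaining suffix
        state = delta[state][c]
    return True


delta = build_subsequence_automaton("abcab")
print(accepts(delta, "acb"))  # True
print(accepts(delta, "cc"))   # False
```

Each state keeps at most one transition per alphabet character, so the total number of transitions is at most $n\sigma$; default transitions, as described in the abstract, trade part of that space for delay.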

arXiv.org e-Print Archive

Online Research Database In Technology

Compressed Subsequence Matching and Packed Tree Coloring

Author: A. Tiskin
D.D. Sleator
G. Das
H. Mannila
J. Ziv
M. Charikar
M. Crochemore
M. Thorup
M.A. Bender
M.L. Fredman
N.J. Larsson
O. Berkman
P. Cégielski
P. Ferragina
P.F. Dietz
R.A. Baeza-Yates
S. Abiteboul
S. Alstrup
T. Yamamoto
W. Rytter
Z. Troníček
Publication date: 01/01/2014

We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm uses $O(n+\frac{n\sigma}{w})$ space and $O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ)$ or $O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ)$ time. Here $w$ is the word size and $occ$ is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for $occ=o(\frac{n}{\log N})$ occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.

Comment: To appear at CPM '1
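The core query named in the abstract, finding the next occurrence of a character after a given position, can be illustrated on an uncompressed string. The sketch below is only that: it assumes plain text rather than a grammar, uses sorted position lists with binary search instead of the packed tree coloring structure, and the names `build_positions`, `next_occurrence` and `is_subsequence` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_positions(text):
    """Map each character to the sorted list of indices where it occurs."""
    pos = defaultdict(list)
    for i, c in enumerate(text):
        pos[c].append(i)
    return pos


def next_occurrence(pos, c, i):
    """Return the smallest index j > i with text[j] == c, or None if there is none."""
    lst = pos.get(c, [])
    k = bisect_right(lst, i)  # first stored position strictly greater than i
    return lst[k] if k < len(lst) else None


def is_subsequence(pos, pattern):
    """Match the pattern greedily by chaining next-occurrence queries."""
    i = -1
    for c in pattern:
        i = next_occurrence(pos, c, i)
        if i is None:
            return False
    return True


pos = build_positions("abracadabra")
print(next_occurrence(pos, "a", 0))  # 3
print(is_subsequence(pos, "acdb"))   # True
```

Subsequence matching reduces to one such query per pattern character, which is why a fast next-occurrence structure on the compressed string is the central ingredient.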

arXiv.org e-Print Archive

CiteSeerX

Crossref

Online Research Database In Technology

Faster subsequence recognition in compressed strings

Author: A Tiskin
A. Tiskin
BW Watson
CER Alves
G Myers
G Navarro
G Ziv
J Kärkkäinen
JL Bentley
M Crochemore
P Cégielski
TA Welch
W Rytter
WJ Masek
Publication date: 18/01/2008

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel-Ziv compression. For an SLP-compressed text of length $\bar m$, and an uncompressed pattern of length $n$, Cégielski et al. gave an algorithm for local subsequence recognition running in time $O(\bar mn^2 \log n)$. We improve the running time to $O(\bar mn^{1.5})$. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time $O(\bar mn^{1.5})$; the same problem with a compressed pattern is known to be NP-hard.
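As a point of reference for the longest common subsequence problem mentioned above, the textbook dynamic program on two uncompressed strings is sketched below. It is not the SLP-based algorithm of the paper and runs in time proportional to the product of the two lengths; the function name `lcs_length` is hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (classic row-by-row DP)."""
    prev = [0] * (len(b) + 1)  # prev[j] = LCS length of the processed prefix of a and b[:j]
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)  # extend a common subsequence by x
            else:
                cur.append(max(prev[j], cur[j - 1]))  # drop x or drop y
        prev = cur
    return prev[-1]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```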

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Discovering unbounded episodes in sequential data

Author: Casas Garriga Gemma
Publication date: 01/01/2003

One basic goal in the analysis of time-series data is to find frequent interesting episodes, i.e., collections of events occurring frequently together in the input sequence. Most widely known work decides the interestingness of an episode from a fixed user-specified window width or interval that bounds the subsequent sequential association rules. In this paper we present a more intuitive definition that, in turn, allows interesting episodes to grow during the mining without any user-specified help. A convenient algorithm to efficiently discover the proposed unbounded episodes is also implemented. Experimental results confirm that our approach is useful and advantageous.

Postprint (published version)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Compact Recognizers of Episode Sequences

Publication date: 01/01/1997

Mikhail J. Atallah (Purdue University)

Given two strings $T = a_1 \ldots a_n$ and $P = b_1 \ldots b_m$ over an alphabet $\Sigma$, the problem of testing whether $P$ occurs as a subsequence of $T$ is trivially solved in linear time. It is also known that a simple $O(n \log |\Sigma|)$ time preprocessing of $T$ makes it easy to decide subsequently, for any $P$ and in at most $|P| \log |\Sigma|$ character comparisons, whether $P$ is a subsequence of $T$. These problems become more complicated if one asks instead whether $P$ occurs as a subsequence of some substring $Y$ of $T$ of bounded length. This paper presents an automaton built on the textstring $T$ and capable of identifying all distinct minimal substrings $Y$ of $T$ having $P$ as a subsequence. By a substring $Y$ being minimal with respect to $P$, it is meant that $P$ is not a subsequence of any proper substring of $Y$. For every minimal substring $Y$, the automaton recognizes the occurrence of $P$ having the lexicographically smallest sequence of symbol positions in $Y$. It is not difficult to realize such an automaton in time and space $O(n^2)$ for a text of $n$ characters. One result of this paper consists of bringing those bounds down to linear or $O(n \log n)$, respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of $P$ in $T$ is carried out in time $O(n + \sum_{i=1}^{m} rocc_i \cdot i \cdot \log n \cdot \log |\Sigma|)$, where $rocc_i$ is the number of distinct minimal substrings of $T$ having $b_1 \ldots b_i$ as a subsequence. All log factors appearing in the above bounds can be further reduced to log log by resorting to known integer-handling data structures.

Index Terms: Algorithms, pattern matching, subsequence and episode searching, DAWG, suffix automaton, compact subsequence automaton, skip-edge DAWG, forward failure function, skip-link
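The notion of a minimal substring containing the pattern as a subsequence can be made concrete with a naive scan. The sketch below does not build the automaton of the paper; it assumes a plain forward and backward greedy pass per window, which can take time proportional to the product of text and pattern lengths, and the name `minimal_windows` is hypothetical.

```python
def minimal_windows(t, p):
    """Return all (start, end) index pairs such that p is a subsequence of
    t[start:end + 1] but of no proper substring of it (naive two-pass scan)."""
    windows = []
    n, m = len(t), len(p)
    if m == 0:
        return windows
    i = 0
    while i < n:
        # Forward pass: greedily match p starting at i to find the earliest end.
        j, k = 0, i
        while k < n and j < m:
            if t[k] == p[j]:
                j += 1
            k += 1
        if j < m:
            break  # p no longer fits in the remaining text
        end = k - 1
        # Backward pass: match p in reverse from end to find the latest start.
        j, k = m - 1, end
        while j >= 0:
            if t[k] == p[j]:
                j -= 1
            k -= 1
        start = k + 1
        windows.append((start, end))
        i = start + 1  # the next minimal window must start strictly later
    return windows


print(minimal_windows("abcab", "ab"))  # [(0, 1), (3, 4)]
```

The forward pass fixes the smallest possible right end and the backward pass then pushes the left end as far right as possible, so neither end of a reported window can be shrunk.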

CiteSeerX

Bidirectional Growth based Mining and Cyclic Behaviour Analysis of Web Sequential Patterns

Author: Kumar N. Krishna
Patnaik L. M.
Srikantaiah K. C.
Venugopal K. R.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 01/01/2013

Web sequential patterns are important for analyzing and understanding users' behaviour to improve the quality of service offered by the World Wide Web. Web Prefetching is one such technique that utilizes prefetching rules derived through Cyclic Model Analysis of the mined Web sequential patterns. Prediction is more accurate and prefetching results are more satisfying if we use a highly efficient and scalable mining technique such as the Bidirectional Growth based Directed Acyclic Graph. In this paper, we propose a novel algorithm called Bidirectional Growth based mining Cyclic behavior Analysis of web sequential Patterns (BGCAP) that effectively combines these strategies to generate prefetching rules in the form of 2-sequence patterns with Periodicity and threshold of Cyclic Behaviour that can be utilized to effectively prefetch Web pages, thus reducing the user-perceived latency. As BGCAP is based on Bidirectional pattern growth, it performs only (log n+1) levels of recursion for mining n Web sequential patterns. Our experimental results show that generating prefetching rules with BGCAP is 5-10 percent faster for different data sizes and 10-15 percent faster for a fixed data size than TD-Mine. In addition, BGCAP generates about 5-15 percent more prefetching rules than TD-Mine.

Comment: 19 pages

arXiv.org e-Print Archive

ePrints@Bangalore University

Selected Topics in Network Optimization: Aligning Binary Decision Diagrams for a Facility Location Problem and a Search Method for Dynamic Shortest Path Interdiction

Author: Bochkarev Alexey
Publication venue: Clemson University Libraries
Publication date: 01/12/2021

This work deals with three different combinatorial optimization problems: minimizing the total size of a pair of binary decision diagrams (BDDs) under a certain structural property, a variant of the facility location problem, and a dynamic version of the Shortest-Path Interdiction (DSPI) problem. However, these problems all have the following core idea in common: They all stem from representing an optimization problem as a decision diagram. We begin from cases in which such a diagram representation of reasonable size might exist, but finding a small diagram is difficult to achieve. The first problem develops a heuristic for enforcing a structural property for a collection of BDDs, which allows them to be merged into a single one efficiently. In the second problem, we consider a specific combinatorial problem that allows for a natural representation by a pair of BDDs. We use the previous result and ideas developed earlier in the literature to reformulate this problem as a linear program over a single BDD. This approach enables us to obtain sensitivity information, while often enjoying runtimes comparable to a mixed integer program solved with a commercial solver, after we pay the computational overhead of building the diagram (e.g., when re-solving the problem using different costs, but the same graph topology). In the last part, we examine DSPI, for which building the full decision diagram is generally impractical. We formalize the concept of a game tree for the DSPI and design a heuristic based on the idea of building only selected parts of this exponentially-sized decision diagram (which is not binary any more). We use a Monte Carlo Tree Search framework to establish policies that are near optimal. To mitigate the size of the game tree, we leverage previously derived bounds for the DSPI and employ an alpha–beta pruning technique for minimax optimization. We highlight the practicality of these ideas in a series of numerical experiments.

Clemson University: TigerPrints

Are there any good digraph width measures?

Author: Ageev
Arnborg
Barát
Berwanger
Bodlaender
Brooks
Chen
Courcelle
Daniel Meister
Diestel
Downey
Dvořák
Ebbinghaus
Fortune
Fraysseix
Ganian
Garey
Hliněný
Hodges
Hunter
Jan Obdržálek
Joachim Kneis
Johnson
Kanté
Karp
Kim
Kintali
Kreutzer
Lampis
Mader
Makowsky
Obdržálek
Oum
Peter Rossmanith
Petr Hliněný
Rabin
Robert Ganian
Robertson
Safari
Seymour
Slivkins
Somnath Sikdar
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Many width measures for directed graphs have been proposed in the last few years in pursuit of generalizing (the notion of) treewidth to directed graphs. However, none of these measures possesses, at the same time, the major properties of treewidth, namely, 1. being algorithmically useful, that is, admitting polynomial-time algorithms for a large class of problems on digraphs of bounded width (e.g. the problems definable in MSO$_1$); 2. having nice structural properties such as being (at least nearly) monotone under taking subdigraphs and some form of arc contractions (a property closely related to characterizability by particular cops-and-robber games). We investigate the question whether the search for directed treewidth counterparts has been unsuccessful by accident, or whether it has been doomed to fail from the beginning. Our main result states that any reasonable width measure for directed graphs which satisfies the two properties above must necessarily be similar to treewidth of the underlying undirected graph.

Crossref

Univerzitní repozitář Masarykovy univerzity

Publikationsserver der RWTH Aachen University