764 research outputs found

A Minimal Periods Algorithm with Applications

Author: A. Apostolico
A.O. Slisenko
A.S. Fraenkel
B. Schieber
D. Beauquier
D. Gusfield
D. Harel
D. Knuth
E.M. McCreight
J. Duval
J. Stoye
L. Ilie
M. Crochemore
M. Main
M.G. Main
R. Kolpakov
S.R. Kosaraju
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/11/2009

In "Computation of squares in a string", Kosaraju briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same construction of suffix trees, we generalize his result and describe in detail how to compute, in $O(k|w|)$ time, the minimal $k$-th power with period of length larger than $s$ starting at each position in a word $w$, for an arbitrary exponent $k\geq 2$ and integer $s\geq 0$. We provide the complete proof of correctness of the algorithm, which is not entirely clear in Kosaraju's original paper. The algorithm can be used as a subroutine to detect certain types of pseudo-patterns in words, which was our original motivation for studying the generalization. Comment: 14 pages
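To make the object being computed concrete, here is a brute-force reference sketch of the problem statement only. It is NOT Kosaraju's suffix-tree algorithm and not the paper's $O(k|w|)$ method: it simply tries every period $p > s$ at every start position, which costs roughly $O(n^3/k)$ character comparisons in the worst case. The function name `minimal_kth_powers` is hypothetical, and "minimal" is taken to mean shortest period.

```python
from typing import List, Optional


def minimal_kth_powers(w: str, k: int = 2, s: int = 0) -> List[Optional[int]]:
    """For each start position i, return the smallest period p > s such that
    w[i:i + k*p] is a k-th power u^k with |u| = p, or None if there is none.

    Brute-force reference only; far slower than the suffix-tree approach.
    """
    n = len(w)
    result: List[Optional[int]] = []
    for i in range(n):
        best: Optional[int] = None
        p = s + 1  # periods must be strictly larger than s
        while i + k * p <= n:
            # w[i:i + k*p] == u^k with |u| == p  iff  w[j] == w[j + p] for i <= j < i + (k-1)*p
            if all(w[j] == w[j + p] for j in range(i, i + (k - 1) * p)):
                best = p
                break
            p += 1
        result.append(best)
    return result
```

For example, `minimal_kth_powers("aabaab", k=2, s=1)` reports period 3 at position 0, because squares of period 1 are excluded and "aabaab" is (aab)^2.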

arXiv.org e-Print Archive

CiteSeerX

Crossref

The stochastic matching problem

Author: A. Braunstein
A. Prekopa
A. Ramezanpour
C. Papadimitriou
D. Gusfield
D. Shah
D. P. Bertsekas
E. Trucco
F. Altarelli
J. Birge
L. Lovasz
R. Zecchina
R. J. Baxter
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2011

The matching problem plays a basic role in combinatorial optimization and in statistical mechanics. In its stochastic variants, optimization decisions have to be made given only some probabilistic information about the instance. While the deterministic case can be solved in polynomial time, stochastic variants are worst-case intractable. We propose an efficient method to solve stochastic matching problems which combines some features of the survey propagation equations and of the cavity method. We test it on random bipartite graphs, for which we analyze the phase diagram and compare the results with exact bounds. Our approach is shown numerically to be effective on the full range of parameters, and to outperform state-of-the-art methods. Finally, we discuss how the method can be generalized to other problems of optimization under uncertainty. Comment: Published version has very minor changes
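As a point of contrast with the stochastic setting, the deterministic bipartite case mentioned above is polynomial. The sketch below shows one standard way to see this, Kuhn's augmenting-path algorithm for maximum-cardinality bipartite matching, running in $O(VE)$. It is only the deterministic baseline and has nothing to do with the paper's survey-propagation/cavity method; the adjacency-list input format is an assumption.

```python
from typing import List


def max_bipartite_matching(adj: List[List[int]], n_right: int) -> int:
    """Size of a maximum matching in a bipartite graph.

    adj[u] lists the right-side vertices (0..n_right-1) adjacent to left vertex u.
    """
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v, or -1

    def try_augment(u: int, seen: List[bool]) -> bool:
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # v is free, or its current partner can be re-matched elsewhere
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, [False] * n_right):
            size += 1
    return size
```

On `adj = [[0, 1], [0], [1, 2]]` with three right vertices, the second left vertex forces the first to move from right vertex 0 to right vertex 1, giving a perfect matching of size 3.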

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Wave Energy: a Pacific Perspective

Author: D. Gusfield
D. Okanohara
G. Manzini
J. Fischer
J. Kärkkäinen
K. Sadakane
M.I. Abouelhoda
P. Ferragina
R. Dementiev
R. Sinha
S.J. Puglisi
T. Kasai
U. Manber
V. Mäkinen
Publication venue: The Royal Society
Publication date: 01/01/2009

This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by The Royal Society and can be found at: http://rsta.royalsocietypublishing.org/. This paper illustrates the status of wave energy development in Pacific Rim countries by characterizing the available resource and introducing the region's current and potential future leaders in wave energy converter development. It also describes the existing licensing and permitting process as well as potential environmental concerns. Capabilities of Pacific Ocean testing facilities are described in addition to the region's vision of the future of wave energy.

CiteSeerX

Crossref

ScholarsArchive@OSU

RMIT Research Repository

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Searching of gapped repeats and subrepetitions in a word

Author: D. Gusfield
G. Brodal
J. Storer
M. Crochemore
P. Emde Boas van
R. Kolpakov
T. Kociumaka
Z. Galil
Publication date: 29/09/2013

A gapped repeat is a factor of the form $uvu$ where $u$ and $v$ are nonempty words. The period of the gapped repeat is defined as $|u|+|v|$. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its period. The gapped repeat is called $\alpha$-gapped if its period is not greater than $\alpha|v|$. A $\delta$-subrepetition is a factor whose exponent is less than 2 but not less than $1+\delta$ (the exponent of a factor is the quotient of its length and its minimal period). The $\delta$-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its minimal period. We reveal a close relation between maximal gapped repeats and maximal subrepetitions. Moreover, we show that in a word of length $n$ the number of maximal $\alpha$-gapped repeats is bounded by $O(\alpha^2 n)$ and the number of maximal $\delta$-subrepetitions is bounded by $O(n/\delta^2)$. Using the obtained upper bounds, we propose algorithms for finding all maximal $\alpha$-gapped repeats and all maximal $\delta$-subrepetitions in a word of length $n$. The algorithm for finding all maximal $\alpha$-gapped repeats has $O(\alpha^2 n)$ time complexity for the case of constant alphabet size and $O(n\log n + \alpha^2 n)$ time complexity for the general case. For finding all maximal $\delta$-subrepetitions we propose two algorithms. The first algorithm has $O(\frac{n\log\log n}{\delta^2})$ time complexity for the case of constant alphabet size and $O(n\log n + \frac{n\log\log n}{\delta^2})$ time complexity for the general case. The second algorithm has $O(n\log n + \frac{n}{\delta^2}\log\frac{1}{\delta})$ expected time complexity.
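The exponent definition above (length divided by minimal period) can be checked directly for a single factor. The sketch below computes the minimal period with the KMP failure function (period = length minus longest proper border) and tests the $\delta$-subrepetition condition $1+\delta \le e < 2$ with exact fractions. It is an illustration of the definitions only: it checks one isolated word, not maximality inside an enclosing word, and it is not one of the paper's enumeration algorithms. The function names are hypothetical.

```python
from fractions import Fraction


def minimal_period(x: str) -> int:
    """Smallest p >= 1 with x[i] == x[i + p] for all valid i, via the KMP failure function."""
    if not x:
        raise ValueError("the empty word has no period")
    pi = [0] * len(x)  # pi[i] = length of the longest proper border of x[:i + 1]
    for i in range(1, len(x)):
        k = pi[i - 1]
        while k > 0 and x[i] != x[k]:
            k = pi[k - 1]
        if x[i] == x[k]:
            k += 1
        pi[i] = k
    return len(x) - pi[-1]


def exponent(x: str) -> Fraction:
    """Exponent of x: its length divided by its minimal period."""
    return Fraction(len(x), minimal_period(x))


def is_delta_subrepetition(x: str, delta: Fraction) -> bool:
    """True iff 1 + delta <= exponent(x) < 2 (maximality is not checked here)."""
    e = exponent(x)
    return 1 + delta <= e < 2
```

For instance, "abaab" has minimal period 3 (its longest border is "ab"), so its exponent is 5/3 and it qualifies as a $\delta$-subrepetition exactly when $\delta \le 2/3$.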

arXiv.org e-Print Archive

Crossref

Bethe Ansatz in the Bernoulli Matching Model of Random Sequence Alignment

Author: A. M. Vershik
D. Gusfield
D. Sankoff
J. M. Hammersley
Kirone Mallick
M. Ablowitz
M. S. Waterman
R. Dubrin
R. J. Baxter
R. Wagner
S. F. Altschul
S. M. Ulam
Satya N. Majumdar
Sergei Nechaev
V. Dancik
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2008

For the Bernoulli Matching model of the sequence alignment problem, we apply the Bethe ansatz technique via an exact mapping to the 5-vertex model on a square lattice. Considering the terrace-like representation of the sequence alignment problem, we reproduce by the Bethe ansatz the results for the average length of the Longest Common Subsequence in the Bernoulli approximation. In addition, we compute the average number of nucleation centers of the terraces. Comment: 14 pages, 5 figures (some points are clarified)
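A quick way to see what quantity is being averaged is to sample one instance of the Bernoulli Matching model and run the usual LCS dynamic program on it. In this model each cell $(i, j)$ is a match independently with probability $p$, instead of being determined by two actual sequences. The sketch below is a Monte Carlo illustration under that assumption, not the Bethe ansatz computation; the function name and the `seed` parameter are hypothetical.

```python
import random
from typing import Optional


def bernoulli_matching_lcs(n: int, m: int, p: float, seed: Optional[int] = None) -> int:
    """LCS length of one random n x m instance of the Bernoulli Matching model.

    Every cell (i, j) is a match independently with probability p; the LCS is the
    longest chain of matches that is strictly increasing in both coordinates.
    """
    rng = random.Random(seed)
    prev = [0] * (m + 1)  # DP row for the previous i
    for _ in range(n):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            match = 1 if rng.random() < p else 0
            # Standard LCS recurrence, with a random indicator in place of a[i] == b[j].
            cur[j] = max(prev[j], cur[j - 1], prev[j - 1] + match)
        prev = cur
    return prev[m]
```

Averaging `bernoulli_matching_lcs(n, n, p)` over many seeds gives an empirical estimate of the average LCS length that the Bethe ansatz treats exactly. With $p = 1$ every cell matches and the result is $\min(n, m)$; with $p = 0$ it is 0.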

arXiv.org e-Print Archive

Crossref

HAL-CEA

Duel and sweep algorithm for order-preserving pattern matching

Author: A Amir
D Gusfield
DE Knuth
J Kim
M Crochemore
M Kubica
MM Hasan
R Cole
RN Horspool
RS Boyer
S Cho
S Faro
T Chhabra
U Vishkin
Publication date: 26/05/2017

Given a text $T$ and a pattern $P$ over an alphabet $\Sigma$, the classic exact matching problem searches for all occurrences of pattern $P$ in text $T$. Unlike the exact matching problem, order-preserving pattern matching (OPPM) considers the relative order of elements rather than their actual values. In this paper, we propose an efficient algorithm for the OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in $O(n + m\log m)$ time in general (where $n$ and $m$ are the lengths of $T$ and $P$) and in $O(n + m)$ time under the assumption that the characters in a string can be sorted in linear time with respect to the string size. We also perform experiments and show that our algorithm is faster than the KMP-based algorithm. Lastly, we introduce two-dimensional order-preserving pattern matching and give a duel-and-sweep algorithm for it that runs in $O(n^2)$ time for the duel stage and $O(n^2 m)$ time for the sweep stage, with $O(m^3)$ preprocessing time. Comment: 13 pages, 5 figures
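The notion of an order-preserving match can be pinned down with a naive checker: a window of the text matches when every pairwise comparison ($<$, $=$, $>$) between its elements agrees with the pattern, which is equivalent to the two windows having identical dense-rank sequences. The sketch below is a brute-force $O(nm\log m)$ baseline for intuition only; it is neither the duel-and-sweep algorithm nor the KMP-based one, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def dense_rank(xs: Sequence[int]) -> List[int]:
    """Replace each value by its rank among the distinct values of xs (ties share a rank)."""
    rank: Dict[int, int] = {v: r for r, v in enumerate(sorted(set(xs)))}
    return [rank[v] for v in xs]


def oppm_naive(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Start positions i such that text[i:i+m] is order-isomorphic to pattern.

    Two equal-length sequences are order-isomorphic iff their dense-rank
    sequences coincide, i.e. every pairwise comparison agrees.
    """
    m = len(pattern)
    target = dense_rank(pattern)
    return [i for i in range(len(text) - m + 1) if dense_rank(text[i:i + m]) == target]
```

For example, with text `[5, 1, 3, 2, 8, 6, 7]` and pattern `[3, 1, 2]` ("high, low, middle"), the windows `[5, 1, 3]` and `[8, 6, 7]` match, so the result is `[0, 4]`, even though neither window shares a single value with the pattern.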

arXiv.org e-Print Archive

Crossref

Time-frequency scaling transformation of the phonocardiogram based of the matching pursuit method.

Author: A.G. Clark
A.J. Jeffreys
B. Padhukasahasram
C. Carlson
D. Gusfield
G. Drouin
J. Hein
J.C. Stephens
J.D. Wall
L. Frisse
M. Lajoie
N. El-Mabrouk
P. Fearnhead
R. Hudson
S. Sawyer
S.R. Myers
T. Wiehe
The International HapMap Consortium
V. Bafna
Y.S. Song
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/1998

A time-frequency scaling transformation based on the matching pursuit (MP) method is developed for the phonocardiogram (PCG). The MP method decomposes a signal into a series of time-frequency atoms using an iterative process. The time scale of the PCG can be modified without perceptible change in its spectral characteristics, and the frequency scale can likewise be modified without changing its temporal properties. The technique has been tested on 11 PCGs containing heart sounds and different murmurs. A scaling/inverse-scaling procedure was used for quantitative evaluation of the scaling performance. Both the spectrogram and an MP-based Wigner distribution were used for visual comparison in the time-frequency domain. The results showed that the technique is suitable and effective for the time-frequency scale transformation of both the transient property of the heart sounds and the more complex random property of the murmurs. It is also shown that the effectiveness of the method is strongly related to the optimization of the parameters used for the decomposition of the signals.
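The iterative decomposition described above follows the generic matching pursuit loop: at each step, pick the dictionary atom with the largest absolute inner product with the current residual, record it with that coefficient, and subtract its contribution. The sketch below shows only that greedy loop over a user-supplied dictionary, assumed to contain unit-norm atoms of the same length as the signal. It does not build the time-frequency (e.g. Gabor) atoms or perform the scaling transformation, and it does not reproduce the paper's parameter choices; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def matching_pursuit(
    signal: Sequence[float], atoms: Sequence[Sequence[float]], n_iter: int
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Greedy MP: repeatedly pick the atom most correlated with the residual.

    Assumes every atom has unit Euclidean norm and the same length as `signal`.
    Returns the (atom index, coefficient) pairs and the final residual.
    """
    residual = [float(x) for x in signal]
    decomposition: List[Tuple[int, float]] = []
    for _ in range(n_iter):
        best_k, best_c = -1, 0.0
        for k, atom in enumerate(atoms):
            c = sum(r * a for r, a in zip(residual, atom))  # inner product <residual, atom>
            if abs(c) > abs(best_c):
                best_k, best_c = k, c
        if best_k < 0:  # residual is orthogonal to every atom: nothing left to extract
            break
        decomposition.append((best_k, best_c))
        residual = [r - best_c * a for r, a in zip(residual, atoms[best_k])]
    return decomposition, residual
```

With the standard basis of length 4 as a toy dictionary and the signal `[0, 2, 0, -5]`, the loop first extracts atom 3 with coefficient -5, then atom 1 with coefficient 2, and stops with a zero residual. Real time-frequency dictionaries are overcomplete, so the residual usually only decays and `n_iter` acts as the stopping rule.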

Crossref

HAL-Inserm

HAL-Rennes 1

Suffix Tree of Alignment: An Efficient Index for Similar Data

Author: A. Amir
D. Gusfield
E. Ukkonen
E.M. McCreight
G. Navarro
H.H. Do
J. Ziv
K. Sadakane
M. Crochemore
M. Farach-Colton
P. Bille
R. Grossi
R.A. Baeza-Yates
S. Huang
S. Karlin
S. Kuruppu
V. Levenshtein
V. Mäkinen
Publication date: 01/01/2013

We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has $|A|+|B|$ leaves and can be constructed in $O(|A|+|B|)$ time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity, which is usually represented as an alignment of $A$ and $B$. In this paper we propose a space/time-efficient suffix tree of alignment which exploits the similarity captured by an alignment. Our suffix tree for an alignment of $A$ and $B$ has $|A| + l_d + l_1$ leaves, where $l_d$ is the sum of the lengths of all parts of $B$ different from $A$ and $l_1$ is the sum of the lengths of some common parts of $A$ and $B$. We did not compromise pattern search to reduce the space: our suffix tree can be searched for a pattern $P$ in $O(|P|+occ)$ time, where $occ$ is the number of occurrences of $P$ in $A$ and $B$. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires $O(|A| + l_d + l_1 + l_2)$ time, where $l_2$ is the sum of the lengths of other common substrings of $A$ and $B$. When the suffix tree of $A$ is already given, it requires $O(l_d + l_1 + l_2)$ time. Comment: 12 pages
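For reference, the baseline being improved on indexes every suffix of both strings. The sketch below uses a generalized suffix array (a sorted list of all suffixes, a close relative of the generalized suffix tree) with naive quadratic construction and binary search for queries. It deliberately stores $|A|+|B|$ entries, which is exactly the redundancy the suffix tree of alignment avoids for similar strings; it is not the paper's structure, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def build_generalized_suffix_array(strings: Sequence[str]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Sort every suffix of every input string (naive construction, for illustration)."""
    entries = sorted((s[i:], sid, i) for sid, s in enumerate(strings) for i in range(len(s)))
    suffixes = [suf for suf, _, _ in entries]
    locations = [(sid, i) for _, sid, i in entries]  # (string id, start position)
    return suffixes, locations


def find_occurrences(
    suffixes: List[str], locations: List[Tuple[int, int]], pattern: str
) -> List[Tuple[int, int]]:
    """All (string id, position) pairs at which `pattern` occurs."""
    # Suffixes that start with `pattern` form one contiguous block beginning at the
    # first suffix that is >= pattern in lexicographic order.
    k = bisect_left(suffixes, pattern)
    hits: List[Tuple[int, int]] = []
    while k < len(suffixes) and suffixes[k].startswith(pattern):
        hits.append(locations[k])
        k += 1
    return sorted(hits)
```

For the similar strings "banana" (id 0) and "bandana" (id 1), the index holds 13 suffixes, many sharing long prefixes, and the query "ana" returns `[(0, 1), (0, 3), (1, 4)]`.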

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

k-Abelian Pattern Matching

Author: D. Breslauer
D. Gusfield
J. Karhumäki
J. Kärkkäinen
L.J. Cummings
M. Huova
M. Lothaire
M. Ružić
M.G. Maaß
R. Mercaş
T. Gagie
T. Kociumaka
Publication date: 01/01/2014

Two words are called $k$-abelian equivalent if they share the same multiplicities for all factors of length at most $k$. We present an optimal linear-time algorithm for identifying all occurrences of factors in a text that are $k$-abelian equivalent to some pattern. Moreover, we give an optimal algorithm for finding the largest $k$ for which two words are $k$-abelian equivalent. Solutions for various online versions of the $k$-abelian pattern matching problem are also proposed.
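The equivalence can be tested directly from its definition by counting every factor of length $1$ through $k$. The sketch below does that, and then scans every text window of the pattern's length. It is a naive $O(nmk)$-style illustration of the problem statement, not the paper's optimal linear-time algorithm, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def factor_counts(w: str, k: int) -> Counter:
    """Multiplicity of every factor of w with length 1..k."""
    return Counter(w[i:i + length] for length in range(1, k + 1) for i in range(len(w) - length + 1))


def k_abelian_equivalent(u: str, v: str, k: int) -> bool:
    return factor_counts(u, k) == factor_counts(v, k)


def k_abelian_occurrences(text: str, pattern: str, k: int) -> List[int]:
    """Start positions of text windows that are k-abelian equivalent to pattern."""
    m = len(pattern)
    target = factor_counts(pattern, k)
    return [i for i in range(len(text) - m + 1) if factor_counts(text[i:i + m], k) == target]
```

For example, "aabab" and "abaab" have the same letters and the same length-2 factors (aa once, ab twice, ba once), so they are 2-abelian equivalent; they differ on length-3 factors, so the largest such $k$ for this pair is 2.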

Crossref

MACAU: Open Access Repository of Kiel University

Longest Common Extensions in Trees

Author: A Amir
D Breslauer
D Gusfield
D Harel
GM Landau
H Bannai
H Cohen
J Fischer
M Ružić
MA Bender
MG Main
O Berkman
P Emde Boas van
PF Dietz
R Cole
RF Geary
S Alstrup
T Shibuya
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

The longest common extension (LCE) of two indices in a string is the length of the longest identical substrings starting at these two indices. The LCE problem asks to preprocess a string into a compact data structure that supports fast LCE queries. In this paper we generalize the LCE problem to trees and suggest a few applications of LCE in trees to tries and XML databases. Given a labeled and rooted tree $T$ of size $n$, the goal is to preprocess $T$ into a compact data structure that supports the following LCE queries between subpaths and subtrees in $T$. Let $v_1$, $v_2$, $w_1$, and $w_2$ be nodes of $T$ such that $w_1$ and $w_2$ are descendants of $v_1$ and $v_2$, respectively.

- $\mathrm{LCE}_{PP}(v_1, w_1, v_2, w_2)$ (path-path LCE): return the longest common prefix of the paths $v_1 \leadsto w_1$ and $v_2 \leadsto w_2$.
- $\mathrm{LCE}_{PT}(v_1, w_1, v_2)$ (path-tree LCE): return the maximal path-path LCE of the path $v_1 \leadsto w_1$ and any path from $v_2$ to a descendant leaf.
- $\mathrm{LCE}_{TT}(v_1, v_2)$ (tree-tree LCE): return a maximal path-path LCE of any pair of paths from $v_1$ and $v_2$ to descendant leaves.

We present the first non-trivial bounds for supporting these queries. For $\mathrm{LCE}_{PP}$ queries, we present a linear-space solution with $O(\log^{*} n)$ query time. For $\mathrm{LCE}_{PT}$ queries, we present a linear-space solution with $O((\log\log n)^{2})$ query time, and complement this with a lower bound showing that any path-tree LCE structure of size $O(n\,\mathrm{polylog}(n))$ must necessarily use $\Omega(\log\log n)$ time to answer queries. For $\mathrm{LCE}_{TT}$ queries, we present a time-space trade-off that, given any parameter $\tau$ with $1 \leq \tau \leq n$, leads to an $O(n\tau)$-space and $O(n/\tau)$ query-time solution. This is complemented with a reduction to the set intersection problem implying that a fast linear-space solution is not likely to exist.
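To make the path-path query concrete, the sketch below answers $\mathrm{LCE}_{PP}$ naively by reading both label strings top-down and comparing them character by character, in time proportional to the path lengths rather than the paper's $O(\log^{*} n)$. Two representation choices are assumptions, not taken from the source: the tree is given by parent pointers (the root maps to `None`), and labels sit on edges, stored at the child node as in a trie. The function names are hypothetical.

```python
from typing import Dict, Hashable, List, Optional


def path_labels(parent: Dict[int, Optional[int]], label: Dict[int, Hashable], v: int, w: int) -> List[Hashable]:
    """Edge labels on the path v ~> w, read top-down; w must be a descendant of v."""
    out: List[Hashable] = []
    while w != v:
        if parent[w] is None:
            raise ValueError("w is not a descendant of v")
        out.append(label[w])  # label of the edge (parent[w], w)
        w = parent[w]
    out.reverse()
    return out


def lce_pp(parent: Dict[int, Optional[int]], label: Dict[int, Hashable], v1: int, w1: int, v2: int, w2: int) -> int:
    """Length of the longest common prefix of the label strings of v1 ~> w1 and v2 ~> w2."""
    p = path_labels(parent, label, v1, w1)
    q = path_labels(parent, label, v2, w2)
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n
```

For a root 0 with two branches spelling "abc" (nodes 1, 3, 5) and "abd" (nodes 2, 4, 6), `lce_pp(parent, label, 0, 5, 0, 6)` is 2, and starting one level lower, `lce_pp(parent, label, 1, 5, 2, 6)` is 1.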

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology