112 research outputs found
Understanding maximal repetitions in strings
The cornerstone of any algorithm computing all repetitions in a string of
length n in O(n) time is the fact that the number of runs (or maximal
repetitions) is O(n). We give a simple proof of this result. As a consequence
of our approach, the stronger result concerning the linearity of the sum of
exponents of all runs follows easily
A Minimal Periods Algorithm with Applications
Kosaraju in ``Computation of squares in a string'' briefly described a
linear-time algorithm for computing the minimal squares starting at each
position in a word. Using the same construction of suffix trees, we generalize
his result and describe in detail how to compute in O(k|w|)-time the minimal
k-th power, with period of length larger than s, starting at each position in a
word w for arbitrary exponent and integer . We provide the
complete proof of correctness of the algorithm, which is somehow not completely
clear in Kosaraju's original paper. The algorithm can be used as a sub-routine
to detect certain types of pseudo-patterns in words, which is our original
intention to study the generalization.Comment: 14 page
On the maximal number of cubic subwords in a string
We investigate the problem of the maximum number of cubic subwords (of the
form ) in a given word. We also consider square subwords (of the form
). The problem of the maximum number of squares in a word is not well
understood. Several new results related to this problem are produced in the
paper. We consider two simple problems related to the maximum number of
subwords which are squares or which are highly repetitive; then we provide a
nontrivial estimation for the number of cubes. We show that the maximum number
of squares such that is not a primitive word (nonprimitive squares) in
a word of length is exactly , and the
maximum number of subwords of the form , for , is exactly .
In particular, the maximum number of cubes in a word is not greater than
either. Using very technical properties of occurrences of cubes, we improve
this bound significantly. We show that the maximum number of cubes in a word of
length is between and . (In particular, we improve the
lower bound from the conference version of the paper.)Comment: 14 page
Computing Runs on a General Alphabet
We describe a RAM algorithm computing all runs (maximal repetitions) of a
given string of length over a general ordered alphabet in
time and linear space. Our algorithm outperforms all
known solutions working in time provided , where is the alphabet size. We conjecture that there
exists a linear time RAM algorithm finding all runs.Comment: 4 pages, 2 figure
Lempel-Ziv Factorization May Be Harder Than Computing All Runs
The complexity of computing the Lempel-Ziv factorization and the set of all
runs (= maximal repetitions) is studied in the decision tree model of
computation over ordered alphabet. It is known that both these problems can be
solved by RAM algorithms in time, where is the length of
the input string and is the number of distinct letters in it. We prove
an lower bound on the number of comparisons required to
construct the Lempel-Ziv factorization and thereby conclude that a popular
technique of computation of runs using the Lempel-Ziv factorization cannot
achieve an time bound. In contrast with this, we exhibit an
decision tree algorithm finding all runs in a string. Therefore, in the
decision tree model the runs problem is easier than the Lempel-Ziv
factorization. Thus we support the conjecture that there is a linear RAM
algorithm finding all runs.Comment: 12 pages, 3 figures, submitte
Computing periodicities in strings: A new approach
The most efficient methods currently available for the computation of repetitions or repeats in a string x = x[1..n] all depend on the prior computation of a suffix tree/array STx/SAx. Although these data structures can be computed in asymptotic Θ(n) time, nevertheless in practice they involve significant overhead, both in time and space. Since the number of repetitions/repeats in x can be reported in a way that is at most linear in string length, it therefore seems that it should be possible to devise less roundabout means of computing repetitions/repeats that take advantage of their infrequent occurrence. This survey paper provides background for these ideas and explores the possibilities for more efficient computation of periodicities in strings
Testing Generalised Freeness of Words
Pseudo-repetitions are a natural generalisation of the classical notion of repetitions in sequences: they are the repeated concatenation of a word and its encoding under a certain morphism or antimorphism (anti-/morphism, for short). We approach the problem of deciding efficiently, for a word w and a literal anti-/morphism f, whether w contains an instance of a given pattern involving a variable x and its image under f, i.e., f(x). Our results generalise both the problem of finding fixed repetitive structures (e.g., squares, cubes) inside a word and the problem of finding palindromic structures inside a word. For instance, we can detect efficiently a factor of the form xx^Rxxx^R, or any other pattern of such type. We also address the problem of testing efficiently, in the same setting, whether the word w contains an arbitrary pseudo-repetition of a given exponent
Automated analysis of oscillations in coronal bright points
Coronal bright points (BPs) are numerous, bright, small-scale dynamical
features found in the solar corona. Bright points have been observed to exhibit
intensity oscillations across a wide range of periodicities and are likely an
important signature of plasma heating and/or transport mechanisms. We present a
novel and efficient wavelet-based method that automatically detects and tracks
the intensity evolution of BPs using images from the Atmospheric Imaging
Assembly (AIA) on board the Solar Dynamics Observatory (SDO) in the 193\r{A}
bandpass. Through the study of a large, statistically significant set of BPs,
we attempt to place constraints on the underlying physical mechanisms. We used
a continuous wavelet transform (CWT) in 2D to detect the BPs within images.
One-dimensional CWTs were used to analyse the individual BP time series to
detect significant periodicities. We find significant periodicity at 4, 8-10,
17, 28, and 65 minutes. Bright point lifetimes are shown to follow a power law
with exponent . The relationship between the BP lifetime and
maximum diameter similarly follows a power law with exponent .
Our wavelet-based method successfully detects and extracts BPs and analyses
their intensity oscillations. Future work will expand upon these methods, using
larger datasets and simultaneous multi-instrument observations.Comment: Accepted for publication in A&A. 10 pages, 14 figures, 4 associated
movies. Movies will be available in A&
- …