112 research outputs found

    Understanding maximal repetitions in strings

    Get PDF
    The cornerstone of any algorithm computing all repetitions in a string of length n in O(n) time is the fact that the number of runs (or maximal repetitions) is O(n). We give a simple proof of this result. As a consequence of our approach, the stronger result concerning the linearity of the sum of exponents of all runs follows easily

    A Minimal Periods Algorithm with Applications

    Full text link
    Kosaraju in ``Computation of squares in a string'' briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same construction of suffix trees, we generalize his result and describe in detail how to compute in O(k|w|)-time the minimal k-th power, with period of length larger than s, starting at each position in a word w for arbitrary exponent k2k\geq2 and integer s0s\geq0. We provide the complete proof of correctness of the algorithm, which is somehow not completely clear in Kosaraju's original paper. The algorithm can be used as a sub-routine to detect certain types of pseudo-patterns in words, which is our original intention to study the generalization.Comment: 14 page

    On the maximal number of cubic subwords in a string

    Full text link
    We investigate the problem of the maximum number of cubic subwords (of the form wwwwww) in a given word. We also consider square subwords (of the form wwww). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares or which are highly repetitive; then we provide a nontrivial estimation for the number of cubes. We show that the maximum number of squares xxxx such that xx is not a primitive word (nonprimitive squares) in a word of length nn is exactly n21\lfloor \frac{n}{2}\rfloor - 1, and the maximum number of subwords of the form xkx^k, for k3k\ge 3, is exactly n2n-2. In particular, the maximum number of cubes in a word is not greater than n2n-2 either. Using very technical properties of occurrences of cubes, we improve this bound significantly. We show that the maximum number of cubes in a word of length nn is between (1/2)n(1/2)n and (4/5)n(4/5)n. (In particular, we improve the lower bound from the conference version of the paper.)Comment: 14 page

    Computing Runs on a General Alphabet

    Full text link
    We describe a RAM algorithm computing all runs (maximal repetitions) of a given string of length nn over a general ordered alphabet in O(nlog23n)O(n\log^{\frac{2}3} n) time and linear space. Our algorithm outperforms all known solutions working in Θ(nlogσ)\Theta(n\log\sigma) time provided σ=nΩ(1)\sigma = n^{\Omega(1)}, where σ\sigma is the alphabet size. We conjecture that there exists a linear time RAM algorithm finding all runs.Comment: 4 pages, 2 figure

    Lempel-Ziv Factorization May Be Harder Than Computing All Runs

    Get PDF
    The complexity of computing the Lempel-Ziv factorization and the set of all runs (= maximal repetitions) is studied in the decision tree model of computation over ordered alphabet. It is known that both these problems can be solved by RAM algorithms in O(nlogσ)O(n\log\sigma) time, where nn is the length of the input string and σ\sigma is the number of distinct letters in it. We prove an Ω(nlogσ)\Omega(n\log\sigma) lower bound on the number of comparisons required to construct the Lempel-Ziv factorization and thereby conclude that a popular technique of computation of runs using the Lempel-Ziv factorization cannot achieve an o(nlogσ)o(n\log\sigma) time bound. In contrast with this, we exhibit an O(n)O(n) decision tree algorithm finding all runs in a string. Therefore, in the decision tree model the runs problem is easier than the Lempel-Ziv factorization. Thus we support the conjecture that there is a linear RAM algorithm finding all runs.Comment: 12 pages, 3 figures, submitte

    Computing periodicities in strings: A new approach

    Get PDF
    The most efficient methods currently available for the computation of repetitions or repeats in a string x = x[1..n] all depend on the prior computation of a suffix tree/array STx/SAx. Although these data structures can be computed in asymptotic Θ(n) time, nevertheless in practice they involve significant overhead, both in time and space. Since the number of repetitions/repeats in x can be reported in a way that is at most linear in string length, it therefore seems that it should be possible to devise less roundabout means of computing repetitions/repeats that take advantage of their infrequent occurrence. This survey paper provides background for these ideas and explores the possibilities for more efficient computation of periodicities in strings

    Testing Generalised Freeness of Words

    Get PDF
    Pseudo-repetitions are a natural generalisation of the classical notion of repetitions in sequences: they are the repeated concatenation of a word and its encoding under a certain morphism or antimorphism (anti-/morphism, for short). We approach the problem of deciding efficiently, for a word w and a literal anti-/morphism f, whether w contains an instance of a given pattern involving a variable x and its image under f, i.e., f(x). Our results generalise both the problem of finding fixed repetitive structures (e.g., squares, cubes) inside a word and the problem of finding palindromic structures inside a word. For instance, we can detect efficiently a factor of the form xx^Rxxx^R, or any other pattern of such type. We also address the problem of testing efficiently, in the same setting, whether the word w contains an arbitrary pseudo-repetition of a given exponent

    Automated analysis of oscillations in coronal bright points

    Get PDF
    Coronal bright points (BPs) are numerous, bright, small-scale dynamical features found in the solar corona. Bright points have been observed to exhibit intensity oscillations across a wide range of periodicities and are likely an important signature of plasma heating and/or transport mechanisms. We present a novel and efficient wavelet-based method that automatically detects and tracks the intensity evolution of BPs using images from the Atmospheric Imaging Assembly (AIA) on board the Solar Dynamics Observatory (SDO) in the 193\r{A} bandpass. Through the study of a large, statistically significant set of BPs, we attempt to place constraints on the underlying physical mechanisms. We used a continuous wavelet transform (CWT) in 2D to detect the BPs within images. One-dimensional CWTs were used to analyse the individual BP time series to detect significant periodicities. We find significant periodicity at 4, 8-10, 17, 28, and 65 minutes. Bright point lifetimes are shown to follow a power law with exponent 1.13±0.07-1.13\pm0.07. The relationship between the BP lifetime and maximum diameter similarly follows a power law with exponent 0.129±0.0110.129\pm0.011. Our wavelet-based method successfully detects and extracts BPs and analyses their intensity oscillations. Future work will expand upon these methods, using larger datasets and simultaneous multi-instrument observations.Comment: Accepted for publication in A&A. 10 pages, 14 figures, 4 associated movies. Movies will be available in A&

    Author index Volume 25 (1989)

    Get PDF
    corecore