    A Minimal Periods Algorithm with Applications

    Kosaraju, in ``Computation of squares in a string'', briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same suffix-tree construction, we generalize his result and describe in detail how to compute in $O(k|w|)$ time the minimal $k$-th power, with period of length larger than $s$, starting at each position in a word $w$, for an arbitrary exponent $k\geq2$ and integer $s\geq0$. We provide a complete proof of correctness of the algorithm, which is not entirely clear in Kosaraju's original paper. The algorithm can be used as a subroutine to detect certain types of pseudo-patterns in words, which was our original motivation for studying the generalization.
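
    To make the problem statement concrete, the following brute-force sketch computes, for each starting position, the smallest period longer than $s$ whose $k$-fold repetition starts there. It is a naive illustration only and not the linear-time suffix-tree algorithm of the abstract; the function name `minimal_powers` and its signature are assumptions made for this example.

```python
from typing import List, Optional


def minimal_powers(w: str, k: int = 2, s: int = 0) -> List[Optional[int]]:
    """For each start i, return the smallest period p > s such that
    w[i:i + k*p] is a k-th power, or None if no such power starts at i.

    Naive illustration: far slower than the O(k|w|) bound in the abstract.
    """
    n = len(w)
    result: List[Optional[int]] = []
    for i in range(n):
        found = None
        # A k-th power of period p needs k*p letters, so p <= (n - i) // k.
        for p in range(s + 1, (n - i) // k + 1):
            block = w[i:i + p]
            if w[i:i + k * p] == block * k:
                found = p
                break  # periods are tried in increasing order
        result.append(found)
    return result


print(minimal_powers("aabab"))        # minimal squares, any period
print(minimal_powers("aabab", s=1))   # minimal squares with period > 1
```

    In the word `aabab`, the square `aa` starts at position 0 and the square `abab` starts at position 1; raising `s` to 1 rules out the period-1 square.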

    The stochastic matching problem

    The matching problem plays a basic role in combinatorial optimization and in statistical mechanics. In its stochastic variants, optimization decisions have to be taken given only some probabilistic information about the instance. While the deterministic case can be solved in polynomial time, stochastic variants are worst-case intractable. We propose an efficient method to solve stochastic matching problems which combines some features of the survey propagation equations and of the cavity method. We test it on random bipartite graphs, for which we analyze the phase diagram and compare the results with exact bounds. Our approach is shown numerically to be effective on the full range of parameters, and to outperform state-of-the-art methods. Finally we discuss how the method can be generalized to other problems of optimization under uncertainty.

    Wave Energy: a Pacific Perspective

    This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by The Royal Society and can be found at: http://rsta.royalsocietypublishing.org/. This paper illustrates the status of wave energy development in Pacific Rim countries by characterizing the available resource and introducing the region's current and potential future leaders in wave energy converter development. It also describes the existing licensing and permitting process as well as potential environmental concerns. Capabilities of Pacific Ocean testing facilities are described in addition to the region's vision of the future of wave energy.

    Searching of gapped repeats and subrepetitions in a word

    A gapped repeat is a factor of the form $uvu$ where $u$ and $v$ are nonempty words. The period of the gapped repeat is defined as $|u|+|v|$. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its period. The gapped repeat is called $\alpha$-gapped if its period is not greater than $\alpha |v|$. A $\delta$-subrepetition is a factor whose exponent is less than 2 but not less than $1+\delta$ (the exponent of the factor is the quotient of the length and the minimal period of the factor). The $\delta$-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its minimal period. We reveal a close relation between maximal gapped repeats and maximal subrepetitions. Moreover, we show that in a word of length $n$ the number of maximal $\alpha$-gapped repeats is bounded by $O(\alpha^2n)$ and the number of maximal $\delta$-subrepetitions is bounded by $O(n/\delta^2)$. Using the obtained upper bounds, we propose algorithms for finding all maximal $\alpha$-gapped repeats and all maximal $\delta$-subrepetitions in a word of length $n$. The algorithm for finding all maximal $\alpha$-gapped repeats has $O(\alpha^2n)$ time complexity for the case of constant alphabet size and $O(n\log n + \alpha^2n)$ time complexity for the general case. For finding all maximal $\delta$-subrepetitions we propose two algorithms. The first algorithm has $O(\frac{n\log\log n}{\delta^2})$ time complexity for the case of constant alphabet size and $O(n\log n +\frac{n\log\log n}{\delta^2})$ time complexity for the general case. The second algorithm has $O(n\log n+\frac{n}{\delta^2}\log \frac{1}{\delta})$ expected time complexity.
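
    The definition of the exponent (length divided by minimal period) can be checked directly on small words. The sketch below computes the minimal period with the classical border (prefix-function) table and then tests the $\delta$-subrepetition condition $1+\delta \le \text{exponent} < 2$. It only illustrates the definitions, not the search algorithms of the abstract; the helper names are assumptions for this example, and the input word is assumed nonempty.

```python
from fractions import Fraction


def minimal_period(x: str) -> int:
    """Minimal period of a nonempty word, via its longest proper border."""
    n = len(x)
    border = [0] * n  # border[i] = length of the longest border of x[:i+1]
    for i in range(1, n):
        j = border[i - 1]
        while j > 0 and x[i] != x[j]:
            j = border[j - 1]
        if x[i] == x[j]:
            j += 1
        border[i] = j
    return n - border[-1]


def exponent(x: str) -> Fraction:
    """Exponent = length / minimal period, kept exact as a fraction."""
    return Fraction(len(x), minimal_period(x))


def is_delta_subrepetition(x: str, delta: Fraction) -> bool:
    """True if 1 + delta <= exponent(x) < 2."""
    return 1 + delta <= exponent(x) < 2


print(minimal_period("abcab"), exponent("abcab"))
print(is_delta_subrepetition("abcab", Fraction(1, 2)))
```

    The word `abcab` has minimal period 3 and exponent 5/3, so it satisfies the condition for $\delta = 1/2$. It can also be read as a gapped repeat $uvu$ with $u = ab$ and $v = c$, whose period $|u|+|v|$ is 3.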

    Bethe Ansatz in the Bernoulli Matching Model of Random Sequence Alignment

    For the Bernoulli Matching model of the sequence alignment problem we apply the Bethe ansatz technique via an exact mapping to the 5-vertex model on a square lattice. Considering the terrace-like representation of the sequence alignment problem, we reproduce by the Bethe ansatz the results for the averaged length of the Longest Common Subsequence in the Bernoulli approximation. In addition, we compute the average number of nucleation centers of the terraces.

    Duel and sweep algorithm for order-preserving pattern matching

    Given a text $T$ and a pattern $P$ over an alphabet $\Sigma$, the classic exact matching problem searches for all occurrences of the pattern $P$ in the text $T$. Unlike the exact matching problem, order-preserving pattern matching (OPPM) considers the relative order of elements, rather than their real values. In this paper, we propose an efficient algorithm for the OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in $O(n + m\log m)$ time in general and $O(n + m)$ time under the assumption that the characters in a string can be sorted in linear time with respect to the string size. We also perform experiments and show that our algorithm is faster than the KMP-based algorithm. Last, we introduce two-dimensional order-preserving pattern matching and give a duel-and-sweep algorithm that runs in $O(n^2)$ time for the dueling stage and $O(n^2 m)$ time for the sweeping stage, with $O(m^3)$ preprocessing time.
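
    The notion of an order-preserving match can be illustrated with a naive baseline: a window of the text matches when its elements have the same relative order as the pattern. The sketch below compares dense-rank signatures window by window. It is not the duel-and-sweep algorithm and is much slower than the bounds stated above; the function names are assumptions for this example.

```python
from typing import List, Sequence


def order_signature(xs: Sequence[int]) -> List[int]:
    """Replace each element by its rank among the distinct sorted values."""
    ranks = {v: r for r, v in enumerate(sorted(set(xs)))}
    return [ranks[v] for v in xs]


def oppm_naive(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Start positions where the text window is order-isomorphic to pattern."""
    m = len(pattern)
    target = order_signature(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if order_signature(text[i:i + m]) == target
    ]


text = [10, 20, 15, 28, 32, 12, 66, 19]
pattern = [1, 3, 2]  # shape: low, high, middle
print(oppm_naive(text, pattern))
```

    Here the windows `[10, 20, 15]` and `[12, 66, 19]` share the low, high, middle shape of the pattern even though none of the actual values coincide.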

    Time-frequency scaling transformation of the phonocardiogram based on the matching pursuit method

    A time-frequency scaling transformation based on the matching pursuit (MP) method is developed for the phonocardiogram (PCG). The MP method decomposes a signal into a series of time-frequency atoms by using an iterative process. The modification of the time scale of the PCG can be performed without perceptible change in its spectral characteristics. It is also possible to modify the frequency scale without changing the temporal properties. The technique has been tested on 11 PCGs containing heart sounds and different murmurs. A scaling/inverse-scaling procedure was used for quantitative evaluation of the scaling performance. Both the spectrogram and an MP-based Wigner distribution were used for visual comparison in the time-frequency domain. The results showed that the technique is suitable and effective for the time-frequency scale transformation of both the transient property of the heart sounds and the more complex random property of the murmurs. It is also shown that the effectiveness of the method is strongly related to the optimization of the parameters used for the decomposition of the signals.

    Suffix Tree of Alignment: An Efficient Index for Similar Data

    We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has $|A|+|B|$ leaves and can be constructed in $O(|A|+|B|)$ time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of $A$ and $B$. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of $A$ and $B$ has $|A| + l_d + l_1$ leaves where $l_d$ is the sum of the lengths of all parts of $B$ different from $A$ and $l_1$ is the sum of the lengths of some common parts of $A$ and $B$. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern $P$ in $O(|P|+occ)$ time where $occ$ is the number of occurrences of $P$ in $A$ and $B$. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires $O(|A| + l_d + l_1 + l_2)$ time where $l_2$ is the sum of the lengths of other common substrings of $A$ and $B$. When the suffix tree of $A$ is already given, it requires $O(l_d + l_1 + l_2)$ time.

    Optimality Clue for Graph Coloring Problem

    In this paper, we present a new approach which determines whether a solution found by a heuristic qualifies as a potentially optimal solution. Our approach is based on the following observation: for a minimization problem, the number of admissible solutions decreases with the value of the objective function. For the Graph Coloring Problem (GCP), we confirm this observation and present a new way to prove optimality. This proof is based on counting the number of different $k$-colorings and the number of independent sets of a given graph $G$. Exact solution-counting problems are hard ($\#P$-complete). However, we show that, using only randomized heuristics, it is possible to define an estimate of the upper bound of the number of $k$-colorings. This estimate has been calibrated on a large benchmark of graph instances for which the exact number of optimal $k$-colorings is known. Our approach, called optimality clue, builds a sample of $k$-colorings of a given graph by running one randomized heuristic many times on the same graph instance. We use the evolutionary algorithm HEAD [Moalic and Gondran, 2018], which is one of the most efficient heuristics for GCP. Optimality clue matches the standard definition of optimality on a large number of instances of the DIMACS and RBCII benchmarks where the optimum is known. Then, we show the clue of optimality for another set of graph instances. Keywords: Optimality, Metaheuristics, Near-optimal.
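
    The observation that the number of admissible solutions shrinks as the objective value decreases can be seen on a tiny graph by counting proper $k$-colorings exhaustively. The sketch below is a brute-force counter, feasible only for very small graphs; it is not the randomized estimate described in the abstract, and the function name and graph encoding (vertices numbered from 0, edges as pairs) are assumptions for this example.

```python
from itertools import product
from typing import Iterable, Tuple


def count_k_colorings(num_vertices: int,
                      edges: Iterable[Tuple[int, int]],
                      k: int) -> int:
    """Count proper colorings with colors 0..k-1 by exhaustive enumeration.

    Two colorings that differ only by a renaming of colors are counted
    separately. Runs in O(k**num_vertices) time: illustration only.
    """
    edge_list = list(edges)
    count = 0
    for colors in product(range(k), repeat=num_vertices):
        if all(colors[u] != colors[v] for u, v in edge_list):
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]
for k in (2, 3, 4):
    print(k, count_k_colorings(3, triangle, k))
```

    For the triangle, the count drops from 24 colorings with four colors to 6 with three colors and 0 with two, so three colors is optimal.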

    k-Abelian Pattern Matching

    Two words are called $k$-abelian equivalent if they share the same multiplicities for all factors of length at most $k$. We present an optimal linear time algorithm for identifying all occurrences of factors in a text that are $k$-abelian equivalent to some pattern. Moreover, an optimal algorithm for finding the largest $k$ for which two words are $k$-abelian equivalent is given. Solutions for various online versions of the $k$-abelian pattern matching problem are also proposed.
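
    The equivalence used above can be tested directly from its definition by comparing the multisets of factors of each length up to $k$. The sketch below does this with counters and then slides a window over the text. It is a naive illustration, not the optimal linear-time algorithm of the abstract; the function names are assumptions for this example.

```python
from collections import Counter
from typing import List


def factor_counts(w: str, k: int) -> Counter:
    """Multiplicities of all factors of w with length 1..k."""
    return Counter(
        w[i:i + length]
        for length in range(1, k + 1)
        for i in range(len(w) - length + 1)
    )


def k_abelian_equivalent(u: str, v: str, k: int) -> bool:
    """True if u and v have the same factor multiplicities up to length k."""
    return factor_counts(u, k) == factor_counts(v, k)


def k_abelian_occurrences(text: str, pattern: str, k: int) -> List[int]:
    """Start positions of text windows k-abelian equivalent to pattern."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if k_abelian_equivalent(text[i:i + m], pattern, k)
    ]


print(k_abelian_equivalent("aabab", "abaab", 2))
print(k_abelian_equivalent("aabab", "abaab", 3))
print(k_abelian_occurrences("abaabab", "aabab", 2))
```

    The words `aabab` and `abaab` agree on all factors of length 1 and 2 but differ at length 3 (only the first contains `bab`), so they are 2-abelian but not 3-abelian equivalent.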