764 research outputs found

    A Minimal Periods Algorithm with Applications

    Full text link
    Kosaraju in ``Computation of squares in a string'' briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same construction of suffix trees, we generalize his result and describe in detail how to compute in O(k|w|)-time the minimal k-th power, with period of length larger than s, starting at each position in a word w for arbitrary exponent k2k\geq2 and integer s0s\geq0. We provide the complete proof of correctness of the algorithm, which is somehow not completely clear in Kosaraju's original paper. The algorithm can be used as a sub-routine to detect certain types of pseudo-patterns in words, which is our original intention to study the generalization.Comment: 14 page

    The stochastic matching problem

    Get PDF
    The matching problem plays a basic role in combinatorial optimization and in statistical mechanics. In its stochastic variants, optimization decisions have to be taken given only some probabilistic information about the instance. While the deterministic case can be solved in polynomial time, stochastic variants are worst-case intractable. We propose an efficient method to solve stochastic matching problems which combines some features of the survey propagation equations and of the cavity method. We test it on random bipartite graphs, for which we analyze the phase diagram and compare the results with exact bounds. Our approach is shown numerically to be effective on the full range of parameters, and to outperform state-of-the-art methods. Finally we discuss how the method can be generalized to other problems of optimization under uncertainty.Comment: Published version has very minor change

    Wave Energy: a Pacific Perspective

    Get PDF
    This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by The Royal Society and can be found at: http://rsta.royalsocietypublishing.org/.This paper illustrates the status of wave energy development in Pacific Rim countries by characterizing the available resource and introducing the region‟s current and potential future leaders in wave energy converter development. It also describes the existing licensing and permitting process as well as potential environmental concerns. Capabilities of Pacific Ocean testing facilities are described in addition to the region‟s vision of the future of wave energy

    Searching of gapped repeats and subrepetitions in a word

    Full text link
    A gapped repeat is a factor of the form uvuuvu where uu and vv are nonempty words. The period of the gapped repeat is defined as u+v|u|+|v|. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter with preserving its period. The gapped repeat is called α\alpha-gapped if its period is not greater than αv\alpha |v|. A δ\delta-subrepetition is a factor which exponent is less than 2 but is not less than 1+δ1+\delta (the exponent of the factor is the quotient of the length and the minimal period of the factor). The δ\delta-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter with preserving its minimal period. We reveal a close relation between maximal gapped repeats and maximal subrepetitions. Moreover, we show that in a word of length nn the number of maximal α\alpha-gapped repeats is bounded by O(α2n)O(\alpha^2n) and the number of maximal δ\delta-subrepetitions is bounded by O(n/δ2)O(n/\delta^2). Using the obtained upper bounds, we propose algorithms for finding all maximal α\alpha-gapped repeats and all maximal δ\delta-subrepetitions in a word of length nn. The algorithm for finding all maximal α\alpha-gapped repeats has O(α2n)O(\alpha^2n) time complexity for the case of constant alphabet size and O(nlogn+α2n)O(n\log n + \alpha^2n) time complexity for the general case. For finding all maximal δ\delta-subrepetitions we propose two algorithms. The first algorithm has O(nloglognδ2)O(\frac{n\log\log n}{\delta^2}) time complexity for the case of constant alphabet size and O(nlogn+nloglognδ2)O(n\log n +\frac{n\log\log n}{\delta^2}) time complexity for the general case. The second algorithm has O(nlogn+nδ2log1δ)O(n\log n+\frac{n}{\delta^2}\log \frac{1}{\delta}) expected time complexity

    Bethe Ansatz in the Bernoulli Matching Model of Random Sequence Alignment

    Full text link
    For the Bernoulli Matching model of sequence alignment problem we apply the Bethe ansatz technique via an exact mapping to the 5--vertex model on a square lattice. Considering the terrace--like representation of the sequence alignment problem, we reproduce by the Bethe ansatz the results for the averaged length of the Longest Common Subsequence in Bernoulli approximation. In addition, we compute the average number of nucleation centers of the terraces.Comment: 14 pages, 5 figures (some points are clarified

    Duel and sweep algorithm for order-preserving pattern matching

    Full text link
    Given a text TT and a pattern PP over alphabet Σ\Sigma, the classic exact matching problem searches for all occurrences of pattern PP in text TT. Unlike exact matching problem, order-preserving pattern matching (OPPM) considers the relative order of elements, rather than their real values. In this paper, we propose an efficient algorithm for OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in O(n+mlogm)O(n + m\log m) time in general and O(n+m)O(n + m) time under an assumption that the characters in a string can be sorted in linear time with respect to the string size. We also perform experiments and show that our algorithm is faster that KMP-based algorithm. Last, we introduce the two-dimensional order preserved pattern matching and give a duel and sweep algorithm that runs in O(n2)O(n^2) time for duel stage and O(n2m)O(n^2 m) time for sweeping time with O(m3)O(m^3) preprocessing time.Comment: 13 pages, 5 figure

    Time-frequency scaling transformation of the phonocardiogram based of the matching pursuit method.

    No full text
    International audienceA time-frequency scaling transformation based on the matching pursuit (MP) method is developed for the phonocardiogram (PCG). The MP method decomposes a signal into a series of time-frequency atoms by using an iterative process. The modification of the time scale of the PCG can be performed without perceptible change in its spectral characteristics. It is also possible to modify the frequency scale without changing the temporal properties. The technique has been tested on 11 PCG's containing heart sounds and different murmurs. A scaling/inverse-scaling procedure was used for quantitative evaluation of the scaling performance. Both the spectrogram and a MP-based Wigner distribution were used for visual comparison in the time-frequency domain. The results showed that the technique is suitable and effective for the time-frequency scale transformation of both the transient property of the heart sounds and the more complex random property of the murmurs. It is also shown that the effectiveness of the method is strongly related to the optimization of the parameters used for the decomposition of the signals

    Suffix Tree of Alignment: An Efficient Index for Similar Data

    Full text link
    We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings AA and BB is a compacted trie representing all suffixes in AA and BB. It has A+B|A|+|B| leaves and can be constructed in O(A+B)O(|A|+|B|) time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of AA and BB. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of AA and BB has A+ld+l1|A| + l_d + l_1 leaves where ldl_d is the sum of the lengths of all parts of BB different from AA and l1l_1 is the sum of the lengths of some common parts of AA and BB. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern PP in O(P+occ)O(|P|+occ) time where occocc is the number of occurrences of PP in AA and BB. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires O(A+ld+l1+l2)O(|A| + l_d + l_1 + l_2) time where l2l_2 is the sum of the lengths of other common substrings of AA and BB. When the suffix tree of AA is already given, it requires O(ld+l1+l2)O(l_d + l_1 + l_2) time.Comment: 12 page

    k-Abelian Pattern Matching

    Get PDF
    Two words are called kk-abelian equivalent, if they share the same multiplicities for all factors of length at most kk. We present an optimal linear time algorithm for identifying all occurrences of factors in a text that are kk-abelian equivalent to some pattern. Moreover, an optimal algorithm for finding the largest kk for which two words are kk-abelian equivalent is given. Solutions for various online versions of the kk-abelian pattern matching problem are also proposed

    Longest Common Extensions in Trees

    Get PDF
    The longest common extension (LCE) of two indices in a string is the length of the longest identical substrings starting at these two indices. The LCE problem asks to preprocess a string into a compact data structure that supports fast LCE queries. In this paper we generalize the LCE problem to trees and suggest a few applications of LCE in trees to tries and XML databases. Given a labeled and rooted tree TT of size nn, the goal is to preprocess TT into a compact data structure that support the following LCE queries between subpaths and subtrees in TT. Let v1v_1, v2v_2, w1w_1, and w2w_2 be nodes of TT such that w1w_1 and w2w_2 are descendants of v1v_1 and v2v_2 respectively. \begin{itemize} \item \LCEPP(v_1, w_1, v_2, w_2): (path-path \LCE) return the longest common prefix of the paths v1w1v_1 \leadsto w_1 and v2w2v_2 \leadsto w_2. \item \LCEPT(v_1, w_1, v_2): (path-tree \LCE) return maximal path-path LCE of the path v1w1v_1 \leadsto w_1 and any path from v2v_2 to a descendant leaf. \item \LCETT(v_1, v_2): (tree-tree \LCE) return a maximal path-path LCE of any pair of paths from v1v_1 and v2v_2 to descendant leaves. \end{itemize} We present the first non-trivial bounds for supporting these queries. For \LCEPP queries, we present a linear-space solution with O(logn)O(\log^{*} n) query time. For \LCEPT queries, we present a linear-space solution with O((loglogn)2)O((\log\log n)^{2}) query time, and complement this with a lower bound showing that any path-tree LCE structure of size O(n \polylog(n)) must necessarily use Ω(loglogn)\Omega(\log\log n) time to answer queries. For \LCETT queries, we present a time-space trade-off, that given any parameter τ\tau, 1τn1 \leq \tau \leq n, leads to an O(nτ)O(n\tau) space and O(n/τ)O(n/\tau) query-time solution. This is complemented with a reduction to the the set intersection problem implying that a fast linear space solution is not likely to exist
    corecore