764 research outputs found
A Minimal Periods Algorithm with Applications
Kosaraju in "Computation of squares in a string" briefly described a
linear-time algorithm for computing the minimal squares starting at each
position in a word. Using the same construction of suffix trees, we generalize
his result and describe in detail how to compute in O(k|w|)-time the minimal
k-th power, with period of length larger than s, starting at each position in a
word w for arbitrary exponent k and integer s. We provide the
complete proof of correctness of the algorithm, which is not entirely
clear in Kosaraju's original paper. The algorithm can be used as a subroutine
to detect certain types of pseudo-patterns in words, which was our original
motivation for studying the generalization. Comment: 14 pages
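As a point of reference for the problem stated in this abstract, the following is a brute-force sketch, not the suffix-tree algorithm the paper describes. It assumes a k-th power means a block of length p repeated k times, and the function name `minimal_power_periods` is hypothetical. It runs in far worse than O(k|w|) time and is only meant to pin down what is being computed.

```python
def minimal_power_periods(w: str, k: int = 2, s: int = 0) -> list:
    """For each start position i, return the smallest period p > s such that
    w[i:i + k*p] is a block of length p repeated k times, or None if no such
    k-th power starts at i. Brute force, for illustration only."""
    n = len(w)
    result = []
    for i in range(n):
        found = None
        # A k-th power with period p needs k*p letters starting at i.
        for p in range(s + 1, (n - i) // k + 1):
            if w[i:i + k * p] == w[i:i + p] * k:
                found = p
                break
        result.append(found)
    return result


print(minimal_power_periods("aabab"))        # [1, 2, None, None, None]
print(minimal_power_periods("aabab", s=1))   # [None, 2, None, None, None]
```

With k = 2 and s = 0 this reports the minimal square starting at each position, which is the case attributed to Kosaraju above.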
The stochastic matching problem
The matching problem plays a basic role in combinatorial optimization and in
statistical mechanics. In its stochastic variants, optimization decisions have
to be taken given only some probabilistic information about the instance. While
the deterministic case can be solved in polynomial time, stochastic variants
are worst-case intractable. We propose an efficient method to solve stochastic
matching problems which combines some features of the survey propagation
equations and of the cavity method. We test it on random bipartite graphs, for
which we analyze the phase diagram and compare the results with exact bounds.
Our approach is shown numerically to be effective on the full range of
parameters, and to outperform state-of-the-art methods. Finally we discuss how
the method can be generalized to other problems of optimization under
uncertainty. Comment: Published version has very minor changes
Wave Energy: a Pacific Perspective
This is the author's peer-reviewed final manuscript, as accepted by the publisher. The published article is copyrighted by The Royal Society and can be found at: http://rsta.royalsocietypublishing.org/. This paper illustrates the status of wave energy development in Pacific Rim countries by characterizing the available resource and introducing the region's current and potential future leaders in wave energy converter development. It also describes the existing licensing and permitting process as well as potential environmental concerns. Capabilities of Pacific Ocean testing facilities are described in addition to the region's vision of the future of wave energy.
Searching of gapped repeats and subrepetitions in a word
A gapped repeat is a factor of the form uvu where u and v are nonempty
words. The period of the gapped repeat is defined as |u|+|v|. The gapped
repeat is maximal if it cannot be extended to the left or to the right by at
least one letter while preserving its period. The gapped repeat is called
α-gapped if its period is not greater than . A
δ-subrepetition is a factor whose exponent is less than 2 but is not
less than 1+δ (the exponent of the factor is the quotient of the length
and the minimal period of the factor). The δ-subrepetition is maximal if
it cannot be extended to the left or to the right by at least one letter while
preserving its minimal period. We reveal a close relation between maximal
gapped repeats and maximal subrepetitions. Moreover, we show that in a word of
length n the number of maximal α-gapped repeats is bounded by
and the number of maximal δ-subrepetitions is bounded by
. Using the obtained upper bounds, we propose algorithms for
finding all maximal α-gapped repeats and all maximal
δ-subrepetitions in a word of length n. The algorithm for finding all
maximal α-gapped repeats has time complexity for the case
of constant alphabet size and time complexity for the
general case. For finding all maximal δ-subrepetitions we propose two
algorithms. The first algorithm has time complexity for the case of
constant alphabet size and time complexity for the general case. The
second algorithm has expected time complexity
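The abstract defines the exponent of a factor as its length divided by its minimal period. A minimal sketch of that computation follows, using the standard border (prefix-function) table. The function names are hypothetical and this is not one of the paper's algorithms.

```python
from fractions import Fraction


def minimal_period(x: str) -> int:
    """Minimal period of a nonempty word x, computed as len(x) minus the
    length of its longest proper border."""
    if not x:
        raise ValueError("minimal period is only defined here for nonempty words")
    n = len(x)
    border = [0] * n  # border[i] = longest proper border of x[:i + 1]
    for i in range(1, n):
        j = border[i - 1]
        while j > 0 and x[i] != x[j]:
            j = border[j - 1]
        if x[i] == x[j]:
            j += 1
        border[i] = j
    return n - border[-1]


def exponent(x: str) -> Fraction:
    """Exponent of x: length divided by minimal period, kept exact."""
    return Fraction(len(x), minimal_period(x))


# "abcab" has the gapped-repeat shape with equal arms "ab" and gap "c".
print(minimal_period("abcab"))  # 3
print(exponent("abcab"))        # 5/3
```

Here the exponent 5/3 is below 2, so "abcab" falls in the range the abstract reserves for subrepetitions rather than ordinary repetitions.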
Bethe Ansatz in the Bernoulli Matching Model of Random Sequence Alignment
For the Bernoulli Matching model of the sequence alignment problem we apply the
Bethe ansatz technique via an exact mapping to the 5-vertex model on a square
lattice. Considering the terrace-like representation of the sequence alignment
problem, we reproduce by the Bethe ansatz the results for the averaged length
of the Longest Common Subsequence in the Bernoulli approximation. In addition, we
compute the average number of nucleation centers of the terraces. Comment: 14 pages, 5 figures (some points are clarified
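To make the quantity in this abstract concrete, here is a small numerical sketch of the longest common subsequence length over a match matrix. It assumes the Bernoulli matching model means that every entry of the match matrix is an independent Bernoulli variable with probability p. That reading, and the names `lcs_length` and `bernoulli_match_matrix`, are assumptions made for illustration. The Bethe ansatz calculation itself is not reproduced.

```python
import random


def lcs_length(match) -> int:
    """Length of the longest chain of matched cells that is strictly
    increasing in both indices (the usual LCS dynamic programme)."""
    rows = len(match)
    cols = len(match[0]) if rows else 0
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if match[i - 1][j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def bernoulli_match_matrix(n: int, m: int, p: float, rng: random.Random):
    """Assumed model: each cell matches independently with probability p."""
    return [[rng.random() < p for _ in range(m)] for _ in range(n)]


# The same routine applied to a match matrix built from two real strings.
x, y = "AGGTAB", "GXTXAYB"
print(lcs_length([[a == b for b in y] for a in x]))  # 4
```

Averaging `lcs_length(bernoulli_match_matrix(n, n, p, rng))` over many samples gives a numerical estimate that could be compared against analytic results.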
Duel and sweep algorithm for order-preserving pattern matching
Given a text T and a pattern P over alphabet Σ, the classic exact
matching problem searches for all occurrences of pattern P in text T.
Unlike the exact matching problem, order-preserving pattern matching (OPPM)
considers the relative order of elements, rather than their real values. In
this paper, we propose an efficient algorithm for the OPPM problem using the
"duel-and-sweep" paradigm. Our algorithm runs in time in
general and time under an assumption that the characters in a string
can be sorted in linear time with respect to the string size. We also perform
experiments and show that our algorithm is faster than the KMP-based algorithm.
Last, we introduce two-dimensional order-preserving pattern matching and
give a duel-and-sweep algorithm that runs in time for the duel stage and
time for the sweeping stage with preprocessing time. Comment: 13 pages, 5 figures
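The matching criterion in this abstract, relative order rather than actual values, can be illustrated with a naive checker. This is a brute-force sketch that compares dense rank signatures window by window. It is not the duel-and-sweep algorithm, and the helper names are hypothetical.

```python
def rank_signature(seq) -> tuple:
    """Replace every element by its dense rank among the distinct values,
    so two sequences share a signature exactly when they have the same
    relative order (ties included)."""
    order = {value: rank for rank, value in enumerate(sorted(set(seq)))}
    return tuple(order[value] for value in seq)


def oppm_naive(text, pattern) -> list:
    """Start positions of all windows of text that are order-isomorphic to
    pattern. Each window is re-ranked from scratch, so this is slow."""
    m = len(pattern)
    target = rank_signature(pattern)
    return [i for i in range(len(text) - m + 1)
            if rank_signature(text[i:i + m]) == target]


print(oppm_naive([10, 20, 15, 30, 25, 40], [1, 3, 2]))  # [0, 2]
```

The windows [10, 20, 15] and [15, 30, 25] both follow the low, high, middle shape of the pattern, even though none of their values appear in it.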
Time-frequency scaling transformation of the phonocardiogram based of the matching pursuit method.
A time-frequency scaling transformation based on the matching pursuit (MP) method is developed for the phonocardiogram (PCG). The MP method decomposes a signal into a series of time-frequency atoms by using an iterative process. The modification of the time scale of the PCG can be performed without perceptible change in its spectral characteristics. It is also possible to modify the frequency scale without changing the temporal properties. The technique has been tested on 11 PCGs containing heart sounds and different murmurs. A scaling/inverse-scaling procedure was used for quantitative evaluation of the scaling performance. Both the spectrogram and an MP-based Wigner distribution were used for visual comparison in the time-frequency domain. The results showed that the technique is suitable and effective for the time-frequency scale transformation of both the transient property of the heart sounds and the more complex random property of the murmurs. It is also shown that the effectiveness of the method is strongly related to the optimization of the parameters used for the decomposition of the signals.
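The iterative decomposition mentioned in this abstract can be sketched generically. The code below is a plain greedy matching pursuit over an arbitrary dictionary of atoms, assumed to have unit norm. It does not use the time-frequency atoms or the parameter choices of the PCG study, and the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition: at each step pick the atom most correlated with
    the residual and subtract its projection. Atoms are assumed unit-norm."""
    residual = list(signal)
    decomposition = []  # list of (atom index, coefficient)
    for _ in range(n_iter):
        best = max(range(len(atoms)), key=lambda k: abs(dot(residual, atoms[k])))
        coeff = dot(residual, atoms[best])
        if coeff == 0:
            break  # nothing left that the dictionary can explain
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
        decomposition.append((best, coeff))
    return decomposition, residual


basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matching_pursuit([3, 0, -4], basis, n_iter=2))
# ([(2, -4), (0, 3)], [0, 0, 0])
```

With a redundant dictionary the same loop applies unchanged, but the residual generally shrinks gradually instead of vanishing after a few steps.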
Suffix Tree of Alignment: An Efficient Index for Similar Data
We consider an index data structure for similar strings. The generalized
suffix tree can be a solution for this. The generalized suffix tree of two
strings A and B is a compacted trie representing all suffixes in A and
B. It has leaves and can be constructed in time.
However, if the two strings are similar, the generalized suffix tree is not
efficient because it does not exploit the similarity which is usually
represented as an alignment of A and B.
In this paper we propose a space/time-efficient suffix tree of alignment
which wisely exploits the similarity in an alignment. Our suffix tree for an
alignment of A and B has leaves where is the sum of
the lengths of all parts of B different from A and is the sum of the
lengths of some common parts of A and B. We did not compromise the pattern
search to reduce the space. Our suffix tree can be searched for a pattern P
in time where is the number of occurrences of P in A and
B. We also present an efficient algorithm to construct the suffix tree of
alignment. When the suffix tree is constructed from scratch, the algorithm
requires time where is the sum of the lengths
of other common substrings of A and B. When the suffix tree of A is
already given, it requires time. Comment: 12 pages
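For orientation, the baseline this abstract improves on (an index over all suffixes of two strings that reports every occurrence of a pattern) can be mimicked with a sorted suffix list. This is a deliberately naive stand-in: it is not a suffix tree, it ignores the alignment entirely, and it stores every suffix explicitly. All names are hypothetical.

```python
from bisect import bisect_left


def build_index(a: str, b: str):
    """Sort every suffix of both strings, tagged with its source and start.
    Quadratic space; a compacted trie would avoid storing the suffixes."""
    entries = sorted(
        [(a[i:], "A", i) for i in range(len(a))]
        + [(b[i:], "B", i) for i in range(len(b))]
    )
    keys = [entry[0] for entry in entries]
    return keys, entries


def occurrences(index, pattern: str) -> list:
    """All (source, position) pairs where pattern occurs. Suffixes that start
    with the pattern form one contiguous run in sorted order."""
    keys, entries = index
    i = bisect_left(keys, pattern)
    hits = []
    while i < len(keys) and keys[i].startswith(pattern):
        hits.append(entries[i][1:])
        i += 1
    return sorted(hits)


index = build_index("banana", "bandana")
print(occurrences(index, "ana"))  # [('A', 1), ('A', 3), ('B', 4)]
```

Because "banana" and "bandana" share most of their text, many suffixes are stored twice here, which is the redundancy the abstract proposes to remove.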
k-Abelian Pattern Matching
Two words are called k-abelian equivalent if they share the same multiplicities for all factors of length at most k. We present an optimal linear time algorithm for identifying all occurrences of factors in a text that are k-abelian equivalent to some pattern. Moreover, an optimal algorithm for finding the largest k for which two words are k-abelian equivalent is given. Solutions for various online versions of the k-abelian pattern matching problem are also proposed.
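The definition in this abstract translates directly into a counting check. The sketch below compares factor multiplicities by brute force, so it is not the optimal linear-time algorithm the abstract refers to. The function names are hypothetical.

```python
from collections import Counter


def factor_counts(w: str, k: int) -> Counter:
    """Multiplicity of every factor of w of length 1..k."""
    return Counter(w[i:i + t]
                   for t in range(1, k + 1)
                   for i in range(len(w) - t + 1))


def k_abelian_equivalent(u: str, v: str, k: int) -> bool:
    """True when u and v have the same multiplicities for all factors of
    length at most k."""
    return factor_counts(u, k) == factor_counts(v, k)


def largest_k(u: str, v: str) -> int:
    """Largest k (capped at the longer word's length) for which u and v are
    k-abelian equivalent; 0 if they are not even 1-abelian equivalent."""
    k = 0
    while k < max(len(u), len(v)) and k_abelian_equivalent(u, v, k + 1):
        k += 1
    return k


print(k_abelian_equivalent("aabab", "abaab", 2))  # True
print(k_abelian_equivalent("aabab", "abaab", 3))  # False
print(largest_k("aabab", "abaab"))                # 2
```

The loop in `largest_k` relies on the fact that equivalence for factors of length at most k + 1 implies equivalence for length at most k, so the first failure ends the search.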
Longest Common Extensions in Trees
The longest common extension (LCE) of two indices in a string is the length
of the longest identical substrings starting at these two indices. The LCE
problem asks to preprocess a string into a compact data structure that supports
fast LCE queries. In this paper we generalize the LCE problem to trees and
suggest a few applications of LCE in trees to tries and XML databases. Given a
labeled and rooted tree T of size n, the goal is to preprocess T into a
compact data structure that supports the following LCE queries between subpaths
and subtrees in T. Let v_1, v_2, w_1, and w_2 be nodes of T such
that w_1 and w_2 are descendants of v_1 and v_2 respectively.
\begin{itemize}
\item LCE_PP(v_1, w_1, v_2, w_2): (path-path LCE) return
the longest common prefix of the paths from v_1 to w_1 and from v_2 to w_2.
\item LCE_PT(v_1, w_1, v_2): (path-tree LCE) return a maximal
path-path LCE of the path from v_1 to w_1 and any path from v_2 to a
descendant leaf.
\item LCE_TT(v_1, v_2): (tree-tree LCE) return a maximal
path-path LCE of any pair of paths from v_1 and v_2 to descendant leaves.
\end{itemize}
We present the first non-trivial bounds for supporting these
queries. For LCE_PP queries, we present a linear-space solution with
query time. For LCE_PT queries, we present a linear-space
solution with query time, and complement this with a
lower bound showing that any path-tree LCE structure of size O(n polylog(n))
must necessarily use time to answer queries. For LCE_TT
queries, we present a time-space trade-off, that given any parameter , , leads to an space and query-time
solution. This is complemented with a reduction to the set intersection
problem implying that a fast linear space solution is not likely to exist
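The path-path query described in this abstract can be answered naively by reading off both label sequences and comparing them. The sketch below assumes labels sit on nodes, that a path includes both endpoints, and that the tree is given as parent pointers. It returns the length of the common prefix and does no preprocessing, so it says nothing about the bounds in the paper. All names are hypothetical.

```python
def path_labels(parent, label, v, w) -> list:
    """Labels along the downward path from v to its descendant w, inclusive.
    parent maps each node to its parent, with None for the root."""
    labels = []
    node = w
    while node != v:
        labels.append(label[node])
        node = parent[node]
        if node is None:
            raise ValueError("w is not a descendant of v")
    labels.append(label[v])
    labels.reverse()
    return labels


def lce_path_path(parent, label, v1, w1, v2, w2) -> int:
    """Length of the longest common prefix of the two downward paths."""
    first = path_labels(parent, label, v1, w1)
    second = path_labels(parent, label, v2, w2)
    length = 0
    while (length < len(first) and length < len(second)
           and first[length] == second[length]):
        length += 1
    return length


# A root with two branches spelling "abc" and "abd" below it.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 0, 5: 4, 6: 5}
label = {0: "r", 1: "a", 2: "b", 3: "c", 4: "a", 5: "b", 6: "d"}
print(lce_path_path(parent, label, 1, 3, 4, 6))  # 2
```

Each query costs time proportional to the path lengths, which is the behaviour a preprocessed structure is meant to avoid.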
- …