Online Pattern Matching for String Edit Distance with Moves
Edit distance with moves (EDM) is a string-to-string distance measure that
includes substring moves in addition to ordinary editing operations to turn one
string into the other. Although computing EDM exactly is intractable, it has many
applications, especially in error detection. Edit sensitive parsing (ESP) is an
efficient parsing algorithm that guarantees an upper bound of parsing
discrepancies between different appearances of the same substrings in a string.
ESP can be used for computing an approximate EDM as the L1 distance between
characteristic vectors built from node labels in parse trees. However, ESP is
not applicable to streaming text data, where the whole text is unknown in
advance. We present an online ESP (OESP) that enables online pattern
matching for EDM. OESP builds a parse tree for a streaming text and computes
the L1 distance between characteristic vectors in an online manner. For the
space-efficient computation of EDM, OESP directly encodes the parse tree into a
succinct representation by leveraging the idea behind recent results of a
dynamic succinct tree. We experimentally test OESP on the ability to compute
EDM in an online manner on benchmark datasets, and we show OESP's efficiency. Comment: This paper has been accepted to the 21st edition of the International
Symposium on String Processing and Information Retrieval (SPIRE2014
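The idea of approximating a distance by the L1 distance between characteristic vectors can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the parse below is a naive left-to-right pairing, not ESP or OESP, so it carries none of the approximation guarantees the abstract refers to. The names `characteristic_vector`, `l1_distance`, and `rules` are hypothetical.

```python
from collections import Counter


def characteristic_vector(text, rules):
    """Count the node labels of a toy bottom-up parse tree of `text`.

    `rules` maps a pair of child labels to a parent label. It is shared
    between strings so that identical subtrees receive identical labels.
    ASSUMPTION: adjacent symbols are paired naively from left to right;
    this is a stand-in for ESP, not ESP itself.
    """
    level = list(text)
    counts = Counter(level)  # leaves are counted as nodes
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            pair = (level[i], level[i + 1])
            # Reuse the label of a known pair, otherwise create a fresh one.
            label = rules.setdefault(pair, ("N", len(rules)))
            counts[label] += 1
            nxt.append(label)
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd symbol is carried up, not recounted
        level = nxt
    return counts


def l1_distance(u, v):
    """L1 distance between two label-count vectors (missing labels count as 0)."""
    return sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())


rules = {}
a = characteristic_vector("abab", rules)
b = characteristic_vector("abba", rules)
print(l1_distance(a, b))
```

Sharing one `rules` dictionary across both strings is the design choice that makes the two vectors comparable; with separate dictionaries, equal substrings could receive different labels.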
Pattern Matching in Multiple Streams
We investigate the problem of deterministic pattern matching in multiple
streams. In this model, one symbol arrives at a time and is associated with one
of s streaming texts. The task at each time step is to report if there is a new
match between a fixed pattern of length m and a newly updated stream. As is
usual in the streaming context, the goal is to use as little space as possible
while still reporting matches quickly. We give almost matching upper and lower
space bounds for three distinct pattern matching problems. For exact matching
we show that the problem can be solved in constant time per arriving symbol and
O(m+s) words of space. For the k-mismatch and k-difference problems we give
O(k) time solutions that require O(m+ks) words of space. In all three cases we
also give space lower bounds which show our methods are optimal up to a single
logarithmic factor. Finally we set out a number of open problems related to
this new model for pattern matching. Comment: 13 pages, 1 figur
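The O(m+s) space figure for exact matching can be illustrated with a simple sketch: one shared failure table for the pattern (O(m) words) plus a single integer of matching state per stream (O(s) words). The sketch below swaps in the classical Knuth-Morris-Pratt automaton, which is an assumption on our part and not the method of the paper; in particular it takes amortised rather than worst-case constant time per symbol. The class name `MultiStreamMatcher` and its methods are hypothetical.

```python
def failure_table(pattern):
    """KMP failure function: fail[i] is the length of the longest proper
    border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class MultiStreamMatcher:
    """Exact matching of one fixed pattern against several streams."""

    def __init__(self, pattern, num_streams):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = failure_table(pattern)   # O(m) words, shared
        self.state = [0] * num_streams       # O(s) words, one per stream

    def feed(self, stream_id, symbol):
        """Append `symbol` to stream `stream_id`; return True if the
        pattern now ends at the newest symbol of that stream."""
        k = self.state[stream_id]
        if k == len(self.pattern):           # previous step was a full match
            k = self.fail[k - 1]
        while k and symbol != self.pattern[k]:
            k = self.fail[k - 1]
        if symbol == self.pattern[k]:
            k += 1
        self.state[stream_id] = k
        return k == len(self.pattern)


matcher = MultiStreamMatcher("aba", num_streams=2)
for stream_id, symbol in [(0, "a"), (1, "a"), (0, "b"), (1, "a"), (0, "a")]:
    print(stream_id, symbol, matcher.feed(stream_id, symbol))
```

Because symbols from different streams only touch their own entry in `state`, arbitrary interleavings of the streams do not interfere with one another.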
4D Seismic History Matching Incorporating Unsupervised Learning
The work discussed and presented in this paper focuses on the history
matching of reservoirs by integrating 4D seismic data into the inversion
process using machine learning techniques. A new integrated scheme for the
reconstruction of petrophysical properties with a modified Ensemble Smoother
with Multiple Data Assimilation (ES-MDA) in a synthetic reservoir is proposed.
The permeability field inside the reservoir is parametrised with an
unsupervised learning approach, namely K-means with Singular Value
Decomposition (K-SVD). This is combined with the Orthogonal Matching Pursuit
(OMP) technique, which is commonly used in sparsity-promoting regularisation
schemes. Moreover, seismic attributes, in particular, acoustic impedance, are
parametrised with the Discrete Cosine Transform (DCT). This novel combination
of techniques from machine learning, sparsity regularisation, seismic imaging
and history matching aims to address the ill-posedness of the inversion of
historical production data efficiently using ES-MDA. In the numerical
experiments provided, I demonstrate that these sparse representations of the
petrophysical properties and the seismic attributes yield better matches to
the true production data and quantify the propagating waterfront better than
more traditional methods that do not use comparable parametrisation
techniques.
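As a rough illustration of parametrising a signal with the Discrete Cosine Transform, the sketch below keeps only the first few DCT coefficients of a one-dimensional signal and reconstructs from them. The assumptions are ours: a 1D signal, the orthonormal DCT-II, and retention of the lowest-frequency coefficients; the abstract does not say how the coefficients are selected or in how many dimensions the transform is applied. The function names are hypothetical.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a real sequence (direct O(n^2) evaluation)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                    for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of `dct` (orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi / n * (i + 0.5) * k)
        signal.append(total)
    return signal


def parametrise(signal, keep):
    """Represent `signal` by its first `keep` DCT coefficients (assumed rule)."""
    return dct(signal)[:keep]


def reconstruct(params, length):
    """Zero-pad the retained coefficients and invert the transform."""
    return idct(list(params) + [0.0] * (length - len(params)))


impedance = [5.0, 5.2, 5.1, 5.3, 6.8, 7.0, 6.9, 7.1]  # made-up illustrative values
params = parametrise(impedance, keep=3)
approx = reconstruct(params, len(impedance))
print(len(params), [round(v, 2) for v in approx])
```

The point of such a parametrisation is that the inversion then updates a handful of coefficients rather than every grid value.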
Upper and lower bounds for dynamic data structures on strings
We consider a range of simply stated dynamic data structure problems on
strings. An update changes one symbol in the input and a query asks us to
compute some function of the pattern of length and a substring of a longer
text. We give both conditional and unconditional lower bounds for variants of
exact matching with wildcards, inner product, and Hamming distance computation
via a sequence of reductions. As an example, we show that there does not exist
an time algorithm for a large range of these problems
unless the online Boolean matrix-vector multiplication conjecture is false. We
also provide nearly matching upper bounds for most of the problems we consider. Comment: Accepted at STACS'1
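To make the update/query setting concrete, here is a naive baseline for one of the listed problems, Hamming distance between a pattern and a substring of a text that can change one symbol at a time. It is not a data structure from the paper: it simply stores the text, so an update costs constant time and a query costs time linear in the pattern length. The class `DynamicText` and its method names are hypothetical.

```python
class DynamicText:
    """Naive baseline: constant-time updates, linear-time Hamming queries."""

    def __init__(self, text):
        self.symbols = list(text)

    def update(self, index, symbol):
        """Change a single symbol of the text."""
        self.symbols[index] = symbol

    def hamming(self, pattern, start):
        """Number of mismatches between `pattern` and the text substring
        of the same length beginning at `start`."""
        window = self.symbols[start:start + len(pattern)]
        if len(window) != len(pattern):
            raise ValueError("pattern does not fit at this offset")
        return sum(a != b for a, b in zip(pattern, window))


text = DynamicText("abcabc")
print(text.hamming("abd", 0))
text.update(2, "d")
print(text.hamming("abd", 0))
```

Upper and lower bounds of the kind described in the abstract concern how far the update and query times of such a structure can be pushed below this trivial trade-off.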