String processing and information retrieval
Proceedings of the 16th International Symposium on String Processing and Information Retrieval (SPIRE 2009), Saariselkä, Finland, 25-27 August 2009
A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance
Computing the Hamming distance between a given pattern of length m and each location in a text of length n, both over a general alphabet Sigma, is one of the most fundamental tasks in string algorithms. The fastest known runtime for exact computation is Õ(n sqrt(m)). We recently introduced a complicated randomized algorithm for obtaining a (1 ± eps) approximation for each location in the text in O((n/eps) log(1/eps) log n log m log |Sigma|) total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes O((n/eps) log n log m) time.
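To make the problem statement concrete, here is a minimal sketch of the naive exact baseline, which compares the pattern against every window of the text in O(nm) time. It is not the randomized approximation algorithm from the paper; the function name `hamming_distances` is an illustrative assumption.

```python
def hamming_distances(text, pattern):
    """Exact Hamming distance between pattern and every length-m window of text.

    Naive O(n*m) baseline for illustration only; this is NOT the paper's
    randomized (1 +/- eps) approximation algorithm.
    """
    n, m = len(text), len(pattern)
    if m > n:
        return []  # no alignment of the pattern fits inside the text
    return [
        # count mismatching positions between the window and the pattern
        sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        for i in range(n - m + 1)
    ]


print(hamming_distances("abcabd", "abd"))  # [1, 3, 3, 0]
```

An approximation algorithm would return, for each of these n - m + 1 locations, a value within a (1 ± eps) factor of the exact distance shown here.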
Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching
Let S and S' be two strings of the same length. We consider the following two variants of string matching.
* Parameterized Matching: The characters of S and S' are partitioned into static characters and parameterized characters.
The strings are a parameterized match iff the static characters match exactly and there exists a one-to-one function which renames the parameterized characters in S to those in S'.
* Order-Preserving Matching: The strings are an order-preserving match iff for any two integers i,j in [1,|S|], S[i] <= S[j] iff S'[i] <= S'[j].
Let P be a collection of d patterns {P_1, P_2, ..., P_d} of total length n characters, which are chosen from an alphabet Sigma.
Given a text T, also over Sigma, we consider the dictionary indexing problem under the above definitions of string matching.
Specifically, the task is to index P, such that we can report all positions j where at least one of the patterns P_i in P is a parameterized match (resp. order-preserving match) with the same-length substring of T starting at j. Previous best-known indexes occupy O(n * log(n)) bits and can report all occ positions in O(|T| * log(|Sigma|) + occ) time. We present space-efficient indexes that occupy O(n * log(|Sigma|+d) * log(n)) bits and report all occ positions in O(|T| * (log(|Sigma|) + log_{|Sigma|}(n)) + occ) time for parameterized matching and in O(|T| * log(n) + occ) time for order-preserving matching.
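The two matching definitions above can be checked directly on a pair of equal-length strings. The sketch below follows the definitions literally and is not the space-efficient index from the paper; the function names and the `static_chars` parameter are illustrative assumptions.

```python
def is_parameterized_match(s, t, static_chars):
    """True iff static characters agree exactly and a one-to-one renaming
    maps the parameterized characters of s onto those of t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}  # renaming s -> t and its inverse t -> s
    for a, b in zip(s, t):
        if a in static_chars or b in static_chars:
            if a != b:
                return False  # static characters must match exactly
            continue
        # setdefault records a new pair or returns the existing image;
        # checking both directions enforces that the renaming is one-to-one
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def is_order_preserving_match(s, t):
    """True iff s[i] <= s[j] exactly when t[i] <= t[j], for all i, j.
    Quadratic-time check taken straight from the definition."""
    if len(s) != len(t):
        return False
    n = len(s)
    return all(
        (s[i] <= s[j]) == (t[i] <= t[j])
        for i in range(n)
        for j in range(n)
    )


print(is_parameterized_match("xaxy", "yayx", {"a"}))   # True
print(is_parameterized_match("xaxy", "yayy", {"a"}))   # False
print(is_order_preserving_match([10, 20, 15], [1, 3, 2]))  # True
print(is_order_preserving_match([10, 20, 15], [1, 2, 3]))  # False
```

In the dictionary setting, a check of this kind would conceptually be applied between each pattern P_i and the same-length substring of T at each position j; the indexes in the abstract avoid that brute-force work.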
The Many Qualities of a New Directly Accessible Compression Scheme
We present a new variable-length computation-friendly encoding scheme, named
SFDC (Succinct Format with Direct aCcesibility), that supports direct and fast
accessibility to any element of the compressed sequence and achieves
compression ratios often higher than those offered by other solutions in the
literature. The SFDC scheme provides a flexible and simple representation
geared towards either practical efficiency or compression ratios, as required.
For a text of length over an alphabet of size and a fixed
parameter , the access time of the proposed encoding is proportional
to the length of the character's code-word, plus an expected
overhead, where
is the -th number of the Fibonacci sequence. In the overall it uses
bits, where is the length of the encoded string.
Experimental results show that the performance of our scheme is, in some
respects, comparable with the performance of DACs and Wavelet Trees, which are
among the most efficient schemes. In addition, our scheme is configured as a
\emph{computation-friendly compression} scheme, as it has several features
that make it very effective in text processing tasks. In the string matching
problem, which we take as a case study, we show experimentally that the new
scheme enables results that are up to 29 times faster than standard
string-matching techniques on plain texts.
Comment: 33 pages
Pattern matching with variables: Efficient algorithms and complexity results
A pattern α (i.e., a string of variables and terminals) matches a word w if w can be obtained by uniformly replacing the variables of α by terminal words. The respective matching problem, i.e., deciding whether or not a given pattern matches a given word, is generally NP-complete, but can be solved in polynomial time for restricted classes of patterns. We present efficient algorithms for the matching problem with respect to patterns with a bounded number of repeated variables and patterns with a structural restriction on the order of variables. Furthermore, we show that it is NP-complete to decide, for a given number k and a word w, whether w can be factorised into k distinct factors. As an immediate consequence of this hardness result, the injective version (i.e., different variables are replaced by different words) of the matching problem is NP-complete even for very restricted classes of patterns.
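The matching problem can be illustrated with a plain backtracking search, sketched below. It takes exponential time in the worst case, in line with the NP-completeness of the general problem, and it is not one of the efficient algorithms from the paper. Two assumptions are made for the illustration: each pattern symbol is a single character, and every variable is replaced by a non-empty word. The name `matches` and the `variables` parameter are hypothetical.

```python
def matches(pattern, word, variables):
    """Decide whether `word` arises from `pattern` by uniformly replacing
    each variable with a non-empty terminal word (assumption: non-erasing).

    Symbols of `pattern` that are in `variables` are variables; all other
    symbols are terminals. Plain backtracking, exponential in the worst case.
    """
    def solve(p, w, binding):
        if p == len(pattern):
            return w == len(word)  # success only if the whole word is consumed
        sym = pattern[p]
        if sym not in variables:
            # terminal symbol: must equal the next character of the word
            return w < len(word) and word[w] == sym and solve(p + 1, w + 1, binding)
        if sym in binding:
            # repeated variable: must reproduce its earlier substitution
            val = binding[sym]
            return word.startswith(val, w) and solve(p + 1, w + len(val), binding)
        # first occurrence of the variable: try every non-empty substitution
        for end in range(w + 1, len(word) + 1):
            binding[sym] = word[w:end]
            if solve(p + 1, end, binding):
                return True
            del binding[sym]  # undo and try a longer substitution
        return False

    return solve(0, 0, {})


print(matches("xax", "bcabc", {"x"}))  # True, with x = "bc"
print(matches("xx", "aba", {"x"}))     # False
```

The injective version mentioned in the abstract would additionally require that distinct variables receive distinct words, which this sketch does not enforce.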
Remote Sensing Data Compression
A huge amount of data is acquired nowadays by different remote sensing systems installed on satellites, aircraft, and UAVs. The acquired data then have to be transferred to image processing centres, stored and/or delivered to customers. In restricted scenarios, data compression is strongly desired or necessary. A wide diversity of coding methods can be used, depending on the requirements and their priority. In addition, the types and properties of images differ considerably; thus, practical implementation aspects have to be taken into account. The Special Issue paper collection taken as the basis of this book touches on all of the aforementioned items to some degree, giving the reader an opportunity to learn about recent developments and research directions in the field of image compression. In particular, lossless and near-lossless compression of multi- and hyperspectral images remains a current topic, since such images constitute extremely large data arrays with rich information that can be retrieved from them for various applications. Another important aspect is the impact of lossless compression on image classification and segmentation, where a reasonable compromise between the characteristics of compression and the final tasks of data processing has to be achieved. The problems of data transition from UAV-based acquisition platforms, as well as the use of FPGA and neural networks, have become very important. Finally, attempts to apply compressive sensing approaches in remote sensing image processing with positive outcomes are observed. We hope that readers will find our book useful and interesting.