
6 research outputs found

String processing and information retrieval

Author: Hyyrö Heikki
Karlgren Jussi
Tarhio Jorma
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2009

Proceedings of the 16th International Symposium on String Processing and Information Retrieval (SPIRE 2009), Saariselkä, Finland, 25-27 August 2009

Swedish Institute of Computer Science Publications Database

A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance

Author: Kopelowitz Tsvi
Porat Ely
Publication venue: OASIcs - OpenAccess Series in Informatics. 1st Symposium on Simplicity in Algorithms (SOSA 2018)
Publication date: 01/01/2018

The algorithmic task of computing the Hamming distance between a given pattern of length m and each location in a text of length n, both over a general alphabet Sigma, is one of the most fundamental algorithmic tasks in string algorithms. The fastest known runtime for exact computation is Õ(n sqrt(m)). We recently introduced a complicated randomized algorithm for obtaining a (1 +/- eps) approximation for each location in the text in O((n/eps) log(1/eps) log n log m log |Sigma|) total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes O((n/eps) log n log m) time.
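To make the problem statement concrete, here is a minimal sketch of the exact text-to-pattern Hamming distance computed by brute force in O(nm) time. It is only a baseline that illustrates the definition, not the randomized approximation algorithm of the paper; the function name `hamming_distances` is an assumption chosen for this illustration.

```python
def hamming_distances(text, pattern):
    """Return the exact Hamming distance between `pattern` and every
    length-m window of `text` (naive O(n*m) baseline, illustrative only)."""
    n, m = len(text), len(pattern)
    # One entry per alignment i = 0 .. n-m; empty if the pattern is longer.
    return [
        sum(1 for k in range(m) if text[i + k] != pattern[k])
        for i in range(n - m + 1)
    ]


if __name__ == "__main__":
    # The pattern "abd" occurs exactly at the last alignment of "abcabd".
    print(hamming_distances("abcabd", "abd"))
```

An approximation algorithm would return, for each alignment, a value within a (1 +/- eps) factor of the corresponding entry of this list.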

Dagstuhl Research Online Publication Server

Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching

Author: Ganguly Arnab
Hon Wing-Kai
Sadakane Kunihiko
Shah Rahul
Thankachan Sharma V.
Yang Yilin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Publication date: 01/01/2016

Let S and S' be two strings of the same length. We consider the following two variants of string matching.
* Parameterized Matching: The characters of S and S' are partitioned into static characters and parameterized characters. The strings are a parameterized match iff the static characters match exactly and there exists a one-to-one function which renames the parameterized characters in S to those in S'.
* Order-Preserving Matching: The strings are an order-preserving match iff for any two integers i,j in [1,|S|], S[i] <= S[j] iff S'[i] <= S'[j].
Let P be a collection of d patterns {P_1, P_2, ..., P_d} of total length n characters, which are chosen from an alphabet Sigma. Given a text T, also over Sigma, we consider the dictionary indexing problem under the above definitions of string matching. Specifically, the task is to index P, such that we can report all positions j where at least one of the patterns P_i in P is a parameterized match (resp. order-preserving match) with the same-length substring of T starting at j. Previous best-known indexes occupy O(n * log(n)) bits and can report all occ positions in O(|T| * log(|Sigma|) + occ) time. We present space-efficient indexes that occupy O(n * log(|Sigma|+d) * log(n)) bits and report all occ positions in O(|T| * (log(|Sigma|) + log_{|Sigma|}(n)) + occ) time for parameterized matching and in O(|T| * log(n) + occ) time for order-preserving matching.
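The two matching definitions can be checked directly for a single pair of equal-length strings. The sketch below is a plain illustration of the definitions, not the space-efficient dictionary index described in the abstract; the function names and the `static_chars` parameter are assumptions introduced here.

```python
def parameterized_match(s, t, static_chars):
    """True iff static characters agree position by position and the
    remaining (parameterized) characters are related by a one-to-one renaming."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}  # renaming s -> t and its inverse
    for a, b in zip(s, t):
        if a in static_chars or b in static_chars:
            if a != b:
                return False
            continue
        # setdefault records the first pairing and exposes later conflicts.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def order_preserving_match(s, t):
    """True iff s[i] <= s[j] exactly when t[i] <= t[j] for all i, j.
    Quadratic check that follows the definition literally."""
    if len(s) != len(t):
        return False
    n = len(s)
    return all(
        (s[i] <= s[j]) == (t[i] <= t[j]) for i in range(n) for j in range(n)
    )


if __name__ == "__main__":
    print(parameterized_match("xay", "yax", {"a"}))
    print(order_preserving_match([1, 3, 2], [10, 30, 20]))
```

Keeping both a forward and a backward map is what enforces the one-to-one requirement; a single map would accept renamings that send two different characters to the same target.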

Dagstuhl Research Online Publication Server

Louisiana State University

The Many Qualities of a New Directly Accessible Compression Scheme

Author: Cantone Domenico
Faro Simone
Publication date: 31/03/2023

We present a new variable-length computation-friendly encoding scheme, named SFDC (Succinct Format with Direct aCcesibility), that supports direct and fast accessibility to any element of the compressed sequence and achieves compression ratios often higher than those offered by other solutions in the literature. The SFDC scheme provides a flexible and simple representation geared towards either practical efficiency or compression ratios, as required. For a text of length $n$ over an alphabet of size $\sigma$ and a fixed parameter $\lambda$, the access time of the proposed encoding is proportional to the length of the character's code-word, plus an expected $\mathcal{O}((F_{\sigma - \lambda + 3} - 3)/F_{\sigma+1})$ overhead, where $F_j$ is the $j$-th number of the Fibonacci sequence. Overall it uses $N + \mathcal{O}\big(n\,(\lambda - (F_{\sigma+3}-3)/F_{\sigma+1})\big) = N + \mathcal{O}(n)$ bits, where $N$ is the length of the encoded string. Experimental results show that the performance of our scheme is, in some respects, comparable with the performance of DACs and Wavelet Trees, which are among the most efficient schemes. In addition, our scheme is configured as a computation-friendly compression scheme, as it has several features that make it very effective in text processing tasks. In the string matching problem, which we take as a case study, we experimentally show that the new scheme enables results that are up to 29 times faster than standard string-matching techniques on plain texts. Comment: 33 pages
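To get a feel for how the Fibonacci ratio inside the expected access overhead behaves, the following sketch evaluates (F_{sigma - lambda + 3} - 3) / F_{sigma + 1} numerically. It assumes the indexing convention F_0 = 0, F_1 = F_2 = 1, which the abstract does not state, and it ignores the constant hidden by the O-notation; the function names are hypothetical.

```python
def fib(j):
    """j-th Fibonacci number, assuming F_0 = 0 and F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(j):
        a, b = b, a + b
    return a


def access_overhead_term(sigma, lam):
    """Value of (F_{sigma - lam + 3} - 3) / F_{sigma + 1}, the ratio that
    appears inside the O(.) bound on the expected access overhead."""
    if sigma - lam + 3 < 0:
        raise ValueError("sigma - lam + 3 must be non-negative")
    return (fib(sigma - lam + 3) - 3) / fib(sigma + 1)


if __name__ == "__main__":
    # Illustrative alphabet size; the ratio shrinks as lambda grows.
    for lam in range(1, 9):
        print(lam, access_overhead_term(256, lam))
```

Under this convention the ratio decreases as the parameter lambda increases, which matches the abstract's description of a scheme tunable towards either speed or compression.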

arXiv.org e-Print Archive

Pattern matching with variables: Efficient algorithms and complexity results

Author: Florin Manea (7168022)
Henning Fernau (7168292)
Markus Schmid (59491)
Robert Mercas (2835212)
Publication date: 11/02/2020

A pattern α (i.e., a string of variables and terminals) matches a word w if w can be obtained by uniformly replacing the variables of α by terminal words. The respective matching problem, i.e., deciding whether or not a given pattern matches a given word, is generally NP-complete, but can be solved in polynomial time for restricted classes of patterns. We present efficient algorithms for the matching problem with respect to patterns with a bounded number of repeated variables and patterns with a structural restriction on the order of variables. Furthermore, we show that it is NP-complete to decide, for a given number k and a word w, whether w can be factorised into k distinct factors. As an immediate consequence of this hardness result, the injective version (i.e., different variables are replaced by different words) of the matching problem is NP-complete even for very restricted classes of patterns.
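The matching problem itself can be illustrated with a simple exponential-time backtracking search. This is a sketch of the problem definition only, not one of the efficient algorithms of the paper. It assumes that each pattern symbol is a single character, that variables are given as a set, and that variables are replaced by non-empty words (the non-erasing case); the name `matches` is hypothetical.

```python
def matches(pattern, word, variables):
    """Decide whether `word` arises from `pattern` by replacing every
    variable uniformly with a non-empty terminal word (brute force)."""

    def solve(i, pos, assignment):
        if i == len(pattern):
            return pos == len(word)  # whole word must be consumed
        sym = pattern[i]
        if sym not in variables:
            # Terminal symbol: must appear literally at the current position.
            return word.startswith(sym, pos) and solve(i + 1, pos + 1, assignment)
        if sym in assignment:
            # Repeated variable: must reuse the substitution chosen earlier.
            val = assignment[sym]
            return word.startswith(val, pos) and solve(i + 1, pos + len(val), assignment)
        # First occurrence: try every non-empty substring starting at pos.
        for end in range(pos + 1, len(word) + 1):
            assignment[sym] = word[pos:end]
            if solve(i + 1, end, assignment):
                return True
            del assignment[sym]
        return False

    return solve(0, 0, {})


if __name__ == "__main__":
    print(matches("xax", "bbabb", {"x"}))  # x can be replaced by "bb"
```

Repeated variables are what make the search expensive: each first occurrence branches over all candidate substrings, and every later occurrence constrains those choices.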

Loughborough University Institutional Repository

Remote Sensing Data Compression

Publication venue: 'MDPI AG'
Publication date: 11/01/2022

A huge amount of data is acquired nowadays by different remote sensing systems installed on satellites, aircraft, and UAVs. The acquired data then have to be transferred to image processing centres, stored and/or delivered to customers. In restricted scenarios, data compression is strongly desired or necessary. A wide diversity of coding methods can be used, depending on the requirements and their priority. In addition, the types and properties of images differ a lot; thus, practical implementation aspects have to be taken into account. The Special Issue paper collection taken as the basis of this book touches on all of the aforementioned items to some degree, giving the reader an opportunity to learn about recent developments and research directions in the field of image compression. In particular, lossless and near-lossless compression of multi- and hyperspectral images still remains current, since such images constitute data arrays of extremely large size with rich information that can be retrieved from them for various applications. Another important aspect is the impact of lossless compression on image classification and segmentation, where a reasonable compromise between the characteristics of compression and the final tasks of data processing has to be achieved. The problems of data transition from UAV-based acquisition platforms, as well as the use of FPGA and neural networks, have become very important. Finally, attempts to apply compressive sensing approaches in remote sensing image processing with positive outcomes are observed. We hope that readers will find our book useful and interesting.

Directory of Open Access Books (DOAB)