On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Author: Fischer Johannes
Kurpicz Florian
Köppl Dominik
Publication date: 01/01/2016

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with $p$ processors. Given a static text of length $n$, we first show how to compute the suffix array interval of a given pattern of length $m$ in $O(\frac{m}{p} + \lg p + \lg\lg p \cdot \lg\lg n)$ time for $p \le m$. For approximate pattern matching with $k$ differences or mismatches, we show how to compute all occurrences of a given pattern in $O(\frac{m^k\sigma^k}{p}\max(k, \lg\lg n) + (1+\frac{m}{p})\lg p \cdot \lg\lg n + \text{occ})$ time, where $\sigma$ is the size of the alphabet and $p \le \sigma^k m^k$. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: given the suffix array intervals for two patterns $P$ and $P'$, we present a data structure for computing the interval of $PP'$ in $O(\lg\lg n)$ sequential time, or in $O(1+\lg_p\lg n)$ parallel time. All our data structures are of size $O(n)$ bits (in addition to the suffix array).
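To make the notions of "suffix array interval" and "merging intervals" concrete, here is a minimal sequential sketch in Python. It is not the data structure from the abstract: the interval lookup uses plain binary search, and the merge step is a naive baseline that relies on an explicit inverse suffix array (far more than $O(n)$ bits) and takes $O(\lg n)$ time. The function names `build_suffix_array`, `sa_interval`, and `merge_intervals` are illustrative assumptions.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; only meant for short example texts.
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, pattern):
    """Half-open interval [lo, hi) of suffix array entries prefixed by pattern."""
    m = len(pattern)

    def first_index(is_before):
        # Binary search for the first entry whose m-character prefix is not "before".
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if is_before(text[sa[mid]:sa[mid] + m]):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return (first_index(lambda s: s < pattern),
            first_index(lambda s: s <= pattern))


def merge_intervals(sa, isa, p_len, p_iv, q_iv):
    """Interval of PP' given the intervals of P (length p_len) and P'.

    Naive baseline (assumption, not the paper's structure): inside the
    interval of P, the ranks of the suffixes that follow P are strictly
    increasing, so the entries followed by P' can be found by binary search.
    """
    n = len(sa)

    def rank_after(j):
        pos = sa[j] + p_len
        return isa[pos] if pos < n else -1  # the empty suffix sorts first

    def first_at_least(bound):
        lo, hi = p_iv
        while lo < hi:
            mid = (lo + hi) // 2
            if rank_after(mid) < bound:
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_at_least(q_iv[0]), first_at_least(q_iv[1])


text = "banana"
sa = build_suffix_array(text)
isa = [0] * len(sa)
for rank, pos in enumerate(sa):
    isa[pos] = rank

iv_a = sa_interval(text, sa, "a")
iv_na = sa_interval(text, sa, "na")
iv_ana = merge_intervals(sa, isa, len("a"), iv_a, iv_na)
```

In this example, merging the intervals of `"a"` and `"na"` yields the same interval that a direct search for `"ana"` returns, without looking at the characters of the second pattern again.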

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A practical index for approximate dictionary matching with few mismatches

Author: Cisłak Aleksander
Grabowski Szymon
Publication date: 11/02/2016

Approximate dictionary matching is a classic string matching problem (checking if a query string occurs in a collection of strings) with applications in, e.g., spellchecking, online catalogs, geolocation, and web searchers. We present a surprisingly simple solution called a split index, which is based on the Dirichlet principle, for matching a keyword with few mismatches, and experimentally show that it offers competitive space-time tradeoffs. Our implementation in the C++ language is focused mostly on data compaction, which is beneficial for the search speed (e.g., by being cache friendly). We compare our solution with other algorithms and we show that it performs better for the Hamming distance. Query times in the order of 1 microsecond were reported for one mismatch for the dictionary size of a few megabytes on a medium-end PC. We also demonstrate that a basic compression technique consisting in $q$-gram substitution can significantly reduce the index size (up to 50% of the input text size for the DNA), while still keeping the query time relatively low.
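The Dirichlet (pigeonhole) idea behind such an index can be sketched in a few lines of Python for the one-mismatch case: if a word is split into two pieces and the query differs from it in at most one position, at least one piece must match exactly. The class below is a simplified illustration under that assumption, not the authors' compact C++ layout; `SplitIndex`, `hamming`, and the bucket key format are hypothetical names chosen for the example.

```python
from collections import defaultdict


def hamming(a, b):
    # Number of mismatching positions; callers ensure equal lengths.
    return sum(x != y for x, y in zip(a, b))


class SplitIndex:
    """Toy index for Hamming distance <= 1 over a set of words."""

    def __init__(self, words):
        self.buckets = defaultdict(set)
        for word in words:
            for key in self._keys(word):
                self.buckets[key].add(word)

    @staticmethod
    def _keys(word):
        # Key = (word length, piece number, piece text). Including the length
        # keeps candidates comparable under the Hamming distance.
        half = len(word) // 2
        return [(len(word), 0, word[:half]), (len(word), 1, word[half:])]

    def query(self, q):
        # Pigeonhole: with at most one mismatch, one of the two pieces of q
        # occurs unchanged, so only those two buckets need to be verified.
        candidates = set()
        for key in self._keys(q):
            candidates |= self.buckets.get(key, set())
        return sorted(w for w in candidates if hamming(w, q) <= 1)


index = SplitIndex(["hello", "help", "hallo", "world", "jello"])
matches_exact = index.query("hello")
matches_typo = index.query("hellp")
matches_none = index.query("xyzzy")
```

Verification touches only the words that share an exact piece with the query, which is what keeps the candidate set small. Supporting $k$ mismatches would need $k + 1$ pieces per word, which this sketch does not attempt.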

arXiv.org e-Print Archive

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Dependent Nonparametric Bayesian Group Dictionary Learning for online reconstruction of Dynamic MR images

Author: Kassim Ashraf A.
Roohi Shahrooz Faghih
Zonoobi Dornoosh
Publication date: 11/02/2015

In this paper, we introduce a dictionary learning based approach applied to the problem of real-time reconstruction of MR image sequences that are highly undersampled in k-space. Unlike traditional dictionary learning, our method integrates both global and patch-wise (local) sparsity information and incorporates some a priori information into the reconstruction process. Moreover, we use a Dependent Hierarchical Beta-process as the prior for the group-based dictionary learning, which adaptively infers the dictionary size and the sparsity of each patch, and also ensures that similar patches are manifested in terms of similar dictionary atoms. An efficient numerical algorithm based on the alternating direction method of multipliers (ADMM) is also presented. Through extensive experimental results we show that our proposed method achieves superior reconstruction quality compared to the other state-of-the-art DL-based methods.

arXiv.org e-Print Archive

CiteSeerX

GPU LSM: A Dynamic Dictionary Data Structure for the GPU

Author: Amenta Nina
Ashkiani Saman
Farach-Colton Martin
Li Shengren
Owens John D.
Publication date: 01/01/2018

We develop a dynamic dictionary data structure for the GPU, supporting fast insertions and deletions, based on the Log Structured Merge tree (LSM). Our implementation on an NVIDIA K40c GPU has an average update (insertion or deletion) rate of 225 M elements/s, 13.5x faster than merging items into a sorted array. The GPU LSM supports the retrieval operations of lookup, count, and range query with average rates of 75 M, 32 M and 23 M queries/s, respectively. The trade-off for the dynamic updates is that the sorted array is almost twice as fast on retrievals. We believe that our GPU LSM is the first dynamic general-purpose dictionary data structure for the GPU.

Comment: 11 pages, accepted to appear in the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'18)
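The LSM idea in the abstract above can be illustrated with a small sequential Python sketch: updates arrive as sorted batches, a full level is merged into the next one like a carry in a binary counter, deletions are recorded as tombstones, and lookups probe levels from newest to oldest. This is a CPU-side toy under those assumptions, not the GPU implementation; `LSMDict`, `merge_runs`, and `TOMBSTONE` are hypothetical names, and unlike a fixed-size level scheme this sketch drops stale duplicates during merges.

```python
TOMBSTONE = object()  # marker value recording that a key was deleted


def merge_runs(newer, older):
    # Two-way merge of sorted (key, value) runs; on equal keys the newer wins.
    out, i, j = [], 0, 0
    while i < len(newer) and j < len(older):
        if newer[i][0] < older[j][0]:
            out.append(newer[i]); i += 1
        elif newer[i][0] > older[j][0]:
            out.append(older[j]); j += 1
        else:
            out.append(newer[i]); i += 1; j += 1
    out.extend(newer[i:])
    out.extend(older[j:])
    return out


class LSMDict:
    def __init__(self):
        # levels[i] is None (empty) or one sorted run; lower index = newer data.
        self.levels = []

    def _push(self, run):
        # Carry propagation: merge into occupied levels until an empty one is found.
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
            if self.levels[i] is None:
                self.levels[i] = run
                return
            run = merge_runs(run, self.levels[i])
            self.levels[i] = None
            i += 1

    def insert_batch(self, items):
        # Within one batch, the last value given for a key wins.
        self._push(sorted(dict(items).items()))

    def delete_batch(self, keys):
        self._push(sorted((k, TOMBSTONE) for k in set(keys)))

    def lookup(self, key):
        # Newest level first, so later updates and tombstones shadow older data.
        for run in self.levels:
            if run is None:
                continue
            lo, hi = 0, len(run)
            while lo < hi:
                mid = (lo + hi) // 2
                if run[mid][0] < key:
                    lo = mid + 1
                else:
                    hi = mid
            if lo < len(run) and run[lo][0] == key:
                value = run[lo][1]
                return None if value is TOMBSTONE else value
        return None


d = LSMDict()
d.insert_batch([(1, "a"), (2, "b")])
d.insert_batch([(3, "c"), (2, "B")])
d.delete_batch([1])
```

After these three batches, level 0 holds the tombstone for key 1 and level 1 holds the merged run from the two insert batches. A lookup may have to probe several levels, which is consistent with the retrieval trade-off the abstract mentions relative to a single sorted array.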

arXiv.org e-Print Archive

eScholarship - University of California