14,497 research outputs found
A practical index for approximate dictionary matching with few mismatches
Approximate dictionary matching is a classic string matching problem
(checking if a query string occurs in a collection of strings) with
applications in, e.g., spellchecking, online catalogs, geolocation, and web
searchers. We present a surprisingly simple solution called a split index,
which is based on the Dirichlet principle, for matching a keyword with few
mismatches, and experimentally show that it offers competitive space-time
tradeoffs. Our implementation in the C++ language is focused mostly on data
compaction, which is beneficial for the search speed (e.g., by being cache
friendly). We compare our solution with other algorithms and we show that it
performs better for the Hamming distance. Query times in the order of 1
microsecond were reported for one mismatch for the dictionary size of a few
megabytes on a medium-end PC. We also demonstrate that a basic compression
technique consisting in -gram substitution can significantly reduce the
index size (up to 50% of the input text size for the DNA), while still keeping
the query time relatively low
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck
Communication channel analysis and real time compressed sensing for high density neural recording devices
Next generation neural recording and Brain-
Machine Interface (BMI) devices call for high density or distributed
systems with more than 1000 recording sites. As the
recording site density grows, the device generates data on the
scale of several hundred megabits per second (Mbps). Transmitting
such large amounts of data induces significant power
consumption and heat dissipation for the implanted electronics.
Facing these constraints, efficient on-chip compression techniques
become essential to the reduction of implanted systems power
consumption. This paper analyzes the communication channel
constraints for high density neural recording devices. This paper
then quantifies the improvement on communication channel
using efficient on-chip compression methods. Finally, This paper
describes a Compressed Sensing (CS) based system that can
reduce the data rate by > 10x times while using power on
the order of a few hundred nW per recording channel
Compressed Sensing with Coherent and Redundant Dictionaries
This article presents novel results concerning the recovery of signals from
undersampled data in the common situation where such signals are not sparse in
an orthonormal basis or incoherent dictionary, but in a truly redundant
dictionary. This work thus bridges a gap in the literature and shows not only
that compressed sensing is viable in this context, but also that accurate
recovery is possible via an L1-analysis optimization problem. We introduce a
condition on the measurement/sensing matrix, which is a natural generalization
of the now well-known restricted isometry property, and which guarantees
accurate recovery of signals that are nearly sparse in (possibly) highly
overcomplete and coherent dictionaries. This condition imposes no incoherence
restriction on the dictionary and our results may be the first of this kind. We
discuss practical examples and the implications of our results on those
applications, and complement our study by demonstrating the potential of
L1-analysis for such problems
A* Orthogonal Matching Pursuit: Best-First Search for Compressed Sensing Signal Recovery
Compressed sensing is a developing field aiming at reconstruction of sparse
signals acquired in reduced dimensions, which make the recovery process
under-determined. The required solution is the one with minimum norm
due to sparsity, however it is not practical to solve the minimization
problem. Commonly used techniques include minimization, such as Basis
Pursuit (BP) and greedy pursuit algorithms such as Orthogonal Matching Pursuit
(OMP) and Subspace Pursuit (SP). This manuscript proposes a novel semi-greedy
recovery approach, namely A* Orthogonal Matching Pursuit (A*OMP). A*OMP
performs A* search to look for the sparsest solution on a tree whose paths grow
similar to the Orthogonal Matching Pursuit (OMP) algorithm. Paths on the tree
are evaluated according to a cost function, which should compensate for
different path lengths. For this purpose, three different auxiliary structures
are defined, including novel dynamic ones. A*OMP also incorporates pruning
techniques which enable practical applications of the algorithm. Moreover, the
adjustable search parameters provide means for a complexity-accuracy trade-off.
We demonstrate the reconstruction ability of the proposed scheme on both
synthetically generated data and images using Gaussian and Bernoulli
observation matrices, where A*OMP yields less reconstruction error and higher
exact recovery frequency than BP, OMP and SP. Results also indicate that novel
dynamic cost functions provide improved results as compared to a conventional
choice.Comment: accepted for publication in Digital Signal Processin
On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching
We present parallel algorithms for exact and approximate pattern matching
with suffix arrays, using a CREW-PRAM with processors. Given a static text
of length , we first show how to compute the suffix array interval of a
given pattern of length in
time for . For approximate pattern matching with differences or
mismatches, we show how to compute all occurrences of a given pattern in
time, where is the size of the alphabet
and . The workhorse of our algorithms is a data structure
for merging suffix array intervals quickly: Given the suffix array intervals
for two patterns and , we present a data structure for computing the
interval of in sequential time, or in
parallel time. All our data structures are of size bits (in addition to
the suffix array)
Distributed Representation of Geometrically Correlated Images with Compressed Linear Measurements
This paper addresses the problem of distributed coding of images whose
correlation is driven by the motion of objects or positioning of the vision
sensors. It concentrates on the problem where images are encoded with
compressed linear measurements. We propose a geometry-based correlation model
in order to describe the common information in pairs of images. We assume that
the constitutive components of natural images can be captured by visual
features that undergo local transformations (e.g., translation) in different
images. We first identify prominent visual features by computing a sparse
approximation of a reference image with a dictionary of geometric basis
functions. We then pose a regularized optimization problem to estimate the
corresponding features in correlated images given by quantized linear
measurements. The estimated features have to comply with the compressed
information and to represent consistent transformation between images. The
correlation model is given by the relative geometric transformations between
corresponding features. We then propose an efficient joint decoding algorithm
that estimates the compressed images such that they stay consistent with both
the quantized measurements and the correlation model. Experimental results show
that the proposed algorithm effectively estimates the correlation between
images in multi-view datasets. In addition, the proposed algorithm provides
effective decoding performance that compares advantageously to independent
coding solutions as well as state-of-the-art distributed coding schemes based
on disparity learning
Mismatch and resolution in compressive imaging
Highly coherent sensing matrices arise in discretization of continuum
problems such as radar and medical imaging when the grid spacing is below the
Rayleigh threshold as well as in using highly coherent, redundant dictionaries
as sparsifying operators. Algorithms (BOMP, BLOOMP) based on techniques of band
exclusion and local optimization are proposed to enhance Orthogonal Matching
Pursuit (OMP) and deal with such coherent sensing matrices. BOMP and BLOOMP
have provably performance guarantee of reconstructing sparse, widely separated
objects {\em independent} of the redundancy and have a sparsity constraint and
computational cost similar to OMP's. Numerical study demonstrates the
effectiveness of BLOOMP for compressed sensing with highly coherent, redundant
sensing matrices.Comment: Figure 5 revise
- …