12,686 research outputs found
Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity
Consider two parties who want to compare their strings, e.g., genomes, but do
not want to reveal them to each other. We present a system for
privacy-preserving matching of strings, which differs from existing systems by
providing a deterministic approximation instead of an exact distance. It is
efficient (linear complexity), non-interactive and does not involve a third
party which makes it particularly suitable for cloud computing. We extend our
protocol, such that it mitigates iterated differential attacks proposed by
Goodrich. Further an implementation of the system is evaluated and compared
against current privacy-preserving string matching algorithms.Comment: 6 pages, 4 figure
A Bloom filter based semi-index on -grams
We present a simple -gram based semi-index, which allows to look for a
pattern typically only in a small fraction of text blocks. Several space-time
tradeoffs are presented. Experiments on Pizza & Chili datasets show that our
solution is up to three orders of magnitude faster than the Claude et al.
\cite{CNPSTjda10} semi-index at a comparable space usage
Linear Algorithm for Conservative Degenerate Pattern Matching
A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a
sequence of such symbols is a degenerate string. A degenerate string is said to
be conservative if its number of non-solid symbols is upper-bounded by a fixed
positive constant k. We consider here the matching problem of conservative
degenerate strings and present the first linear-time algorithm that can find,
for given degenerate strings P* and T* of total length n containing k non-solid
symbols in total, the occurrences of P* in T* in O(nk) time
A practical index for approximate dictionary matching with few mismatches
Approximate dictionary matching is a classic string matching problem
(checking if a query string occurs in a collection of strings) with
applications in, e.g., spellchecking, online catalogs, geolocation, and web
searchers. We present a surprisingly simple solution called a split index,
which is based on the Dirichlet principle, for matching a keyword with few
mismatches, and experimentally show that it offers competitive space-time
tradeoffs. Our implementation in the C++ language is focused mostly on data
compaction, which is beneficial for the search speed (e.g., by being cache
friendly). We compare our solution with other algorithms and we show that it
performs better for the Hamming distance. Query times in the order of 1
microsecond were reported for one mismatch for the dictionary size of a few
megabytes on a medium-end PC. We also demonstrate that a basic compression
technique consisting in -gram substitution can significantly reduce the
index size (up to 50% of the input text size for the DNA), while still keeping
the query time relatively low
- …