Dictionary matching in a stream
We consider the problem of dictionary matching in a stream. Given a set of
strings, known as a dictionary, and a stream of characters arriving one at a
time, the task is to report each time some string in our dictionary occurs in
the stream. We present a randomised algorithm which takes O(log log(k + m))
time per arriving character and uses O(k log m) words of space, where k is the
number of strings in the dictionary and m is the length of the longest string
in the dictionary.
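To make the problem concrete, here is a minimal Python sketch of a naive streaming matcher. It is not the randomised algorithm described above. It keeps only the last m characters of the stream and, for each distinct pattern length, checks the corresponding suffix against a hash set. That costs up to O(m^2) time per arriving character instead of O(log log(k + m)). The function name `stream_dictionary_matches` is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def stream_dictionary_matches(
    dictionary: Iterable[str], stream: Iterable[str]
) -> Iterator[Tuple[int, str]]:
    """Yield (position, pattern) whenever a dictionary string ends at the newest character.

    Naive baseline for illustration, not the randomised algorithm from the abstract.
    """
    patterns = {p for p in dictionary if p}  # ignore empty strings
    lengths = sorted({len(p) for p in patterns})
    m = lengths[-1] if lengths else 0  # length of the longest pattern
    window: deque = deque(maxlen=m)  # only the last m characters are ever needed
    for i, ch in enumerate(stream):
        window.append(ch)
        recent = "".join(window)
        for length in lengths:
            if length > len(recent):
                break  # lengths are sorted, so no longer pattern can fit yet
            suffix = recent[-length:]
            if suffix in patterns:
                yield i, suffix
```

For example, `list(stream_dictionary_matches(["ab", "b", "abc"], "xabc"))` yields `[(2, 'b'), (2, 'ab'), (3, 'abc')]`. Each pair gives the index of the arriving character and the dictionary string that ends there.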
Design and Construction of a Basic Physics Dictionary Application Using the Brute Force String Matching Algorithm
A dictionary is a reference book that lists words in alphabetical order together with their meanings. Dictionaries are needed in education to look up the meaning of a word. A physics dictionary contains many terms and explanations, and searching it as a mobile application would take a long time because the screen cannot display all terms at once. To make finding a word easier, the dictionary application is designed using a string matching algorithm, that is, an algorithm that matches one text against other texts. The string matching algorithm used is the brute force algorithm.
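The brute force approach tries every alignment of the search term against each dictionary entry and compares characters until a mismatch occurs. Below is a minimal Python sketch. `brute_force_find` and `search_terms` are hypothetical names. Treating a lookup as a case-insensitive substring search over entry titles is an assumption, not a detail from the abstract.

```python
from typing import List


def brute_force_find(pattern: str, text: str) -> List[int]:
    """Return every index where pattern occurs in text, trying each alignment in turn."""
    n, m = len(text), len(pattern)
    positions = []
    for i in range(n - m + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            positions.append(i)
    return positions


def search_terms(query: str, terms: List[str]) -> List[str]:
    """Hypothetical dictionary lookup: keep every term that contains the query."""
    q = query.lower()
    return [t for t in terms if brute_force_find(q, t.lower())]
```

For example, `search_terms("force", ["Gravitational force", "Kinetic energy", "Friction force"])` returns `["Gravitational force", "Friction force"]`. In the worst case this takes O(|pattern| · |text|) time per entry.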
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problems for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck.
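The compressed-text algorithms themselves are not reproduced here. As a reference point, the sketch below solves approximate string matching on an uncompressed text with Sellers' classic dynamic program. It uses edit distance, O(|P| · |T|) time and O(|P|) space. The function name is hypothetical.

```python
from typing import List


def approximate_match_ends(pattern: str, text: str, k: int) -> List[int]:
    """Return 0-based end positions j such that some substring of text ending at j
    is within edit distance k of pattern (Sellers' dynamic programming)."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and a substring ending here.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # D[i-1][j-1] for i = 1
        col[0] = 0  # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]  # D[i][j-1]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                old + 1,  # extra text character
                col[i - 1] + 1,  # pattern character left unmatched
                prev_diag + cost,  # match or substitution
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, `approximate_match_ends("abc", "xxabcxx", 0)` returns `[4]`, the end of the exact occurrence.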
A practical index for approximate dictionary matching with few mismatches
Approximate dictionary matching is a classic string matching problem
(checking if a query string occurs in a collection of strings) with
applications in, e.g., spellchecking, online catalogs, geolocation, and web
search. We present a surprisingly simple solution called a split index,
which is based on the Dirichlet (pigeonhole) principle, for matching a keyword with few
mismatches, and experimentally show that it offers competitive space-time
tradeoffs. Our implementation in the C++ language is focused mostly on data
compaction, which is beneficial for the search speed (e.g., by being cache
friendly). We compare our solution with other algorithms and we show that it
performs better for the Hamming distance. Query times on the order of 1
microsecond were reported for one mismatch and a dictionary size of a few
megabytes on a medium-end PC. We also demonstrate that a basic compression
technique consisting of q-gram substitution can significantly reduce the
index size (up to 50% of the input text size for DNA), while still keeping
the query time relatively low.
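The pigeonhole idea behind a split index can be illustrated for one mismatch. If two equal-length strings differ in at most one position, then at least one of their two halves must match exactly. Exact lookups on halves therefore yield a small candidate set, which is then verified. The sketch below follows the description above but is not the authors' C++ implementation. The class name `SplitIndex1` and the hash-table layout are assumptions. Splitting into k + 1 parts extends the same argument to k mismatches.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class SplitIndex1:
    """Hypothetical one-mismatch split index over a list of words."""

    def __init__(self, words: List[str]) -> None:
        # Key: (word length, which half, half content) -> words containing that half.
        self.table: Dict[Tuple[int, int, str], Set[str]] = defaultdict(set)
        for w in words:
            mid = len(w) // 2
            self.table[(len(w), 0, w[:mid])].add(w)
            self.table[(len(w), 1, w[mid:])].add(w)

    def query(self, q: str) -> List[str]:
        """Return all indexed words within Hamming distance 1 of q, sorted."""
        mid = len(q) // 2
        empty: Set[str] = set()
        # Pigeonhole: with at most one mismatch, one half of q matches exactly.
        left = self.table.get((len(q), 0, q[:mid]), empty)
        right = self.table.get((len(q), 1, q[mid:]), empty)
        # Verify each candidate; the length is part of the key, so zip compares fully.
        return sorted(w for w in left | right if hamming(w, q) <= 1)
```

For example, `SplitIndex1(["ACGT", "ACGA", "TTTT", "ACCT"]).query("ACGT")` returns `["ACCT", "ACGA", "ACGT"]`.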
Succinct Dictionary Matching With No Slowdown
The problem of dictionary matching is a classical problem in string matching:
given a set S of d strings of total length n characters over a (not
necessarily constant) alphabet of size sigma, build a data structure so that we
can find in any text T all occurrences of strings belonging to S. The
classical solution for this problem is the Aho-Corasick automaton which finds
all occ occurrences in a text T in time O(|T| + occ) using a data structure
that occupies O(m log m) bits of space where m <= n + 1 is the number of states
in the automaton. In this paper we show that the Aho-Corasick automaton can be
represented in just m(log sigma + O(1)) + O(d log(n/d)) bits of space while
still maintaining the ability to answer queries in O(|T| + occ) time. To the
best of our knowledge, the currently fastest succinct data structure for the
dictionary matching problem uses space O(n log sigma) while answering queries
in O(|T|log log n + occ) time. In this paper we also show how the space
occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)) where H0 is the
empirical entropy of the characters appearing in the trie representation of the
set S, provided that sigma < m^epsilon for any constant 0 < epsilon < 1. The
query time remains unchanged. Comment: Corrected typos and other minor errors.
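For reference, a plain pointer-based Aho-Corasick automaton is sketched below in Python. It stores trie edges, failure links and output lists explicitly, so it uses far more space than the m(log sigma + O(1)) bits discussed above. It only illustrates the automaton that the succinct representation encodes. The class and method names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    """Plain (non-succinct) Aho-Corasick automaton; states are indices into parallel lists."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie edges of each state
        self.fail: List[int] = [0]  # failure link of each state
        self.out: List[List[str]] = [[]]  # patterns recognised at each state
        for p in patterns:
            s = 0
            for c in p:
                if c not in self.goto[s]:
                    self.goto[s][c] = len(self.goto)  # index of the new state
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                s = self.goto[s][c]
            self.out[s].append(p)
        # Breadth-first order guarantees shallower failure targets are finished first;
        # children of the root keep failure link 0.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for c, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and c not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(c, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def search(self, text: str) -> List[Tuple[int, str]]:
        """Return (start_index, pattern) for every occurrence of a pattern in text."""
        s = 0
        hits = []
        for i, c in enumerate(text):
            while s and c not in self.goto[s]:
                s = self.fail[s]  # fall back until an edge on c exists (or at root)
            s = self.goto[s].get(c, 0)
            for p in self.out[s]:
                hits.append((i - len(p) + 1, p))
        return hits
```

For example, `AhoCorasick(["he", "she", "his", "hers"]).search("ushers")` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. With hash-based edges the scan runs in expected O(|T| + occ) time.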
The complexity of the Multiple Pattern Matching Problem for random strings
We generalise a multiple string pattern matching algorithm, recently proposed
by Fredriksson and Grabowski [J. Discr. Alg. 7, 2009], to deal with arbitrary
dictionaries on an alphabet of size . If is the number of words of
length in the dictionary, and , the
complexity rate for the string characters to be read by this algorithm is at
most for some constant
. On the other side, we generalise the classical lower
bound of Yao [SIAM J. Comput. 8, 1979], for the problem with a single pattern,
to deal with arbitrary dictionaries, and determine it to be at least
. This proves the optimality of the
algorithm, improving and correcting previous claims. Comment: 25 pages, 4 figures
Pattern Masking for Dictionary Matching: Theory and Practice
Data masking is a common technique for sanitizing sensitive data maintained in database systems, and it is becoming increasingly important in various application areas, such as record linkage of personal data. This work formalizes the Pattern Masking for Dictionary Matching (PMDM) problem: given a dictionary D of d strings, each of length ℓ, a query string q of length ℓ, and a positive integer z, we are asked to compute a smallest set K⊆{1, …, ℓ}, so that if q[i] is replaced by a wildcard for all i∈K, then q matches at least z strings from D. Solving PMDM makes it possible to provide data utility guarantees, unlike existing approaches. We first show, through a reduction from the well-known k-Clique problem, that a decision version of the PMDM problem is NP-complete, even for binary strings. We thus approach the problem from a more practical perspective. We show a combinatorial O((dℓ)^(|K|/3) + dℓ)-time and O(dℓ)-space algorithm for PMDM for |K|=O(1). In fact, we show that we cannot hope for a faster combinatorial algorithm, unless the combinatorial k-Clique hypothesis fails (Abboud et al. in SIAM J Comput 47:2527–2555, 2018; Lincoln et al., in: 29th ACM-SIAM Symposium on Discrete Algorithms (SODA), 2018). Our combinatorial algorithm, executed with small |K|, is the backbone of a greedy heuristic that we propose. Our experiments on real-world and synthetic datasets show that our heuristic finds nearly-optimal solutions in practice and is also very efficient. We also generalize this algorithm to the problem of masking multiple query strings simultaneously so that every string has at least z matches in D. PMDM can be viewed as a generalization of the decision version of the dictionary matching with mismatches problem: by querying a PMDM data structure with string q and z=1, one obtains the minimal number of mismatches of q with any string from D.
The query time or space of all known data structures for the more restricted problem of dictionary matching with at most k mismatches incurs some exponential factor with respect to k. A simple exact algorithm for PMDM runs in time O(2^ℓ d). We present a data structure for PMDM that answers queries over D in time O(2^(ℓ/2)(2^(ℓ/2) + τ)ℓ) and requires space O(2^ℓ d^2/τ^2 + 2^(ℓ/2) d), for any parameter τ∈[1, d]. We complement our results by showing a two-way polynomial-time reduction between PMDM and the Minimum Union problem [Chlamtáč et al., ACM-SIAM Symposium on Discrete Algorithms (SODA) 2017]. This gives a polynomial-time O(d^(1/4+ϵ))-approximation algorithm for PMDM, which is tight under a plausible complexity conjecture. This is an extended version of a paper that was presented at the International Symposium on Algorithms and Computation (ISAAC) 2021.
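The simple exact approach can be made concrete by enumerating candidate masks K in order of increasing size. For each mask, it counts the dictionary strings that match q once the positions in K are wildcards. This naive Python sketch checks each of the 2^ℓ masks against all d strings, so it takes O(2^ℓ dℓ) time in the worst case, slightly more than the O(2^ℓ d) bound quoted above. The names are hypothetical. Positions are 0-based rather than the 1-based {1, …, ℓ} used in the abstract.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional


def matches_with_mask(q: str, s: str, mask: FrozenSet[int]) -> bool:
    """True if s equals q once every position in mask is treated as a wildcard."""
    return all(i in mask or a == b for i, (a, b) in enumerate(zip(q, s)))


def smallest_mask(dictionary: List[str], q: str, z: int) -> Optional[FrozenSet[int]]:
    """Try masks K by increasing size; return the first K under which q matches >= z strings."""
    length = len(q)
    for size in range(length + 1):
        for positions in combinations(range(length), size):
            mask = frozenset(positions)
            hits = sum(
                matches_with_mask(q, s, mask) for s in dictionary if len(s) == length
            )
            if hits >= z:
                return mask
    return None  # fewer than z dictionary strings of length len(q) exist
```

With z = 1, the size of the returned mask is the minimum number of mismatches between q and any string in D, as remarked above. For example, `smallest_mask(["abc", "abd", "xbc"], "abc", 3)` returns `frozenset({0, 2})`.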