On the average-case complexity of pattern matching with wildcards
Pattern matching with wildcards is a string matching problem with the goal of finding all factors of a text of length $n$ that match a pattern of length $m$, where wildcards (characters that match everything) may be present.
In this paper we present a number of complexity results and fast average-case algorithms for pattern matching where wildcards are allowed in the pattern; the results are easily adapted to the case where wildcards are allowed in the text as well.
We analyse the \textit{average-case} complexity of these algorithms and derive non-trivial time bounds.
These are the first results on the average-case complexity of pattern matching with wildcards which provide a provable separation in time complexity between exact pattern matching and pattern matching with wildcards.
We introduce the \textit{wc-period} of a string $x$: the period of the binary mask $x_b$, where $x_b[i]=1$ \textit{iff} $x[i]$ is a wildcard and $x_b[i]=0$ otherwise. We denote the length of the wc-period of a string $x$ by \textsc{wcp}(x).
We show the following results for an alphabet of constant size $\sigma$ and a pattern $x$ of length $m$ containing $g$ wildcards, where $\textsc{wcp}(x)=p$ and the prefix of length $p$ contains $g_p$ wildcards:
\begin{itemize}
\item If …, there is an optimal algorithm running in $\cO(\frac{n \log_\sigma m}{m})$-time on average.
\item If …, there is an algorithm running in $\cO(\frac{n \log_\sigma m \log_2 p}{m})$-time on average.
\item If …, any algorithm takes at least …-time on average.
\end{itemize}
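The wc-period drives all three bounds above. As a concrete illustration, here is a minimal Python sketch of the definition: build the binary wildcard mask and return its smallest period. The wildcard symbol and function names are illustrative assumptions, not from the paper, and the naive quadratic period scan could be replaced by a linear-time failure-function computation.

```python
WILDCARD = "?"  # assumed wildcard symbol; the paper's alphabet is abstract

def wc_period(x: str) -> int:
    """Length of the wc-period of x: the smallest period of the binary
    mask b, where b[i] = 1 iff x[i] is a wildcard and b[i] = 0 otherwise."""
    b = [1 if c == WILDCARD else 0 for c in x]
    m = len(b)
    # p is a period of b if b[i] == b[i + p] for every valid i.
    for p in range(1, m + 1):
        if all(b[i] == b[i + p] for i in range(m - p)):
            return p

print(wc_period("ab?ab?ab"))  # mask 00100100 -> wc-period 3
```

Note that the period of a binary mask is unchanged if the mask is complemented, so the direction of the iff-condition does not affect \textsc{wcp}(x).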
Upper and lower bounds for dynamic data structures on strings
We consider a range of simply stated dynamic data structure problems on
strings. An update changes one symbol in the input and a query asks us to
compute some function of the pattern of length $n$ and a substring of a longer
text. We give both conditional and unconditional lower bounds for variants of
exact matching with wildcards, inner product, and Hamming distance computation
via a sequence of reductions. As an example, we show that there does not exist
an $O(n^{1/2-\varepsilon})$-time algorithm for a large range of these problems
unless the online Boolean matrix-vector multiplication conjecture is false. We
also provide nearly matching upper bounds for most of the problems we consider.
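To make the update/query model concrete, the following toy baseline (my own illustration, not the paper's data structure) supports the two operations for Hamming distance: an O(1) single-symbol update and an O(n) query that compares the pattern against a text substring. The paper's upper and lower bounds say how far below this trivial query cost one can get.

```python
class NaiveDynamicHamming:
    """Trivial baseline: O(1) updates, O(n) Hamming-distance queries."""

    def __init__(self, pattern: str, text: str):
        self.p = list(pattern)
        self.t = list(text)

    def update_text(self, i: int, c: str) -> None:
        self.t[i] = c  # an update changes one symbol in the input

    def query(self, j: int) -> int:
        """Hamming distance between the pattern and t[j : j + len(p)]."""
        return sum(a != b for a, b in zip(self.p, self.t[j:j + len(self.p)]))

d = NaiveDynamicHamming("abc", "abcabd")
d.update_text(5, "c")   # text becomes "abcabc"
print(d.query(3))       # 0: the pattern now matches exactly at position 3
```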
Technology Mapping for Circuit Optimization Using Content-Addressable Memory
The growing complexity of Field Programmable Gate Arrays (FPGAs) is leading to architectures with high input cardinality look-up tables (LUTs). This thesis describes a methodology for area-minimizing technology mapping of combinational logic, specifically designed for such FPGA architectures. The methodology, called LURU, leverages the parallel search capabilities of Content-Addressable Memories (CAMs) to outperform traditional mapping algorithms in both execution time and quality of results. LURU differs fundamentally from other technology-mapping techniques in that it uses textual string representations of circuit topology to efficiently store and search for circuit patterns in a CAM. A circuit is mapped to the target LUT technology using both exact and inexact string matching techniques. Common subcircuit expressions (CSEs) are also identified and used for architectural optimization; a small set of CSEs is shown to cover, on average, 96% of the test circuits. LURU was tested with the ISCAS'85 suite of combinational benchmark circuits and compared with the mapping algorithms FlowMap and CutMap. LURU achieves, on average, 20% greater area reduction than FlowMap and CutMap, and its asymptotic runtime complexity is shown to be better than that of both.
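As a rough illustration of the string-matching idea described above (a toy sketch; the encoding and names are mine, not LURU's), one can serialise a gate and its operands into a canonical string and look it up in a pattern table, with a hash map standing in for the CAM's parallel exact-match search:

```python
def encode(gate: str, *inputs: str) -> str:
    """Toy canonical string for a subcircuit: operator plus sorted operands,
    so equivalent operand orderings map to the same key."""
    return gate + "(" + ",".join(sorted(inputs)) + ")"

# Stand-in "CAM": known topology strings mapped to toy LUT implementations.
cam = {
    encode("AND", "a", "b"): "LUT2:1000",
    encode("OR",  "a", "b"): "LUT2:1110",
}

key = encode("AND", "b", "a")    # operand order is canonicalised away
print(cam.get(key, "no match"))  # -> LUT2:1000
```

A real CAM would match all stored patterns in parallel in one cycle; the dict lookup here only mimics the exact-match half of that behaviour.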
Reverse-Safe Data Structures for Text Indexing
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
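The definition can be checked by brute force on tiny inputs. The sketch below is my own toy and assumes counting queries (the paper's exact query model may differ): it computes, for a text and a maximal query length d, how many texts over the alphabet produce identical answers, i.e. the z for which the answer set is z-reverse-safe.

```python
from itertools import product

def answers(text: str, d: int) -> dict:
    """Occurrence count of every pattern of length <= d (counting queries
    are an assumption of this toy)."""
    counts = {}
    for k in range(1, d + 1):
        for i in range(len(text) - k + 1):
            p = text[i:i + k]
            counts[p] = counts.get(p, 0) + 1
    return counts

def reverse_safety(text: str, d: int, alphabet: str = "ab") -> int:
    """z such that `text`'s answers are shared by exactly z texts
    (exhaustive enumeration; feasible only for very short texts)."""
    target = answers(text, d)
    return sum(answers("".join(t), d) == target
               for t in product(alphabet, repeat=len(text)))

print(reverse_safety("abab", 1))  # 6: any arrangement of two a's and two b's
print(reverse_safety("abab", 3))  # 1: longer queries pin the text down
```

This also illustrates why d must be bounded: as d grows, fewer texts remain indistinguishable, so z shrinks.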
Cursive script recognition using wildcards and multiple experts
Variability in handwriting styles suggests that many letter recognition engines cannot correctly identify some hand-written letters of poor quality at reasonable computational cost. Methods that are capable of searching the resulting sparse graph of letter candidates are therefore required. The method presented here employs “wildcards” to represent missing letter candidates. Multiple experts are used to represent different aspects of handwriting. Each expert evaluates closeness of match and indicates its confidence. Explanation experts determine the degree to which the word alternative under consideration explains extraneous letter candidates. Schemata for normalisation and combination of scores are investigated and their performance compared. Hill climbing yields near-optimal combination weights that outperform comparable methods on identical dynamic handwriting data.
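A minimal sketch of the combination scheme the abstract describes (names and data are invented for illustration): each expert scores every word alternative, a weighted sum ranks the alternatives, and stochastic hill climbing tunes the weights to maximise recognition accuracy on labelled data.

```python
import random

def combine(scores, weights):
    """Weighted sum of per-expert scores for one word alternative."""
    return sum(w * s for w, s in zip(weights, scores))

def accuracy(weights, data):
    """Fraction of cases where the true word ranks first.
    data: list of (per-alternative expert-score lists, index of true word)."""
    hits = 0
    for alternatives, truth in data:
        best = max(range(len(alternatives)),
                   key=lambda i: combine(alternatives[i], weights))
        hits += (best == truth)
    return hits / len(data)

def hill_climb(data, n_experts, steps=200, step_size=0.1):
    """Simple stochastic hill climbing over the weight vector."""
    w = [1.0] * n_experts
    best = accuracy(w, data)
    for _ in range(steps):
        cand = [max(0.0, wi + random.uniform(-step_size, step_size))
                for wi in w]
        a = accuracy(cand, data)
        if a >= best:
            w, best = cand, a
    return w, best

# Toy data: two experts, two alternatives per word, alternative 0 is true.
data = [([[0.9, 0.2], [0.4, 0.8]], 0), ([[0.7, 0.1], [0.3, 0.9]], 0)]
print(hill_climb(data, n_experts=2))  # learns to trust the first expert
```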
Constraint-based sequence mining using constraint programming
The goal of constraint-based sequence mining is to find sequences of symbols
that are included in a large number of input sequences and that satisfy some
constraints specified by the user. Many constraints have been proposed in the
literature, but a general framework is still missing. We investigate the use of
constraint programming as a general framework for this task. We first identify
four categories of constraints that are applicable to sequence mining. We then
propose two constraint programming formulations. The first formulation
introduces a new global constraint called exists-embedding. This formulation is
the most efficient but does not support one type of constraint. To support such
constraints, we develop a second formulation that is more general but incurs
more overhead. Both formulations can use the projected database technique used
in specialised algorithms. Experiments demonstrate the flexibility towards
constraint-based settings and compare the approach to existing methods.
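The core test behind a constraint such as exists-embedding is whether a pattern occurs as a subsequence of an input sequence; support is then the number of input sequences embedding the pattern. A minimal unconstrained sketch (function names are mine, not the paper's):

```python
def embeds(pattern, sequence) -> bool:
    """True iff pattern occurs as a subsequence of sequence."""
    it = iter(sequence)
    # `symbol in it` consumes the iterator up to and including the match,
    # so symbols must be found in order.
    return all(symbol in it for symbol in pattern)

def support(pattern, database) -> int:
    """Number of database sequences in which the pattern is embedded."""
    return sum(embeds(pattern, s) for s in database)

db = ["abcb", "acbb", "bbbc"]
print(support("ab", db))  # 2: embedded in "abcb" and "acbb"
print(support("bb", db))  # 3: embedded in all three sequences
```

The constraint programming formulations described above add user constraints (e.g., on length, gaps, or symbol inclusion) on top of this basic embedding test.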
- …