Search CORE

4 research outputs found

On the average-case complexity of pattern matching with wildcards

Author: Carl Barton
Publication venue: 'Elsevier BV'
Publication date: 12/04/2022

Pattern matching with wildcards is a string matching problem with the goal of finding all factors of a text $t$ of length $n$ that match a pattern $x$ of length $m$, where wildcards (characters that match everything) may be present. In this paper we present a number of complexity results and fast average-case algorithms for pattern matching where wildcards are allowed in the pattern, however, the results are easily adapted to the case where wildcards are allowed in the text as well. We analyse the \textit{average-case} complexity of these algorithms and derive non-trivial time bounds. These are the first results on the average-case complexity of pattern matching with wildcards which provide a provable separation in time complexity between exact pattern matching and pattern matching with wildcards. We introduce the \textit{wc-period} of a string which is the period of the binary mask $x_b$ where $x_b[i]=a$ \textit{iff} $x[i]\neq \phi$ and $b$ otherwise. We denote the length of the wc-period of a string $x$ by $\textsc{wcp}(x)$. We show the following results for constant $0< \epsilon < 1$ and a pattern $x$ of length $m$ and $g$ wildcards with $\textsc{wcp}(x)=p$ the prefix of length $p$ contains $g_p$ wildcards:
\begin{itemize}
\item If $\displaystyle\lim_{m \rightarrow \infty} \frac{g_p}{p}=0$ there is an optimal algorithm running in $\mathcal{O}(\frac{n \log_\sigma m}{m})$-time on average.
\item If $\displaystyle\lim_{m \rightarrow \infty} \frac{g_p}{p}=1-\epsilon$ there is an algorithm running in $\mathcal{O}(\frac{n \log_\sigma m\log_2 p}{m})$-time on average.
\item If $\displaystyle\lim_{m \rightarrow \infty} \frac{g}{m} = \displaystyle\lim_{m \rightarrow \infty} 1-f(m)=1$ any algorithm takes at least $\Omega(\frac{n \log_\sigma m}{f(m)})$-time on average.
\end{itemize}
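The definitions in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only computes the wc-period as the smallest period of the binary mask, and finds matches with the naive O(nm) scan rather than any of the average-case methods. The wildcard symbol `?` (standing in for $\phi$) and the names `binary_mask`, `smallest_period`, `wcp`, and `naive_wildcard_match` are assumptions made for illustration.

```python
WILDCARD = "?"  # assumption: "?" stands in for the wildcard symbol phi


def binary_mask(x):
    """Return the mask x_b: 'a' at non-wildcard positions, 'b' at wildcards."""
    return "".join("a" if c != WILDCARD else "b" for c in x)


def smallest_period(s):
    """Smallest p such that s[i] == s[i + p] for all valid i (via the KMP failure function)."""
    n = len(s)
    if n == 0:
        return 0
    fail = [0] * n  # fail[i] = length of the longest proper border of s[:i + 1]
    k = 0
    for i in range(1, n):
        while k and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    return n - fail[-1]


def wcp(x):
    """Length of the wc-period: the smallest period of the binary mask of x."""
    return smallest_period(binary_mask(x))


def naive_wildcard_match(t, x):
    """All start positions in t where x matches, with wildcards only in the pattern."""
    m = len(x)
    return [
        i
        for i in range(len(t) - m + 1)
        if all(c == WILDCARD or c == t[i + j] for j, c in enumerate(x))
    ]
```

For instance, the pattern `a?b?c?` has mask `ababab`, so its wc-period has length 2 even though the pattern's letters never repeat.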

Birkbeck Institutional Research Online

Approximating Properties of Data Streams

Author: Samson Zhou
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2018

In this dissertation, we present algorithms that approximate properties in the data stream model, where elements of an underlying data set arrive sequentially, but algorithms must use space sublinear in the size of the underlying data set. We first study the problem of finding all k-periods of a length-n string S, presented as a data stream. S is said to have k-period p if its prefix of length n − p differs from its suffix of length n − p in at most k locations. We give algorithms to compute the k-periods of a string S using poly(k, log n) bits of space and we complement these results with comparable lower bounds. We then study the problem of identifying a longest substring of strings S and T of length n that forms a d-near-alignment under the edit distance, in the simultaneous streaming model. In this model, symbols of strings S and T are streamed at the same time and form a d-near-alignment if the distance between them in some given metric is at most d. We give several algorithms, including an exact one-pass algorithm that uses O(d^2 + d log n) bits of space. We then consider the distinct elements and ℓ_p-heavy hitters problems in the sliding window model, where only the most recent n elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram along with a careful combination of existing techniques to track either the identity or frequency of a few specific items suffices to obtain algorithms for both distinct elements and ℓ_p-heavy hitters that are nearly optimal in both n and c. Finally, we consider the problem of estimating the maximum weighted matching of a graph whose edges are revealed in a streaming fashion. We develop a reduction from the maximum weighted matching problem to the maximum cardinality matching problem that only doubles the approximation factor of a streaming algorithm developed for the maximum cardinality matching problem. As an application, we obtain an estimator for the weight of a maximum weighted matching in bounded-arboricity graphs and in particular, a (48 + ε)-approximation estimator for the weight of a maximum weighted matching in planar graphs.
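The k-period definition given in this abstract can be made concrete with a short Python sketch. It is an offline, quadratic-time check that stores the whole string, so it illustrates the definition only and is not the sublinear-space streaming algorithm the dissertation describes. The function name `k_periods` is an assumption made for illustration.

```python
def k_periods(s, k):
    """Return every p in 1..n-1 such that s has k-period p.

    s has k-period p when its prefix of length n - p and its suffix of
    length n - p differ in at most k positions.
    """
    n = len(s)
    periods = []
    for p in range(1, n):
        # Compare the prefix s[0 : n - p] with the suffix s[p : n] position by position.
        mismatches = sum(1 for i in range(n - p) if s[i] != s[i + p])
        if mismatches <= k:
            periods.append(p)
    return periods
```

With `k = 0` this reduces to the exact periods of the string; allowing `k = 1` on `abcabcabd` admits the period 3 despite the final mismatching character.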

Purdue E-Pubs

Periodicity in Data Streams with Wildcards

Author: Danny Hermelin
DE Knuth
Elena Grigorescu
Erfan Sadeqi Azer
F Blanchet-Sadri
F Ergün
F Manea
Funda Ergün
J Misra
Michael S. Crouch
O Lachish
P Clifford
P Gawrychowski
R Clifford
Raphael Clifford
RM Karp
S Muthukrishnan
Samson Zhou
SF Altschul
Z Galil
Publication venue: 'Springer Science and Business Media LLC'

Crossref