40 research outputs found
Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching
Let S and S' be two strings of the same length. We consider the following two variants of string matching.
* Parameterized Matching: The characters of S and S' are partitioned into static characters and parameterized characters.
The strings are a parameterized match iff the static characters match exactly and there exists a one-to-one function that renames the parameterized characters in S to those in S'.
* Order-Preserving Matching: The strings are an order-preserving match iff for any two integers i,j in [1,|S|], S[i] <= S[j] iff S'[i] <= S'[j].
Let P be a collection of d patterns {P_1, P_2, ..., P_d} of total length n characters, which are chosen from an alphabet Sigma.
Given a text T, also over Sigma, we consider the dictionary indexing problem under the above definitions of string matching.
Specifically, the task is to index P, such that we can report all positions j where at least one of the patterns P_i in P is a parameterized match (resp. order-preserving match) with the same-length substring of T starting at j. Previous best-known indexes occupy O(n * log(n)) bits and can report all occ positions in O(|T| * log(|Sigma|) + occ) time. We present space-efficient indexes that occupy O(n * log(|Sigma|+d) * log(n)) bits and report all occ positions in O(|T| * (log(|Sigma|) + log_{|Sigma|}(n)) + occ) time for parameterized matching and in O(|T| * log(n) + occ) time for order-preserving matching.
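The two matching definitions above can be checked directly for a single pair of equal-length strings. The following is a minimal sketch that simply follows the definitions; it is not the index described in the abstract, and the function names and the `static` parameter are illustrative assumptions.

```python
def parameterized_match(s, t, static):
    """Return True iff s and t are a parameterized match.

    `static` is the set of static characters (an assumed input format);
    every other character is treated as a parameterized character.
    """
    if len(s) != len(t):
        return False
    fwd, bwd = {}, {}  # renaming s -> t and its inverse t -> s
    for a, b in zip(s, t):
        if a in static or b in static:
            # Static characters must match exactly.
            if a != b:
                return False
        else:
            # The renaming must be a consistent one-to-one function.
            if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
                return False
    return True


def order_preserving_match(s, t):
    """Return True iff S[i] <= S[j] exactly when S'[i] <= S'[j] for all i, j.

    Quadratic time: a direct transcription of the definition.
    """
    if len(s) != len(t):
        return False
    n = len(s)
    return all((s[i] <= s[j]) == (t[i] <= t[j])
               for i in range(n) for j in range(n))
```

For example, with static characters {a, b}, the strings "axbyax" and "aybxay" match under the renaming x -> y, y -> x, while "axay" and "azaz" do not, because two distinct parameterized characters would both have to map to z.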
Efficiently Correcting Matrix Products
We study the problem of efficiently correcting an erroneous product of two
matrices over a ring. Among other things, we provide a randomized
algorithm for correcting a matrix product with at most erroneous entries
running in time and a deterministic -time
algorithm for this problem (where the notation suppresses
polylogarithmic terms in and ).
Weighted dynamic finger in binary search trees
It is shown that the online binary search tree data structure GreedyASS
performs asymptotically as well on a sufficiently long sequence of searches as
any static binary search tree where each search begins from the previous search
(rather than the root). This bound is known to be equivalent to assigning each
item in the search tree a positive weight and bounding the search
cost of an item in the search sequence by
amortized. This result is the strongest finger-type bound to be proven for
binary search trees. By setting the weights to be equal, one observes that our
bound implies the dynamic finger bound. Compared to the previous proof of the
dynamic finger bound for Splay trees, our result is significantly shorter,
stronger, simpler, and has reasonable constants. Comment: An earlier version of this work appeared in the Proceedings of the
Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms
Online Graph Coloring with Predictions
We introduce learning augmented algorithms to the online graph coloring
problem. Although the simple greedy algorithm FirstFit is known to perform
poorly in the worst case, we are able to establish a relationship between the
structure of any input graph that is revealed online and the number of
colors that FirstFit uses for . Based on this relationship, we propose an
online coloring algorithm FirstFitPredictions that extends FirstFit while
making use of machine learned predictions. We show that FirstFitPredictions is
both consistent and smooth. Moreover, we develop a novel
framework for combining online algorithms at runtime specifically for the
online graph coloring problem. Finally, we show how this framework can be used
to robustify FirstFitPredictions by combining it with any classical online coloring algorithm (that
disregards the predictions).
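For reference, the classical FirstFit rule mentioned above gives each newly revealed vertex the smallest color not used by its already-revealed neighbors. The sketch below illustrates only that baseline, not FirstFitPredictions; the input format (a sequence of vertex and neighbor-list pairs) is an assumption made for illustration.

```python
def first_fit(arrivals):
    """Color vertices online with the FirstFit rule.

    `arrivals` is a sequence of (vertex, neighbors) pairs, where
    `neighbors` lists the previously revealed vertices adjacent to
    `vertex` (an assumed input format). Colors are 0, 1, 2, ...
    """
    color = {}
    for v, neighbors in arrivals:
        used = {color[u] for u in neighbors if u in color}
        c = 0
        while c in used:  # smallest color absent from the neighborhood
            c += 1
        color[v] = c
    return color


# A path a-b-c-d revealed in an unfavorable order: FirstFit needs
# three colors although the graph is 2-colorable.
bad_order = [("a", []), ("d", []), ("b", ["a"]), ("c", ["b", "d"])]
coloring = first_fit(bad_order)
```

The example order shows how the arrival sequence, and not only the graph, determines how many colors FirstFit uses.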
Forbidden Extension Queries
Document retrieval is one of the most fundamental problems in information retrieval. The objective is to retrieve all documents from a document collection that are relevant to an input pattern.
Several variations of this problem such as ranked document retrieval, document listing with two patterns and forbidden patterns have been studied. We introduce the problem of document retrieval with forbidden extensions.
Let D={T_1,T_2,...,T_D} be a collection of D string documents of n characters in total, and P^+ and P^- be two query patterns, where P^+ is a proper prefix of P^-. We call P^- the forbidden extension of the included pattern P^+. A forbidden extension query asks to report all occ documents in D that contain P^+ as a substring but do not contain P^- as one. A top-k forbidden extension query asks to report those k documents among the occ documents that are most relevant to P^+. We present a linear index (in words) with an O(|P^-| + occ) query time for the document listing problem. For the top-k version of the problem, we achieve the following results, when the relevance of a document is based on PageRank:
- an O(n) space (in words) index with O(|P^-| * log(sigma) + k) query time, where sigma is the size of the alphabet from which characters in D are chosen. For constant alphabets, this yields an optimal query time of O(|P^-| + k).
- for any constant epsilon > 0, a |CSA| + |CSA^*| + D * log(n/D) + O(n) bits index with O(search(P) + k * tsa * log^{2+epsilon}(n)) query time, where search(P) is the time to find the suffix range of a pattern P, tsa is the time to find suffix (or inverse suffix) array value, and |CSA^*| denotes the maximum of the space needed to store the compressed suffix array CSA of the concatenated text of all documents, or the total space needed to store the individual CSA of each document.
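The semantics of a forbidden extension query can be stated as a brute-force scan over the collection. This is only a naive baseline, assuming documents are plain Python strings; it takes time proportional to the total collection size per query and is not the index from the abstract. The function name is hypothetical.

```python
def forbidden_extension_query(docs, p_plus, p_minus):
    """Return indices of documents that contain p_plus but not p_minus.

    p_plus must be a proper prefix of p_minus, as in the problem
    definition. Brute-force scan, for illustration only.
    """
    if not (p_minus.startswith(p_plus) and len(p_minus) > len(p_plus)):
        raise ValueError("p_plus must be a proper prefix of p_minus")
    return [i for i, doc in enumerate(docs)
            if p_plus in doc and p_minus not in doc]


docs = ["banana", "bandana", "cabana"]
hits = forbidden_extension_query(docs, "an", "and")
```

Here "bandana" is excluded because it contains the forbidden extension "and", while the other two documents contain "an" without it.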
Fully Dynamic MIS in Uniformly Sparse Graphs
We consider the problem of maintaining a maximal independent set (MIS) in a dynamic graph subject to edge insertions and deletions. Recently, Assadi, Onak, Schieber and Solomon (STOC 2018) showed that an MIS can be maintained in sublinear (in the dynamically changing number of edges) amortized update time. In this paper we significantly improve the update time for uniformly sparse graphs. Specifically, for graphs with arboricity alpha, the amortized update time of our algorithm is O(alpha^2 * log^2 n), where n is the number of vertices. For low arboricity graphs, which include, for example, minor-free graphs as well as some classes of "real world" graphs, our update time is polylogarithmic. Our update time improves the result of Assadi et al. for all graphs with arboricity bounded by m^{3/8 - epsilon}, for any constant epsilon > 0. This covers much of the range of possible values for arboricity, as the arboricity of a general graph cannot exceed m^{1/2}
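To make the problem setting concrete, here is a naive way to maintain an MIS under edge insertions and deletions by repairing only the neighborhood of the affected endpoints. It is a simple baseline whose update cost depends on vertex degrees, not the algorithm from the abstract; the class and method names are hypothetical.

```python
from collections import defaultdict


class NaiveDynamicMIS:
    """Maintain a maximal independent set by local repair (baseline sketch)."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.mis = set()

    def _add_vertex(self, v):
        if v not in self.adj:
            self.adj[v] = set()
            self.mis.add(v)  # an isolated vertex must belong to any MIS

    def _free(self, v):
        # True if no neighbor of v is currently in the MIS.
        return not (self.adj[v] & self.mis)

    def insert_edge(self, u, v):
        self._add_vertex(u)
        self._add_vertex(v)
        self.adj[u].add(v)
        self.adj[v].add(u)
        if u in self.mis and v in self.mis:
            # Independence is violated: evict v, then re-add any neighbor
            # of v that is left without an MIS neighbor.
            self.mis.discard(v)
            for w in self.adj[v]:
                if w not in self.mis and self._free(w):
                    self.mis.add(w)

    def delete_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # An endpoint outside the MIS may have lost its only MIS neighbor.
        for x in (u, v):
            if x not in self.mis and self._free(x):
                self.mis.add(x)

    def is_valid(self):
        independent = all(not (self.adj[v] & self.mis) for v in self.mis)
        maximal = all(v in self.mis or (self.adj[v] & self.mis)
                      for v in self.adj)
        return independent and maximal
```

Each update touches only the endpoints and, on an insertion between two MIS vertices, the neighbors of the evicted endpoint.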
Algorithms for the Minimum Dominating Set Problem in Bounded Arboricity Graphs: Simpler, Faster, and Combinatorial
We revisit the minimum dominating set problem on graphs with arboricity
bounded by . In the (standard) centralized setting, Bansal and Umboh
[BU17] gave an -approximation LP rounding algorithm. Moreover,
[BU17] showed that it is NP-hard to achieve an asymptotic improvement. On the
other hand, the previous two non-LP-based algorithms, by Lenzen and Wattenhofer
[LW10], and Jones et al. [JLR+13], achieve an approximation factor of
in linear time.
There is a similar situation in the distributed setting: While there are
-round LP-based -approximation algorithms [KMW06,
DKM19], the best non-LP-based algorithm by Lenzen and Wattenhofer [LW10] is an
implementation of their centralized algorithm, providing an
-approximation within rounds with high probability.
We address the question of whether one can achieve a simple, elementary
-approximation algorithm not based on any LP-based methods, either
in the centralized setting or in the distributed setting. We resolve these
questions in the affirmative. More specifically, our contribution is two-fold:
1. In the centralized setting, we provide a surprisingly simple combinatorial
algorithm that is asymptotically optimal in terms of both approximation factor
and running time: an -approximation in linear time.
2. Based on our centralized algorithm, we design a distributed combinatorial
-approximation algorithm in the model that runs
in rounds with high probability. Our round complexity
outperforms the best LP-based distributed algorithm for a wide range of
parameters