1,202 research outputs found
Fast and Powerful Hashing using Tabulation
Randomized algorithms are often enjoyed for their simplicity, but the hash
functions employed to yield the desired probabilistic guarantees are often too
complicated to be practical. Here we survey recent results on how simple
hashing schemes based on tabulation provide unexpectedly strong guarantees.
Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as
consisting of characters and we have precomputed character tables
mapping characters to random hash values. A key
is hashed to . This schemes is
very fast with character tables in cache. While simple tabulation is not even
4-independent, it does provide many of the guarantees that are normally
obtained via higher independence, e.g., linear probing and Cuckoo hashing.
Next we consider twisted tabulation where one input character is "twisted" in
a simple way. The resulting hash function has powerful distributional
properties: Chernoff-Hoeffding type tail bounds and a very small bias for
min-wise hashing. This also yields an extremely fast pseudo-random number
generator that is provably good for many classic randomized algorithms and
data-structures.
Finally, we consider double tabulation where we compose two simple tabulation
functions, applying one to the output of the other, and show that this yields
very high independence in the classic framework of Carter and Wegman [1977]. In
fact, w.h.p., for a given set of size proportional to that of the space
consumed, double tabulation gives fully-random hashing. We also mention some
more elaborate tabulation schemes getting near-optimal independence for given
time and space.
While these tabulation schemes are all easy to implement and use, their
analysis is not
Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time
Edit distance is a measure of similarity of two strings based on the minimum
number of character insertions, deletions, and substitutions required to
transform one string into the other. The edit distance can be computed exactly
using a dynamic programming algorithm that runs in quadratic time. Andoni,
Krauthgamer and Onak (2010) gave a nearly linear time algorithm that
approximates edit distance within approximation factor .
In this paper, we provide an algorithm with running time
that approximates the edit distance within a constant
factor
Parallel Sparse Matrix-Matrix Multiplication
The thesis investigates the BLAS-3 routine of sparse matrix-matrix multiplication (SpGEMM) based on the outer product method. Sev- eral algorithmic approaches have been implemented and empirically an- alyzed. The experiments have shown that an algorithm presented by Gustavson [22] outperforms other alternatives. In this work we propose optimization techniques that improve the scalability and the cache efficiency of the Gustavson’s algorithm for large matrices. Our approach succeeded to reduce the cache misses by more than a factor of five and to improve the net running time by 30% with some instances. The thesis also presents an algorithm for flops estima- tion, which can be used to determine an upper bound for the density of the result matrix. Furthermore, the work analyzes and empirically evaluates techniques for parallelization of the multiplication in a shared memory model by using Intel TBB and OpenMP. We investigate the cache efficiency of the algorithm in a parallel setting and compare several approaches for load balancing of the computation
Hardness of Easy Problems: Basing Hardness on Popular Conjectures such as the Strong Exponential Time Hypothesis (Invited Talk)
Algorithmic research strives to develop fast algorithms for fundamental problems. Despite its many successes, however, many problems still do not have very efficient algorithms. For years researchers have explained the hardness for key problems by proving NP-hardness, utilizing polynomial time reductions to base the hardness of key problems on the famous conjecture P != NP. For problems that already have polynomial time algorithms, however, it does not seem that one can show any sort of hardness based on P != NP. Nevertheless, we would like to provide evidence that a problem with a running time O(n^k) that has not been improved in decades, also requires n^{k-o(1)} time, thus explaining the lack of progress on the problem. Such unconditional time lower bounds seem very difficult to obtain, unfortunately. Recent work has concentrated on an approach mimicking NP-hardness: (1) select a few key problems that are conjectured to require T(n) time to solve, (2) use special, fine-grained reductions to prove time lower bounds for many diverse problems in P based on the conjectured hardness of the key problems. In this abstract we outline the approach, give some examples of hardness results based on the Strong Exponential Time Hypothesis, and present an overview of some of the recent work on the topic
Computing longest common square subsequences
A square is a non-empty string of form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings
Hardness of Approximation of (Multi-)LCS over Small Alphabet
The problem of finding longest common subsequence (LCS) is one of the fundamental problems in computer science, which finds application in fields such as computational biology, text processing, information retrieval, data compression etc. It is well known that (decision version of) the problem of finding the length of a LCS of an arbitrary number of input sequences (which we refer to as Multi-LCS problem) is NP-complete. Jiang and Li [SICOMP\u2795] showed that if Max-Clique is hard to approximate within a factor of s then Multi-LCS is also hard to approximate within a factor of ?(s). By the NP-hardness of the problem of approximating Max-Clique by Zuckerman [ToC\u2707], for any constant ? > 0, the length of a LCS of arbitrary number of input sequences of length n each, cannot be approximated within an n^{1-?}-factor in polynomial time unless {P}={NP}. However, the reduction of Jiang and Li assumes the alphabet size to be ?(n). So far no hardness result is known for the problem of approximating Multi-LCS over sub-linear sized alphabet. On the other hand, it is easy to get 1/|?|-factor approximation for strings of alphabet ?.
In this paper, we make a significant progress towards proving hardness of approximation over small alphabet by showing a polynomial-time reduction from the well-studied densest k-subgraph problem with perfect completeness to approximating Multi-LCS over alphabet of size poly(n/k). As a consequence, from the known hardness result of densest k-subgraph problem (e.g. [Manurangsi, STOC\u2717]) we get that no polynomial-time algorithm can give an n^{-o(1)}-factor approximation of Multi-LCS over an alphabet of size n^{o(1)}, unless the Exponential Time Hypothesis is false
A Faster Subquadratic Algorithm for the Longest Common Increasing Subsequence Problem
The Longest Common Increasing Subsequence (LCIS) is a variant of the
classical Longest Common Subsequence (LCS), in which we additionally require
the common subsequence to be strictly increasing. While the well-known "Four
Russians" technique can be used to find LCS in subquadratic time, it does not
seem applicable to LCIS. Recently, Duraj [STACS 2020] used a completely
different method based on the combinatorial properties of LCIS to design an
time algorithm. We show that an
approach based on exploiting tabulation can be used to construct an
asymptotically faster time
algorithm. As our solution avoids using the specific combinatorial properties
of LCIS, it can be also adapted for the Longest Common Weakly Increasing
Subsequence (LCWIS)
Compiling a domain specific language for dynamic programming
Steffen P. Compiling a domain specific language for dynamic programming. Bielefeld (Germany): Bielefeld University; 2006
- …