Search CORE

1,202 research outputs found

Fast and Powerful Hashing using Tabulation

Author: Thorup Mikkel
Publication venue
Publication date: 01/01/2016
Field of study

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of

c

characters and we have precomputed character tables

h_1,...,h_c

mapping characters to random hash values. A key

x=(x_1,...,x_c)

is hashed to

h_1[x_1] \oplus h_2[x_2].....\oplus h_c[x_c]

. This schemes is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing. Next we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing. This also yields an extremely fast pseudo-random number generator that is provably good for many classic randomized algorithms and data-structures. Finally, we consider double tabulation where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully-random hashing. We also mention some more elaborate tabulation schemes getting near-optimal independence for given time and space. While these tabulation schemes are all easy to implement and use, their analysis is not

arXiv.org e-Print Archive

Copenhagen University Research Information System

Dagstuhl Research Online Publication Server

Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time

Author: Chakraborty Diptarka
Das Debarati
Goldenberg Elazar
Koucky Michal
Saks Michael
Publication venue
Publication date: 08/10/2018
Field of study

Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within approximation factor

\text{poly}(\log n)

. In this paper, we provide an algorithm with running time

\tilde{O}(n^{2-2/7})

that approximates the edit distance within a constant factor

arXiv.org e-Print Archive

Copenhagen University Research Information System

Parallel Sparse Matrix-Matrix Multiplication

Author: Alexandrov Luben
Publication venue: Karlsruher Institut für Technologie
Publication date: 25/01/2021
Field of study

The thesis investigates the BLAS-3 routine of sparse matrix-matrix multiplication (SpGEMM) based on the outer product method. Sev- eral algorithmic approaches have been implemented and empirically an- alyzed. The experiments have shown that an algorithm presented by Gustavson [22] outperforms other alternatives. In this work we propose optimization techniques that improve the scalability and the cache efficiency of the Gustavson’s algorithm for large matrices. Our approach succeeded to reduce the cache misses by more than a factor of five and to improve the net running time by 30% with some instances. The thesis also presents an algorithm for flops estima- tion, which can be used to determine an upper bound for the density of the result matrix. Furthermore, the work analyzes and empirically evaluates techniques for parallelization of the multiplication in a shared memory model by using Intel TBB and OpenMP. We investigate the cache efficiency of the algorithm in a parallel setting and compare several approaches for load balancing of the computation

Hardness of Easy Problems: Basing Hardness on Popular Conjectures such as the Strong Exponential Time Hypothesis (Invited Talk)

Author: Vassilevska Williams Virginia
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 10th International Symposium on Parameterized and Exact Computation (IPEC 2015)
Publication date: 01/01/2015
Field of study

Algorithmic research strives to develop fast algorithms for fundamental problems. Despite its many successes, however, many problems still do not have very efficient algorithms. For years researchers have explained the hardness for key problems by proving NP-hardness, utilizing polynomial time reductions to base the hardness of key problems on the famous conjecture P != NP. For problems that already have polynomial time algorithms, however, it does not seem that one can show any sort of hardness based on P != NP. Nevertheless, we would like to provide evidence that a problem

A

with a running time O(n^k) that has not been improved in decades, also requires n^{k-o(1)} time, thus explaining the lack of progress on the problem. Such unconditional time lower bounds seem very difficult to obtain, unfortunately. Recent work has concentrated on an approach mimicking NP-hardness: (1) select a few key problems that are conjectured to require T(n) time to solve, (2) use special, fine-grained reductions to prove time lower bounds for many diverse problems in P based on the conjectured hardness of the key problems. In this abstract we outline the approach, give some examples of hardness results based on the Strong Exponential Time Hypothesis, and present an overview of some of the recent work on the topic

Dagstuhl Research Online Publication Server

Computing longest common square subsequences

Author: Bannai Hideo
Inenaga Shunsuke
Inoue Takafumi
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Annual Symposium on Combinatorial Pattern Matching (CPM 2018)
Publication date: 01/01/2018
Field of study

A square is a non-empty string of form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings

Dagstuhl Research Online Publication Server

Hardness of Approximation of (Multi-)LCS over Small Alphabet

Author: Bhangale Amey
Chakraborty Diptarka
Kumar Rajendra
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)
Publication date: 01/01/2020
Field of study

The problem of finding longest common subsequence (LCS) is one of the fundamental problems in computer science, which finds application in fields such as computational biology, text processing, information retrieval, data compression etc. It is well known that (decision version of) the problem of finding the length of a LCS of an arbitrary number of input sequences (which we refer to as Multi-LCS problem) is NP-complete. Jiang and Li [SICOMP\u2795] showed that if Max-Clique is hard to approximate within a factor of s then Multi-LCS is also hard to approximate within a factor of ?(s). By the NP-hardness of the problem of approximating Max-Clique by Zuckerman [ToC\u2707], for any constant ? > 0, the length of a LCS of arbitrary number of input sequences of length n each, cannot be approximated within an n^{1-?}-factor in polynomial time unless {P}={NP}. However, the reduction of Jiang and Li assumes the alphabet size to be ?(n). So far no hardness result is known for the problem of approximating Multi-LCS over sub-linear sized alphabet. On the other hand, it is easy to get 1/|?|-factor approximation for strings of alphabet ?. In this paper, we make a significant progress towards proving hardness of approximation over small alphabet by showing a polynomial-time reduction from the well-studied densest k-subgraph problem with perfect completeness to approximating Multi-LCS over alphabet of size poly(n/k). As a consequence, from the known hardness result of densest k-subgraph problem (e.g. [Manurangsi, STOC\u2717]) we get that no polynomial-time algorithm can give an n^{-o(1)}-factor approximation of Multi-LCS over an alphabet of size n^{o(1)}, unless the Exponential Time Hypothesis is false

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

eScholarship - University of California

A Faster Subquadratic Algorithm for the Longest Common Increasing Subsequence Problem

Author: Agrawal Anadi
Gawrychowski Pawe?
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st International Symposium on Algorithms and Computation (ISAAC 2020)
Publication date: 01/01/2020
Field of study

The Longest Common Increasing Subsequence (LCIS) is a variant of the classical Longest Common Subsequence (LCS), in which we additionally require the common subsequence to be strictly increasing. While the well-known "Four Russians" technique can be used to find LCS in subquadratic time, it does not seem applicable to LCIS. Recently, Duraj [STACS 2020] used a completely different method based on the combinatorial properties of LCIS to design an

\mathcal{O}(n^2(\log\log n)^2/\log^{1/6}n)

time algorithm. We show that an approach based on exploiting tabulation can be used to construct an asymptotically faster

\mathcal{O}(n^2 \log\log n/\sqrt{\log n})

time algorithm. As our solution avoids using the specific combinatorial properties of LCIS, it can be also adapted for the Longest Common Weakly Increasing Subsequence (LCWIS)

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Compiling a domain specific language for dynamic programming

Author: Steffen Peter
Publication venue: Bielefeld University
Publication date: 01/01/2006
Field of study

Steffen P. Compiling a domain specific language for dynamic programming. Bielefeld (Germany): Bielefeld University; 2006

Publications at Bielefeld University