
    Deterministic Sparse Pattern Matching via the Baur-Strassen Theorem

    How fast can you test whether a constellation of stars appears in the night sky? This question can be modeled as the computational problem of testing whether a set of points $P$ can be moved into (or close to) another set $Q$ under some prescribed group of transformations. Consider, as a simple representative, the following problem: Given two sets of at most $n$ integers $P,Q\subseteq[N]$, determine whether there is some shift $s$ such that $P$ shifted by $s$ is a subset of $Q$, i.e., $P+s=\{p+s:p\in P\}\subseteq Q$. This problem, to which we refer as the Constellation problem, can be solved in near-linear time $O(n\log n)$ by a Monte Carlo randomized algorithm [Cardoze, Schulman; FOCS'98] and in time $O(n\log^2 N)$ by a Las Vegas randomized algorithm [Cole, Hariharan; STOC'02]. Moreover, there is a deterministic algorithm running in time $n\cdot 2^{O(\sqrt{\log n\log\log N})}$ [Chan, Lewenstein; STOC'15]. An interesting question left open by these previous works is whether Constellation is in deterministic near-linear time (i.e., with only polylogarithmic overhead). We answer this question positively by giving an $n\cdot(\log N)^{O(1)}$-time deterministic algorithm for the Constellation problem. Our algorithm extends to various more complex Point Pattern Matching problems in higher dimensions, under translations and rigid motions, and possibly with mismatches, and also to a near-linear-time derandomization of the Sparse Wildcard Matching problem on strings. We find it particularly interesting how we obtain our deterministic algorithm. All previous algorithms are based on the same baseline idea, using additive hashing and the Fast Fourier Transform. In contrast, our algorithms are based on new ideas, involving a surprising blend of combinatorial and algebraic techniques. At the heart lies an innovative application of the Baur-Strassen theorem from algebraic complexity theory.
    Comment: Abstract shortened to fit arXiv requirement
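    To make the Constellation problem concrete, here is a minimal sketch of the naive baseline, not the deterministic algorithm from the paper. It uses one observation: any valid shift $s$ must map the smallest element $p_0$ of $P$ onto some $q\in Q$, so it suffices to test the at most $|Q|$ candidate shifts $s=q-p_0$. This costs $O(|P|\cdot|Q|)$ set lookups; the function name `constellation_shifts` is hypothetical.

```python
def constellation_shifts(P, Q):
    """Return every shift s with P + s a subset of Q (naive O(|P|*|Q|) baseline)."""
    P = sorted(set(P))
    Q = set(Q)
    if not P:
        raise ValueError("P must be nonempty")  # every shift works for empty P
    p0 = P[0]
    shifts = []
    for q in sorted(Q):
        s = q - p0  # a valid shift must send p0 onto some element of Q
        if all(p + s in Q for p in P):
            shifts.append(s)
    return shifts


print(constellation_shifts([0, 2, 5], [1, 3, 4, 6, 8, 11]))
```

    For example, with $P=\{0,2,5\}$ and $Q=\{1,3,4,6,8,11\}$ the valid shifts are $s=1$ and $s=6$. The algorithms cited above replace this quadratic enumeration with hashing and FFT-based (or, in the paper, Baur-Strassen-based) techniques.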

    Algorithms for sparse convolution and sublinear edit distance

    In this PhD thesis on fine-grained algorithm design and complexity, we investigate output-sensitive and sublinear-time algorithms for two important problems. (1) Sparse Convolution: Computing the convolution of two vectors is a basic algorithmic primitive with applications across all of Computer Science and Engineering. In the sparse convolution problem we assume that the input and output vectors have at most $t$ nonzero entries, and the goal is to design algorithms with running times dependent on $t$. For the special case where all entries are nonnegative, which is particularly important for algorithm design, it has been known for twenty years that sparse convolutions can be computed in near-linear randomized time $O(t\log^2 n)$. In this thesis we develop a randomized algorithm with running time $O(t\log t)$, which is optimal (under some mild assumptions), and the first near-linear deterministic algorithm for sparse nonnegative convolution. We also present an application of these results, leading to seemingly unrelated fine-grained lower bounds against distance oracles in graphs. (2) Sublinear Edit Distance: The edit distance of two strings is a well-studied similarity measure with numerous applications in computational biology. While computing the edit distance exactly requires quadratic time under the Strong Exponential Time Hypothesis, a long line of research has led to a constant-factor approximation algorithm in almost-linear time. Perhaps surprisingly, it is also possible to approximate the edit distance $k$ within a large factor $O(k)$ in sublinear time $\tilde O(n/k+\mathrm{poly}(k))$. We drastically improve the approximation factor of the known sublinear algorithms from $O(k)$ to $k^{o(1)}$ while preserving the $O(n/k+\mathrm{poly}(k))$ running time.
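    For reference, the quadratic-time exact computation mentioned in part (2) is the classical dynamic program over prefixes. The sketch below is that textbook baseline (unit-cost insertions, deletions and substitutions), not one of the thesis's sublinear approximation algorithms; the function name `edit_distance` is hypothetical.

```python
def edit_distance(x: str, y: str) -> int:
    """Exact edit (Levenshtein) distance via the O(|x|*|y|) dynamic program."""
    # prev[j] = distance between the current prefix of x and y[:j]
    prev = list(range(len(y) + 1))  # empty prefix of x: j insertions
    for i, a in enumerate(x, start=1):
        cur = [i] + [0] * len(y)  # y[:0] vs x[:i]: i deletions
        for j, b in enumerate(y, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # substitute a by b, or match for free
            )
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

    The table has $(|x|+1)(|y|+1)$ cells, which is the quadratic cost that sublinear algorithms avoid by reading only a small part of the input when the distance $k$ is large.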

    Sparse Nonnegative Convolution Is Equivalent to Dense Nonnegative Convolution

    Computing the convolution $A\star B$ of two length-$n$ vectors $A,B$ is a ubiquitous computational primitive. Applications range from string problems to Knapsack-type problems, and from 3SUM to All-Pairs Shortest Paths. These applications often come in the form of nonnegative convolution, where the entries of $A,B$ are nonnegative integers. The classical algorithm to compute $A\star B$ uses the Fast Fourier Transform and runs in time $O(n\log n)$. However, often $A$ and $B$ satisfy sparsity conditions, and hence one could hope for significant improvements. The ideal goal is an $O(k\log k)$-time algorithm, where $k$ is the number of nonzero elements in the output, i.e., the size of the support of $A\star B$. This problem is referred to as sparse nonnegative convolution, and has received considerable attention in the literature; the fastest algorithms to date run in time $O(k\log^2 n)$. The main result of this paper is the first $O(k\log k)$-time algorithm for sparse nonnegative convolution. Our algorithm is randomized and assumes that the length $n$ and the largest entry of $A$ and $B$ are subexponential in $k$. Surprisingly, we can phrase our algorithm as a reduction from the sparse case to the dense case of nonnegative convolution, showing that, under some mild assumptions, sparse nonnegative convolution is equivalent to dense nonnegative convolution for constant-error randomized algorithms. Specifically, if $D(n)$ is the time to convolve two nonnegative length-$n$ vectors with success probability $2/3$, and $S(k)$ is the time to convolve two nonnegative vectors with output size $k$ with success probability $2/3$, then $S(k)=O(D(k)+k(\log\log k)^2)$. Our approach uses a variety of new techniques in combination with some old machinery from linear sketching and structured linear algebra, as well as new insights on linear hashing, the most classical hash function.
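    One classical ingredient behind sparse-to-dense reductions is that linear hashing commutes with convolution: folding indices by $h(i)=i \bmod m$ turns $A\star B$ into the length-$m$ cyclic convolution of the folded vectors, since $(i+j)\bmod m=((i\bmod m)+(j\bmod m))\bmod m$. Because all entries are nonnegative, no cancellations occur, so a bucket of the folded output is nonzero exactly when some output index hashes into it. The toy sketch below checks these facts; it is not the paper's $O(k\log k)$ algorithm, which must additionally recover the actual indices and values from the buckets. The helper names `sparse_convolve`, `fold` and `cyclic_convolve` are hypothetical, and the dense step is written quadratically for clarity (an FFT would make it $O(m\log m)$).

```python
from collections import defaultdict


def sparse_convolve(A, B):
    """Naive sparse convolution; A and B map index -> nonnegative value."""
    C = defaultdict(int)
    for i, a in A.items():
        for j, b in B.items():
            C[i + j] += a * b
    return {k: v for k, v in C.items() if v != 0}


def fold(A, m):
    """Apply the linear hash i -> i mod m, summing colliding entries."""
    F = [0] * m
    for i, a in A.items():
        F[i % m] += a
    return F


def cyclic_convolve(F, G):
    """Dense length-m cyclic convolution (quadratic here; use an FFT in practice)."""
    m = len(F)
    H = [0] * m
    for i in range(m):
        for j in range(m):
            H[(i + j) % m] += F[i] * G[j]
    return H


A = {0: 1, 7: 2, 100: 3}
B = {3: 4, 50: 1}
m = 11
C = sparse_convolve(A, B)
H = cyclic_convolve(fold(A, m), fold(B, m))
print(fold(C, m) == H)  # folding commutes with convolution
```

    With nonnegative inputs, the set of nonzero buckets of `H` equals $\{k \bmod m : k \in \operatorname{supp}(A\star B)\}$; with signed entries, colliding terms could cancel and hide part of the support.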