Search CORE

29 research outputs found

Stronger Tradeoffs for Orthogonal Range Querying in the Semigroup Model

Author: N. Prabhakar Swaroop
Sharma Vikram
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 38th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2018)
Publication date: 01/01/2018
Field of study

In this paper, we focus on lower bounds for data structures supporting orthogonal range querying on m points in n-dimensions in the semigroup model. Such a data structure usually maintains a family of "canonical subsets" of the given set of points and on a range query, it outputs a disjoint union of the appropriate subsets. Fredman showed that in order to prove lower bounds in the semigroup model, it suffices to prove a lower bound on a certain combinatorial tradeoff between two parameters: (a) the total sizes of the canonical subsets, and (b) the total number of canonical subsets required to cover all query ranges. In particular, he showed that the arithmetic mean of these two parameters is Omega(m log^n m). We strengthen this tradeoff by showing that the geometric mean of the same two parameters is Omega(m log^n m). Our second result is an alternate proof of Fredman\u27s tradeoff in the one dimensional setting. The problem of answering range queries using canonical subsets can be formulated as factoring a specific boolean matrix as a product of two boolean matrices, one representing the canonical sets and the other capturing the appropriate disjoint unions of the former to output all possible range queries. In this formulation, we can ask what is an optimal data structure, i.e., a data structure that minimizes the sum of the two parameters mentioned above, and how does the balanced binary search tree compare with this optimal data structure in the two parameters? The problem of finding an optimal data structure is a non-linear optimization problem. In one dimension, Fredman\u27s result implies that the minimum value of the objective function is Omega(m log m), which means that at least one of the parameters has to be Omega(m log m). We show that both the parameters in an optimal solution have to be Omega(m log m). This implies that balanced binary search trees are near optimal data structures for range querying in one dimension. We derive intermediate results on factoring matrices, not necessarily boolean, while trying to minimize the norms of the factors, that may be of independent interest

Dagstuhl Research Online Publication Server

In-Memory Storage for Labeled Tree-Structured Data

Author: Zhou Gelin
Publication venue: 'University of Waterloo'
Publication date: 01/03/2017
Field of study

In this thesis, we design in-memory data structures for labeled and weights trees, so that various types of path queries or operations can be supported with efficient query time. We assume the word RAM model with word size w, which permits random accesses to w-bit memory cells. Our data structures are space-efficient and many of them are even succinct. These succinct data structures occupy space close to the information theoretic lower bounds of the input trees within lower order terms. First, we study the problems of supporting various path queries over weighted trees. A path counting query asks for the number of nodes on a query path whose weights lie within a query range, while a path reporting query requires to report these nodes. A path median query asks for the median weight on a path between two given nodes, and a path selection query returns the k-th smallest weight. We design succinct data structures to support path counting queries in O(lg σ/ lg lg n + 1) time, path reporting queries in O((occ + 1)(lg σ/ lg lg n + 1)) time, and path median and path selection queries in O(lg σ/ lg lg σ) time, where n is the size of the input tree, the weights of nodes are drawn from [1..σ] and occ is the size of the output. Our results not only greatly improve the best known data structures [31, 75, 65], but also match the lower bounds for path counting, median and selection queries [86, 87, 71] when σ = Ω(n/polylog(n)). Second, we study the problem of representing labeled ordinal trees succinctly. Our new representations support a much broader collection of operations than previous work. In our approach, labels of nodes are stored in a preorder label sequence, which can be compressed using any succinct representation of strings that supports access, rank and select operations. Thus, we present a framework for succinct representations of labeled ordinal trees that is able to handle large alphabets. This answers an open problem presented by Geary et al. [54], which asks for representations of labeled ordinal trees that remain space-efficient for large alphabets. We further extend our work and present the first succinct representations for dynamic labeled ordinal trees that support several label-based operations including finding the level ancestor with a given label. Third, we study the problems of supporting path minimum and semigroup path sum queries. In the path minimum problem, we preprocess a tree on n weighted nodes, such that given an arbitrary path, the node with the smallest weight along this path can be located. We design novel succinct indices for this problem under the indexing model, for which weights of nodes are read-only and can be accessed with ranks of nodes in the preorder traversal sequence of the input tree. One of our index structures supports queries in O(α(m,n)) time, and occupies O(m) bits of space in addition to the space required for the input tree, where m is an integer greater than or equal to n and α(m, n) is the inverse-Ackermann function. Following the same approach, we also develop succinct data structures for semigroup path sum queries, for which a query asks for the sum of weights along a given query path. Then, using the succinct indices for path minimum queries, we achieve three different time-space tradeoffs for path reporting queries. Finally, we study the problems of supporting various path queries in dynamic settings. We propose the first non-trivial linear-space solution that supports path reporting in O((lgn/lglgn)^2 +occlgn/lglgn)) query time, where n is the size of the input tree and occ is the output size, and the insertion and deletion of a node of an arbitrary degree in O(lg^{2+ε} n) amortized time, for any constant ε ∈ (0, 1). Obvious solutions based on directly dynamizing solutions to the static version of this problem all require Ω((lg n/ lg lg n)^2) time for each node reported. We also design data structures that support path counting and path reporting queries in O((lg n/ lg lg n)^2) time, and insertions and deletions in O((lg n/ lg lg n)^2) amortized time. This matches the best known results for dynamic two-dimensional range counting [62] and range selection [63], which can be viewed as special cases of path counting and path selection

University of Waterloo's Institutional Repository

Data-independent space partitionings for summaries

Author: Cormode Graham
Garofalakis Minos
Shekelyan Michael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/06/2021
Field of study

Histograms are a standard tool in data management for describing multidimensional data. It is often convenient or even necessary to define data independent histograms, to partition space in advance without observing the data itself. Specific motivations arise in managing data when it is not suitable to frequently change the boundaries between histogram cells. For example, when the data is subject to many insertions and deletions; when data is distributed across multiple systems; or when producing a privacy-preserving representation of the data. The baseline approach is to consider an equiwidth histogram, i.e., a regular grid over the space. However, this is not optimal for the objective of splitting the multidimensional space into (possibly overlapping) bins, such that each box can be rebuilt using a set of non-overlapping bins with minimal excess (or deficit) of volume. Thus, we investigate how to split the space into bins and identify novel solutions that offer a good balance of desirable properties. As many data processing tools require a dataset as an input, we propose efficient methods how to obtain synthetic point sets that match the histograms over the overlapping bins

Warwick Research Archives Portal Repository

Lower bound techniques for data structures

Author: Pǎtraşcu Mihai
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.Includes bibliographical references (p. 135-143).We describe new techniques for proving lower bounds on data-structure problems, with the following broad consequences: * the first [omega](lg n) lower bound for any dynamic problem, improving on a bound that had been standing since 1989; * for static data structures, the first separation between linear and polynomial space. Specifically, for some problems that have constant query time when polynomial space is allowed, we can show [omega](lg n/ lg lg n) bounds when the space is O(n - polylog n). Using these techniques, we analyze a variety of central data-structure problems, and obtain improved lower bounds for the following: * the partial-sums problem (a fundamental application of augmented binary search trees); * the predecessor problem (which is equivalent to IP lookup in Internet routers); * dynamic trees and dynamic connectivity; * orthogonal range stabbing. * orthogonal range counting, and orthogonal range reporting; * the partial match problem (searching with wild-cards); * (1 + [epsilon])-approximate near neighbor on the hypercube; * approximate nearest neighbor in the l[infinity] metric. Our new techniques lead to surprisingly non-technical proofs. For several problems, we obtain simpler proofs for bounds that were already known.by Mihai Pǎtraşcu.Ph.D

CiteSeerX

DSpace@MIT

38th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science: FSTTCS 2018, December 11-13, 2018, Ahmedabad, India

Author: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science 38. 2018 Ahmedabad
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/12/2018
Field of study

Digitale Bibliothek Thüringen

30th International Symposium on Theoretical Aspects of Computer Science: STACS '13, February 27th to March 2nd, 2013, Kiel, Germany

Author: STACS <30 2013, Kiel>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/02/2013
Field of study

Digitale Bibliothek Thüringen

LIPIcs, Volume 261, ICALP 2023, Complete Volume

Author: Etessami Kousha
Feige Uriel
Puppis Gabriele
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)
Publication date: 01/01/2023
Field of study

LIPIcs, Volume 261, ICALP 2023, Complete Volum

Dagstuhl Research Online Publication Server

Algebraic Methods in Language Processing:Proceedings of the twenty-first Twente workshop on language technology

Author
Publication venue: 'University Library/University of Twente'
Publication date: 15/08/2003
Field of study

University of Twente Research Information

29th International Symposium on Theoretical Aspects of Computer Science: STACS '12, February 29th to March 3rd, 2012, Paris, France

Author: Schloss Dagstuhl Leibniz-Zentrum für Informatik
Publication venue: Schloss Dagstuhl - Leibniz-Center for Informatics
Publication date: 01/02/2012
Field of study

Digitale Bibliothek Thüringen

Tree-Structured Problems and Parallel Computation

Author: Ludwig Michael
Publication venue: Universität Tübingen
Publication date: 01/01/2018
Field of study

Turing-Maschinen sind das klassische Beschreibungsmittel für Wortsprachen und werden daher auch benützt, um Komplexitätsklassen zu definieren. Dies geschieht zum Beispiel durch das Einschränken des Platz- oder Zeitaufwandes der Berechnung zur Lösung eines Problems. Für sehr niedrige Komplexität wie etwa sublineare Laufzeit, werden Schaltkreise verwendet. Schaltkreise können auf natürliche Art Komplexitäten wie etwa logarithmische Laufzeit modellieren. Ebenso können sie als eine Art paralleles Rechenmodell gesehen werden. Eine wichtige parallele Komplexitätsklasse ist NC1. Sie wird beschrieben durch Boolesche Schaltkreise logarithmischer Tiefe und beschränktem Eingangsgrad der Gatter. Eine initiale Beobachtung, die die vorliegende Arbeit motiviert, ist, dass viele schwere Probleme in NC1 eine ähnliche Struktur haben und auf ähnliche Art und Weise gelöst werden. Das Auswertungsproblem für Boolesche Formeln ist eines der repräsentativsten Probleme aus dieser Klasse: Gegeben ist hier eine aussagenlogische Formel samt Belegung für die Variablen; gefragt ist, ob sie zu wahr oder zu falsch auswertet. Dieses Problem wird in NC1 gelöst durch den Algorithmus von Buss. Auf ähnliche Art können arithmetische Formeln in #NC1 ausgewertet oder das Wortproblem für Visibly-Pushdown-Sprachen gelöst werden. Zu besagter Klasse an Problemen gehört auch Courcelles Theorem, welches Berechnungen in Baumautomaten involviert. Zu bemerken ist, dass alle angesprochenen Probleme gemeinsam haben, dass sie aus Instanzen bestehen, die baumartig sind. Formeln sind Bäume, Visibly-Pushdown-Sprachen enthalten als Wörter kodierte Bäume und Courcelles Theorem betrachtet Graphen mit beschränkter Baumweite, d.h. Graphen, die sich als Baum darstellen lassen. Insbesondere Letzteres ist ein Schema, das häufiger auftritt. Zum Beispiel gibt es NP-vollständige Graphprobleme wie das Finden von Hamilton-Kreisen, welches unter beschränkter Baumweite in P fällt. Neuere Analysen konnten diese Schranke weiter zu SAC1 verbessern, was eine parallele Komplexitätsklasse ist. Die angesprochenen Probleme kommen aus unterschiedlichen Bereichen und haben individuelle Lösungen. Hauptthese dieser Arbeit ist, dass sich diese Vielfalt vereinheitlichen lässt. Es wird ein generisches Lösungskonzept vorgestellt, welches darauf beruht, dass sich die Probleme auf ein Termevaluierungsproblem reduzieren lassen. Kernstück ist daher ein Termevaluierungsalgorithmus, der unabhängig von der Algebra, über welche der Term evaluiert werden soll, ist. Resultat ist, dass eine Vielzahl, darunter die oben angesprochenen Probleme, sich auf analoge Art lösen lassen, und dass sich ebenso leicht neue Resultate zeigen lassen. Diese Menge an Resultaten hätte sich ohne den vereinheitlichten Lösungsansatz nicht innerhalb des Rahmens einer Arbeit wie der vorliegenden zeigen lassen. Der entwickelte Lösungsansatz führt stets zu Schaltkreisfamilien polylogarithmischer Tiefe. Es wird jedoch auch die Frage behandelt, wie mächtig Schaltkreisfamilien konstanter Tiefe noch bezüglich Termevaluierung sind. Die Klasse AC0 ist hierfür ein natürlicher Kandidat; sie entspricht der Menge der Sprachen, die durch Logik erster Ordung beschreibbar sind. Um dieses Problem anzugehen, wird zunächst das Termevaluierungsproblem über endlichen Algebren betrachtet. Dieses wiederum lässt sich in das Wortproblem von Visibly-Pushdown-Sprachen einbetten. Daher handelt dieser Teil der Arbeit vornehmlich von der Beschreibbarkeit von Visibly-Pushdown-Sprachen in Logik erster Ordnung. Hierbei treten ungelöste Probleme zu Tage, welche ein Indiz dafür sind, wie schlecht die Komplexität konstanter Tiefe bisher noch verstanden ist, und das, trotz des Resultats von Furst, Saxe und Sipser, bzw. Håstads. Die bis jetzt beschrieben Inhalte sind Teil einer kontinuierlichen Entwicklung. Es gibt jedoch ein Thema in dieser Arbeit, das orthogonal dazu ist: Automaten und im speziellen Cost-Register-Automaten. Zum einen sind, wie oben angedeutet, Automaten Beispiele für Anwendungen des hier entwickelten generischen Lösungsansatzes. Zum anderen können sie selbst zur Beschreibung von Termevaluierungsproblemen dienen; so können Visibly-Pushdown-Automaten Termevaluierung über endlichen Algebren ausführen. Um über endliche Algebren hinauszugehen, benötigen die Automaten mehr Speicher. Visibly-Pushdown-Automaten haben einen Keller, der genau dafür geeignet ist, die Baumstruktur einer Eingabeformel zu verifizieren. Für nichtendliche Algebren eignet sich ein Modell, welches hier vorgestellt werden soll. Es kombiniert Visibly-Pushdown-Automaten mit Cost-Register-Automaten. Ein Cost-Register-Automat ist ein endlicher Automat, welcher mit zusätzlichen Registern ausgestattet ist. Die Register können Werte einer Algebra speichern und werden in jedem Schritt in Abhängigkeit des Eingabezeichens und des Zustandes aktualisiert. Dieser Einwegdatenfluss von Zuständen zu Registern sorgt dafür, dass dieses Modell nicht nur entscheidbar bleibt, sondern, in Abhängigkeit der Algebra, auch niedrige Komplexität hat. Das neue Modell der Cost-Register-Visibly-Pushdown-Automaten kann nun Terme evaluieren. Es werden grundlegende Eigenschaften gezeigt, einschließlich Komplexitätsaussagen

Publikationsserver der Universität Tübingen