19 research outputs found
Optimal Hashing in External Memory
Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions.
Iacono and Pătrașcu established an update/query tradeoff curve for external-memory hash tables: a hash table that performs insertions in O(lambda/B) amortized IOs requires Omega(log_lambda N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and lambda is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for lambda that is Omega(log log M + log_M N).
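As a quick numerical illustration of this tradeoff curve (constants suppressed; the parameter names follow the abstract above, and the concrete values are arbitrary):

```python
import math

def insert_ios(lam, B):
    """Amortized IOs per insertion on the tradeoff curve: O(lambda / B)."""
    return lam / B

def query_ios(lam, N):
    """Expected IOs per query on the tradeoff curve: Omega(log_lambda N)."""
    return math.log(N) / math.log(lam)

# Example: N = 2**30 items, block size B = 2**10, tuning parameter lambda = 2**5.
# Raising lambda makes insertions cheaper but queries more expensive.
print(insert_ios(2**5, 2**10))  # 0.03125 IOs amortized per insert
print(query_ios(2**5, 2**30))   # ~6.0 IOs expected per query
```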
In this paper, we present a simpler external-memory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of lambda. The simplicity of BOAs allows them to be readily modified to achieve the following results:
- A new external-memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs.
- The Cache-Oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of lambda.
On the Succinct Representation of Equivalence Classes
Given a set of n elements partitioned into equivalence classes, we study the problem of assigning unique labels to these elements so as to support the query that asks whether the elements corresponding to two given labels belong to the same equivalence class. This problem has been studied by Katz et al., Alstrup et al., and Lewenstein et al. Lewenstein et al. showed that with no auxiliary data structure, a label space of size n lg n is necessary and sufficient to represent the equivalence relation. They also showed that if the labels are assigned from the set [n], a data structure of Θ(√n) bits is necessary and sufficient to represent the equivalence relation and to answer the equivalence query in O(lg n) time. In this thesis, we give an improved data structure that uses O(√n) bits and can answer queries in constant time when the label space is of size n. Moreover, we study the case where we allow the label space to be of size cn for any constant c > 1. We show that with such a label space, a data structure of Θ(lg n) bits is necessary and sufficient to represent the equivalence relation and to answer the equivalence query in constant time. We believe that our work can trigger further work on tradeoffs between label space and auxiliary data structure space for other labeling problems.
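As an illustrative toy (not the thesis's data structure): with labels from [n], one can make each class occupy a contiguous label range, so an equivalence query reduces to locating two labels between stored class boundaries. The point of the thesis is to represent such boundary information far more compactly than the naive encoding sketched here:

```python
from bisect import bisect_right

def assign_labels(classes):
    """Assign labels 0..n-1 so each class occupies a contiguous range.
    boundaries[i] is the first label belonging to class i+1."""
    label_of, boundaries, next_label = {}, [], 0
    for cls in classes:
        for x in cls:
            label_of[x] = next_label
            next_label += 1
        boundaries.append(next_label)
    return label_of, boundaries

def same_class(a, b, boundaries):
    """Two labels are equivalent iff they fall between the same boundaries."""
    return bisect_right(boundaries, a) == bisect_right(boundaries, b)

labels, bounds = assign_labels([["u", "v"], ["w"], ["x", "y", "z"]])
print(same_class(labels["u"], labels["v"], bounds))  # True
print(same_class(labels["v"], labels["w"], bounds))  # False
```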
Breaching the 2-Approximation Barrier for Connectivity Augmentation: a Reduction to Steiner Tree
The basic goal of survivable network design is to build a cheap network that
maintains the connectivity between given sets of nodes despite the failure of a
few edges/nodes. The Connectivity Augmentation Problem (CAP) is arguably one of
the most basic problems in this area: given a k-(edge)-connected graph G
and a set of extra edges (links), select a minimum cardinality subset A of
links such that adding A to G increases its edge connectivity to k+1.
Intuitively, one wants to make an existing network more reliable by augmenting
it with extra edges. The best known approximation factor for this NP-hard
problem is 2, and this can be achieved with multiple approaches (the first
such result is in [Frederickson and J\'aj\'a'81]).
It is known [Dinitz et al.'76] that CAP can be reduced to the case k=1,
a.k.a. the Tree Augmentation Problem (TAP), for odd k, and to the case k=2,
a.k.a. the Cactus Augmentation Problem (CacAP), for even k. Several better
than 2 approximation algorithms are known for TAP, culminating with a recent
1.458-approximation [Grandoni et al.'18]. However, for CacAP the best known
approximation is 2.
In this paper we breach the 2-approximation barrier for CacAP, hence for
CAP, by presenting a polynomial-time 1.91-approximation. Previous approaches
exploit properties of TAP that do not seem to generalize to CacAP. We instead
use a reduction to the Steiner tree problem which was previously used in
parameterized algorithms [Basavaraju et al.'14]. This reduction is not
approximation preserving, and using the current best approximation factor for
Steiner tree [Byrka et al.'13] as a black-box would not be good enough to
improve on 2. To achieve the latter goal, we ``open the box'' and exploit the
specific properties of the instances of Steiner tree arising from CacAP.
Distance Estimation Between Unknown Matrices Using Sublinear Projections on Hamming Cube
Using geometric techniques like projection and dimensionality reduction, we
show that there exists a randomized sub-linear time algorithm that can estimate
the Hamming distance between two matrices. Consider two matrices $\mathbf{A}$
and $\mathbf{B}$ of size $n \times n$ whose dimensions are known to the
algorithm but the entries are not. The entries of the matrices are real
numbers. The access to any matrix is through an oracle that computes the
projection of a row (or a column) of the matrix on a vector in $\{0,1\}^n$. We
call this query oracle an {\sc Inner Product} oracle (shortened as {\sc IP}).
We show that our algorithm returns a $(1 \pm \epsilon)$ approximation to
${\bf D}_{\bf M}({\bf A},{\bf B})$ with high probability by making ${\cal
O}\left(\frac{n}{\sqrt{{{\bf D}}_{\bf M} ({\bf A},{\bf
B})}}\mbox{poly}\left(\log n, \frac{1}{\epsilon}\right)\right)$ oracle queries,
where ${\bf D}_{\bf M}({\bf A},{\bf B})$ denotes the Hamming distance (the
number of corresponding entries in which $\mathbf{A}$ and $\mathbf{B}$ differ)
between the two matrices of size $n \times n$. We also show
a matching lower bound on the number of such {\sc IP} queries needed. Though
our main result is on estimating ${\bf D}_{\bf M}({\bf A},{\bf B})$ using
{\sc IP}, we also compare our results with other query models.
Comment: 30 pages. Accepted in RANDOM'2
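A minimal sketch of the {\sc IP} oracle access model (this naive entry-sampling estimator makes far more queries than the paper's algorithm; it only shows that basis vectors, which lie in {0,1}^n, suffice to probe individual entries through the oracle):

```python
import random

def make_ip_oracle(M):
    """IP oracle for matrix M: returns <row i of M, v> for a {0,1} vector v."""
    def oracle(i, v):
        return sum(M[i][j] for j in range(len(v)) if v[j])
    return oracle

def estimate_hamming(oracle_a, oracle_b, n, samples=2000, seed=0):
    """Naive estimator: probe random entries via basis-vector IP queries,
    then scale the fraction of differing entries up to the full matrix."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        i, j = rng.randrange(n), rng.randrange(n)
        e_j = [1 if t == j else 0 for t in range(n)]  # basis vector in {0,1}^n
        hits += oracle_a(i, e_j) != oracle_b(i, e_j)
    return n * n * hits / samples

n = 20
A = [[(i * j) % 2 for j in range(n)] for i in range(n)]
B = [[(i + j) % 2 for j in range(n)] for i in range(n)]
true_d = sum(A[i][j] != B[i][j] for i in range(n) for j in range(n))
est = estimate_hamming(make_ip_oracle(A), make_ip_oracle(B), n)
```

On this toy input the true Hamming distance is 300 out of 400 entries, and the sampled estimate concentrates around it.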
Learning Reserve Prices in Second-Price Auctions
This paper proves the tight sample complexity of Second-Price Auction with Anonymous Reserve, up to a logarithmic factor, for each of the value distribution families studied in the literature: [0,1]-bounded, [1,H]-bounded, regular, and monotone hazard rate (MHR). Remarkably, the setting-specific tight sample complexity poly(ε^{-1}) depends on the precision ε ∈ (0, 1), but not on the number of bidders n ≥ 1. Further, in the two bounded-support settings, our learning algorithm allows correlated value distributions.
In contrast, the tight sample complexity Θ̃(n) · poly(ε^{-1}) of Myerson Auction proved by Guo, Huang and Zhang (STOC 2019) has a nearly-linear dependence on n ≥ 1, and holds only for independent value distributions in every setting.
We follow a similar framework as the Guo-Huang-Zhang work, but replace their information-theoretic arguments with a direct proof.
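The learning task can be sketched as empirical revenue maximization over samples (an illustrative toy, not the paper's algorithm; the paper's contribution is pinning down how many samples such an estimate needs to be near-optimal):

```python
def spa_revenue(reserve, bids):
    """Revenue of a second-price auction with anonymous reserve on one bid profile."""
    top = sorted(bids, reverse=True)
    if top[0] < reserve:
        return 0.0               # no sale: even the highest bid is below the reserve
    return max(top[1], reserve)  # winner pays max(second-highest bid, reserve)

def learn_reserve(sample_profiles):
    """ERM sketch: pick the reserve (among observed bids) maximizing
    average revenue on the sample."""
    candidates = {b for bids in sample_profiles for b in bids}
    def avg_rev(r):
        return sum(spa_revenue(r, bids) for bids in sample_profiles) / len(sample_profiles)
    return max(candidates, key=avg_rev)

samples = [[0.9, 0.3], [0.8, 0.7], [0.4, 0.2]]
print(learn_reserve(samples))  # 0.8
```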
Fault Tolerant Max-Cut
In this work, we initiate the study of fault tolerant Max-Cut, where given an edge-weighted undirected graph G = (V,E), the goal is to find a cut S ⊆ V that maximizes the total weight of edges that cross S even after an adversary removes k vertices from G. We consider two types of adversaries: an adaptive adversary that sees the outcome of the random coin tosses used by the algorithm, and an oblivious adversary that does not. For any constant number of failures k we present an approximation of (0.878-ε) against an adaptive adversary and of α_{GW} ≈ 0.8786 against an oblivious adversary (here α_{GW} is the approximation achieved by the random hyperplane algorithm of [Goemans-Williamson J. ACM `95]). Additionally, we present a hardness of approximation of α_{GW} against both types of adversaries, rendering our results (virtually) tight.
The non-linear nature of the fault tolerant objective makes the design and analysis of algorithms harder when compared to the classic Max-Cut. Hence, we employ approaches ranging from multi-objective optimization to LP duality and the ellipsoid algorithm to obtain our results.
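The fault tolerant objective is a minimum over adversarial deletions; for tiny instances it can be evaluated by brute force (an illustrative sketch enumerating all C(n,k) removals, not one of the paper's algorithms):

```python
from itertools import combinations

def cut_weight(S, edges):
    """Total weight of edges crossing the cut S."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))

def fault_tolerant_cut_value(S, vertices, edges, k):
    """Worst-case cut weight after an adversary deletes k vertices:
    the adversary removes the k vertices minimizing the surviving crossing weight."""
    best = None
    for removed in combinations(vertices, k):
        surviving = [(u, v, w) for u, v, w in edges
                     if u not in removed and v not in removed]
        val = cut_weight(S, surviving)
        best = val if best is None else min(best, val)
    return best

# Toy 4-cycle with unit weights; the cut {0, 2} crosses all four edges,
# and any single vertex deletion leaves two crossing edges.
vertices = [0, 1, 2, 3]
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
print(fault_tolerant_cut_value({0, 2}, vertices, edges, k=1))  # 2
```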
Succinct Data Structures for Chordal Graphs
We study the problem of approximate shortest path queries in chordal graphs and give an n log n + o(n log n) bit data structure that answers approximate distance queries to within an additive constant of 1 in O(1) time.
We study the problem of succinctly storing a static chordal graph to answer adjacency, degree, neighbourhood and shortest path queries. Let G be a chordal graph with n vertices. We design a data structure using the information-theoretically minimal n^2/4 + o(n^2) bits of space to support the queries:
- whether two vertices u,v are adjacent, in time f(n) for any f(n) ∈ ω(1);
- the degree of a vertex, in O(1) time;
- the vertices adjacent to u, in O(f(n)^2) time per neighbour;
- the length of the shortest path from u to v, in O(n f(n)) time.
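For intuition, a toy sketch (not the succinct structure itself): chordal graphs admit a perfect elimination ordering, computable by maximum cardinality search, under which each vertex's later neighbours form a clique; that structural fact is what succinct encodings of chordal graphs exploit.

```python
def mcs_peo(adj):
    """Maximum cardinality search; for a chordal graph the reverse of the
    visit order is a perfect elimination ordering (PEO)."""
    n = len(adj)
    weight, order, visited = [0] * n, [], [False] * n
    for _ in range(n):
        u = max((v for v in range(n) if not visited[v]), key=lambda v: weight[v])
        visited[u] = True
        order.append(u)
        for w in adj[u]:
            if not visited[w]:
                weight[w] += 1
    return order[::-1]

def later_neighbors(adj, peo):
    """Keep only neighbours appearing later in the PEO; in a chordal graph
    each such set is a clique."""
    pos = {v: i for i, v in enumerate(peo)}
    return {v: {w for w in adj[v] if pos[w] > pos[v]} for v in adj}

def adjacent(u, v, later, pos):
    """Adjacency via the one-sided (later-neighbour) lists."""
    return v in later[u] if pos[u] < pos[v] else u in later[v]

# Toy chordal graph: triangle 0-1-2 with pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
peo = mcs_peo(adj)
pos = {v: i for i, v in enumerate(peo)}
later = later_neighbors(adj, peo)
print(adjacent(0, 1, later, pos), adjacent(0, 3, later, pos))  # True False
```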
Diameter computation on H-minor free graphs and graphs of bounded (distance) VC-dimension
Under the Strong Exponential-Time Hypothesis, the diameter of general unweighted graphs cannot be computed in truly subquadratic time. Nevertheless there are several graph classes for which this can be done, such as bounded-treewidth graphs, interval graphs and planar graphs, to name a few. We propose to study unweighted graphs of constant distance VC-dimension as a broad generalization of many such classes, where the distance VC-dimension of a graph G is defined as the VC-dimension of its ball hypergraph: the hypergraph whose hyperedges are the balls of all possible radii and centers in G. In particular, for any fixed H, the class of H-minor free graphs has distance VC-dimension at most |V(H)| − 1.
- Our first main result is a Monte Carlo algorithm that on graphs of distance VC-dimension at most d, for any fixed k, either computes the diameter or concludes that it is larger than k in time Õ(k · mn^{1−ε_d}), where ε_d ∈ (0,1) only depends on d. We thus obtain a truly subquadratic-time parameterized algorithm for computing the diameter on such graphs.
- Then, as a byproduct of our approach, we get the first truly subquadratic-time randomized algorithm for constant diameter computation on all the nowhere dense graph classes. The latter classes include all proper minor-closed graph classes, bounded-degree graphs and graphs of bounded expansion.
- Finally, we show how to remove the dependency on k for any graph class that excludes a fixed graph H as a minor. More generally, our techniques apply to any graph with constant distance VC-dimension and polynomial expansion (or equivalently, having strongly sublinear balanced separators). As a result, for all such graphs one obtains a truly subquadratic-time randomized algorithm for computing their diameter. We note that all our results also hold for radius computation.
Our approach is based on the work of Chazelle and Welzl, who proved the existence of spanning paths with strongly sublinear stabbing number for every hypergraph of constant VC-dimension. We show how to compute such paths efficiently by combining known algorithms for the stabbing number problem with a clever use of ε-nets, region decomposition and other partition techniques.
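For contrast with the subquadratic results above, the textbook baseline computes the diameter with one BFS per vertex, in Θ(nm) total time (a minimal sketch):

```python
from collections import deque

def bfs_ecc(adj, s):
    """Eccentricity of s via BFS (unweighted graph as an adjacency dict)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def diameter(adj):
    """Quadratic-time baseline (one BFS per vertex) that the subquadratic
    algorithms above improve upon for bounded distance VC-dimension."""
    return max(bfs_ecc(adj, s) for s in adj)

# 5-vertex path: diameter 4.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
print(diameter(path))  # 4
```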