19 research outputs found

    Optimal Hashing in External Memory

    Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and Pătraşcu established an update/query tradeoff curve for external hash tables: a hash table that performs insertions in O(lambda/B) amortized IOs requires Omega(log_lambda N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and lambda is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for lambda that is Omega(log log M + log_M N). In this paper, we present a simpler external-memory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of lambda. The simplicity of BOAs allows them to be readily modified to achieve the following results:
    - A new external-memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs.
    - The Cache-Oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of lambda.

    On the Succinct Representation of Equivalence Classes

    Given a set of n elements that are partitioned into equivalence classes, we study the problem of assigning unique labels to these elements in order to support the query that asks whether the elements corresponding to two given labels belong to the same equivalence class. This problem has been studied by Katz et al., Alstrup et al., and Lewenstein et al. Lewenstein et al. showed that with no auxiliary data structure, a label space of size n lg(n) is necessary and sufficient to represent the equivalence relation. They also showed that if the labels are to be assigned from the set [n], a data structure of square root of n bits is necessary and sufficient to represent the equivalence relation and to answer the equivalence query in O(lg(n)) time. In this thesis, we give an improved data structure that uses O(square root of n) bits and can answer queries in constant time, when the label space is of size n. Moreover, we study the case where we allow the label space to be of size cn for any constant c > 1. We show that with such a label space, a data structure of O(lg(n)) bits is necessary and sufficient to represent the equivalence relation and to answer the equivalence query in constant time. We believe that our work can trigger further work on tradeoffs between label space and auxiliary data structure space for other labeling problems.

    Breaching the 2-Approximation Barrier for Connectivity Augmentation: a Reduction to Steiner Tree

    The basic goal of survivable network design is to build a cheap network that maintains the connectivity between given sets of nodes despite the failure of a few edges/nodes. The Connectivity Augmentation Problem (CAP) is arguably one of the most basic problems in this area: given a $k$(-edge)-connected graph $G$ and a set of extra edges (links), select a minimum cardinality subset $A$ of links such that adding $A$ to $G$ increases its edge connectivity to $k+1$. Intuitively, one wants to make an existing network more reliable by augmenting it with extra edges. The best known approximation factor for this NP-hard problem is $2$, and this can be achieved with multiple approaches (the first such result is in [Frederickson and JáJá '81]). It is known [Dinitz et al.'76] that CAP can be reduced to the case $k=1$, a.k.a. the Tree Augmentation Problem (TAP), for odd $k$, and to the case $k=2$, a.k.a. the Cactus Augmentation Problem (CacAP), for even $k$. Several better than $2$ approximation algorithms are known for TAP, culminating with a recent $1.458$ approximation [Grandoni et al.'18]. However, for CacAP the best known approximation is $2$. In this paper we breach the $2$ approximation barrier for CacAP, hence for CAP, by presenting a polynomial-time $2\ln(4)-\frac{967}{1120}+\epsilon<1.91$ approximation. Previous approaches exploit properties of TAP that do not seem to generalize to CacAP. We instead use a reduction to the Steiner tree problem which was previously used in parameterized algorithms [Basavaraju et al.'14]. This reduction is not approximation preserving, and using the current best approximation factor for Steiner tree [Byrka et al.'13] as a black-box would not be good enough to improve on $2$. To achieve the latter goal, we "open the box" and exploit the specific properties of the instances of Steiner tree arising from CacAP.

    Distance Estimation Between Unknown Matrices Using Sublinear Projections on Hamming Cube

    Using geometric techniques like projection and dimensionality reduction, we show that there exists a randomized sub-linear time algorithm that can estimate the Hamming distance between two matrices. Consider two matrices ${\bf A}$ and ${\bf B}$ of size $n \times n$ whose dimensions are known to the algorithm but the entries are not. The entries of the matrices are real numbers. The access to any matrix is through an oracle that computes the projection of a row (or a column) of the matrix on a vector in $\{0,1\}^n$. We call this query oracle an Inner Product oracle (IP for short). We show that our algorithm returns a $(1\pm \epsilon)$ approximation to ${\bf D}_{\bf M}({\bf A},{\bf B})$ with high probability by making $\mathcal{O}\left(\frac{n}{\sqrt{{\bf D}_{\bf M}({\bf A},{\bf B})}}\,\mathrm{poly}\left(\log n, \frac{1}{\epsilon}\right)\right)$ oracle queries, where ${\bf D}_{\bf M}({\bf A},{\bf B})$ denotes the Hamming distance (the number of corresponding entries in which ${\bf A}$ and ${\bf B}$ differ) between the two matrices. We also show a matching lower bound on the number of such IP queries needed. Though our main result is on estimating ${\bf D}_{\bf M}({\bf A},{\bf B})$ using IP, we also compare our results with other query models. Comment: 30 pages. Accepted in RANDOM'2
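
    The access model in this abstract can be illustrated with a minimal sketch. The class `InnerProductOracle` and the function `hamming_distance_naive` are hypothetical names introduced here, not taken from the paper. The sketch shows only the oracle interface and a naive exact baseline that spends $n^2$ queries per matrix using indicator vectors; it is not the sublinear estimator the abstract describes.

```python
from typing import List, Sequence

Matrix = List[List[float]]


class InnerProductOracle:
    """Hypothetical wrapper: the matrix is visible only through row projections."""

    def __init__(self, matrix: Matrix) -> None:
        self._matrix = matrix
        self.n = len(matrix)
        self.queries = 0  # number of oracle calls made so far

    def row(self, i: int, v: Sequence[int]) -> float:
        """Inner product of row i with a vector v in {0,1}^n."""
        self.queries += 1
        return sum(entry for entry, bit in zip(self._matrix[i], v) if bit)


def hamming_distance_naive(a: InnerProductOracle, b: InnerProductOracle) -> int:
    """Exact Hamming distance via indicator vectors (n^2 queries per matrix)."""
    n = a.n
    distance = 0
    for i in range(n):
        for j in range(n):
            # The indicator vector e_j makes the projection return entry (i, j).
            e_j = [1 if t == j else 0 for t in range(n)]
            if a.row(i, e_j) != b.row(i, e_j):
                distance += 1
    return distance


if __name__ == "__main__":
    A = InnerProductOracle([[1.0, 2.0], [3.0, 4.0]])
    B = InnerProductOracle([[1.0, 0.0], [3.0, 5.0]])
    print(hamming_distance_naive(A, B), A.queries)
```

    The baseline makes clear what a sublinear algorithm has to beat: reading every entry individually costs one query per entry, whereas the bound quoted above needs fewer queries as the distance grows.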

    Learning Reserve Prices in Second-Price Auctions

    This paper proves the tight sample complexity of Second-Price Auction with Anonymous Reserve, up to a logarithmic factor, for each of the value distribution families studied in the literature: [0,1]-bounded, [1,H]-bounded, regular, and monotone hazard rate (MHR). Remarkably, the setting-specific tight sample complexity $\mathrm{poly}(\varepsilon^{-1})$ depends on the precision $\varepsilon \in (0, 1)$, but not on the number of bidders $n \geq 1$. Further, in the two bounded-support settings, our learning algorithm allows correlated value distributions. In contrast, the tight sample complexity $\tilde{\Theta}(n) \cdot \mathrm{poly}(\varepsilon^{-1})$ of Myerson Auction proved by Guo, Huang and Zhang (STOC 2019) has a nearly-linear dependence on $n \geq 1$, and holds only for independent value distributions in every setting. We follow a framework similar to that of the Guo-Huang-Zhang work, but replace their information-theoretic arguments with a direct proof.
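
    For readers unfamiliar with the mechanism, the following is a minimal sketch of the revenue rule of a second-price auction with an anonymous reserve, together with plain empirical revenue maximization over sampled bid profiles. The function names and the choice of candidate reserves are assumptions made for illustration; this is not the learning algorithm analyzed in the paper.

```python
from typing import Iterable, Optional, Sequence


def second_price_revenue(bids: Sequence[float], reserve: float) -> float:
    """Revenue of a second-price auction with one reserve shared by all bidders."""
    if not bids:
        return 0.0
    ordered = sorted(bids, reverse=True)
    if ordered[0] < reserve:
        return 0.0  # nobody meets the reserve, so the item stays unsold
    second = ordered[1] if len(ordered) > 1 else 0.0
    # The winner pays the larger of the reserve and the second-highest bid.
    return max(reserve, second)


def empirical_reserve(
    samples: Sequence[Sequence[float]],
    candidates: Optional[Iterable[float]] = None,
) -> float:
    """Reserve maximizing total revenue over the sampled bid profiles.

    By default the candidates are the sampled values themselves (an
    illustrative assumption); ties go to the smallest candidate.
    """
    if candidates is None:
        candidates = {bid for profile in samples for bid in profile}
    ordered_candidates = sorted(candidates)
    return max(
        ordered_candidates,
        key=lambda r: sum(second_price_revenue(p, r) for p in samples),
    )


if __name__ == "__main__":
    profiles = [[0.9, 0.2], [0.8, 0.1], [0.3, 0.25]]
    print(empirical_reserve(profiles))
```

    Note that a single reserve is applied to every bidder, which is what "anonymous" refers to.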

    Fault Tolerant Max-Cut

    In this work, we initiate the study of fault tolerant Max-Cut, where given an edge-weighted undirected graph G = (V,E), the goal is to find a cut S ⊆ V that maximizes the total weight of edges that cross S even after an adversary removes k vertices from G. We consider two types of adversaries: an adaptive adversary that sees the outcome of the random coin tosses used by the algorithm, and an oblivious adversary that does not. For any constant number of failures k we present an approximation of (0.878-ε) against an adaptive adversary and of α_{GW} ≈ 0.8786 against an oblivious adversary (here α_{GW} is the approximation achieved by the random hyperplane algorithm of [Goemans-Williamson J. ACM '95]). Additionally, we present a hardness of approximation of α_{GW} against both types of adversaries, rendering our results (virtually) tight. The non-linear nature of the fault tolerant objective makes the design and analysis of algorithms harder when compared to the classic Max-Cut. Hence, we employ approaches ranging from multi-objective optimization to LP duality and the ellipsoid algorithm to obtain our results.
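
    One reading of the fault tolerant objective is the weight of the cut that survives a worst-case removal of k vertices. The brute-force sketch below evaluates that quantity for a fixed cut on a small graph. The function names are hypothetical, and the enumeration over all k-subsets is only for illustration; it is not one of the approximation algorithms from the paper.

```python
from itertools import combinations
from typing import AbstractSet, Dict, Sequence, Tuple

Edge = Tuple[int, int]


def cut_weight(
    weights: Dict[Edge, float],
    side: AbstractSet[int],
    removed: AbstractSet[int] = frozenset(),
) -> float:
    """Total weight of edges crossing `side`, ignoring edges at removed vertices."""
    total = 0.0
    for (u, v), w in weights.items():
        if u in removed or v in removed:
            continue  # the edge disappears together with its endpoint
        if (u in side) != (v in side):
            total += w
    return total


def fault_tolerant_value(
    vertices: Sequence[int],
    weights: Dict[Edge, float],
    side: AbstractSet[int],
    k: int,
) -> float:
    """Cut weight left after the worst removal of exactly k vertices (brute force)."""
    return min(
        cut_weight(weights, side, frozenset(removed))
        for removed in combinations(vertices, k)
    )


if __name__ == "__main__":
    # Unit-weight 4-cycle 0-1-2-3-0 with the cut S = {0, 2}.
    cycle = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}
    print(fault_tolerant_value([0, 1, 2, 3], cycle, {0, 2}, k=1))
```

    The minimum over removals is what makes the objective non-linear in the cut, in contrast to the classic Max-Cut objective computed by `cut_weight` alone.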

    Succinct Data Structures for Chordal Graphs

    We study the problem of approximate shortest path queries in chordal graphs and give an n log n + o(n log n) bit data structure to answer the approximate distance query to within an additive constant of 1 in O(1) time. We study the problem of succinctly storing a static chordal graph to answer adjacency, degree, neighbourhood and shortest path queries. Let G be a chordal graph with n vertices. We design a data structure using the information theoretic minimal n^2/4 + o(n^2) bits of space to support the queries:
    - whether two vertices u,v are adjacent, in time f(n) for any f(n) \in \omega(1);
    - the degree of a vertex, in O(1) time;
    - the vertices adjacent to u, in O(f(n)^2) time per neighbour;
    - the length of the shortest path from u to v, in O(n f(n)) time.

    Learning Reserve Prices in Second-Price Auctions

    This paper proves the tight sample complexity of Second-Price Auction with Anonymous Reserve, up to a logarithmic factor, for all value distribution families that have been considered in the literature. Compared to Myerson Auction, whose sample complexity was settled very recently in (Guo, Huang and Zhang, STOC 2019), Anonymous Reserve requires far fewer samples for learning. We follow a framework similar to that of the Guo-Huang-Zhang work, but replace their information-theoretic argument with a direct proof.

    Diameter computation on H-minor free graphs and graphs of bounded (distance) VC-dimension

    Under the Strong Exponential-Time Hypothesis, the diameter of general unweighted graphs cannot be computed in truly subquadratic time. Nevertheless there are several graph classes for which this can be done such as bounded-treewidth graphs, interval graphs and planar graphs, to name a few. We propose to study unweighted graphs of constant distance VC-dimension as a broad generalization of many such classes, where the distance VC-dimension of a graph G is defined as the VC-dimension of its ball hypergraph: whose hyperedges are the balls of all possible radii and centers in G. In particular for any fixed H, the class of H-minor free graphs has distance VC-dimension at most |V(H)| − 1.
    • Our first main result is a Monte Carlo algorithm that on graphs of distance VC-dimension at most d, for any fixed k, either computes the diameter or concludes that it is larger than k in time Õ(k · mn^{1−ε_d}), where ε_d ∈ (0; 1) only depends on d. We thus obtain a truly subquadratic-time parameterized algorithm for computing the diameter on such graphs.
    • Then as a byproduct of our approach, we get the first truly subquadratic-time randomized algorithm for constant diameter computation on all the nowhere dense graph classes. The latter classes include all proper minor-closed graph classes, bounded-degree graphs and graphs of bounded expansion.
    • Finally, we show how to remove the dependency on k for any graph class that excludes a fixed graph H as a minor. More generally, our techniques apply to any graph with constant distance VC-dimension and polynomial expansion (or equivalently having strongly sublinear balanced separators). As a result for all such graphs one obtains a truly subquadratic-time randomized algorithm for computing their diameter.
    We note that all our results also hold for radius computation.
    Our approach is based on the work of Chazelle and Welzl who proved the existence of spanning paths with strongly sublinear stabbing number for every hypergraph of constant VC-dimension. We show how to compute such paths efficiently by combining known algorithms for the stabbing number problem with a clever use of ε-nets, region decomposition and other partition techniques.