189 research outputs found

    Approximate Range Emptiness in Constant Time and Optimal Space

    No full text
    This paper studies the \emph{ε\varepsilon-approximate range emptiness} problem, where the task is to represent a set SS of nn points from {0,,U1}\{0,\ldots,U-1\} and answer emptiness queries of the form "[a;b]S[a ; b]\cap S \neq \emptyset ?" with a probability of \emph{false positives} allowed. This generalizes the functionality of \emph{Bloom filters} from single point queries to any interval length LL. Setting the false positive rate to ε/L\varepsilon/L and performing LL queries, Bloom filters yield a solution to this problem with space O(nlg(L/ε))O(n \lg(L/\varepsilon)) bits, false positive probability bounded by ε\varepsilon for intervals of length up to LL, using query time O(Llg(L/ε))O(L \lg(L/\varepsilon)). Our first contribution is to show that the space/error trade-off cannot be improved asymptotically: Any data structure for answering approximate range emptiness queries on intervals of length up to LL with false positive probability ε\varepsilon, must use space Ω(nlg(L/ε))O(n)\Omega(n \lg(L/\varepsilon)) - O(n) bits. On the positive side we show that the query time can be improved greatly, to constant time, while matching our space lower bound up to a lower order additive term. This result is achieved through a succinct data structure for (non-approximate 1d) range emptiness/reporting queries, which may be of independent interest

    Triangle Counting in Dynamic Graph Streams

    Get PDF
    Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary order or as incidence lists. However, with a few exceptions, the algorithms have considered {\em insert-only} streams. We present a new algorithm estimating the number of triangles in {\em dynamic} graph streams where edges can be both inserted and deleted. We show that our algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles. Also, for graphs with constant transitivity coefficient, a common situation in real graphs, this is the first algorithm achieving constant processing time per edge. The result is achieved by a novel approach combining sampling of vertex triples and sparsification of the input graph. In the course of the analysis of the algorithm we present a lower bound on the number of pairwise independent 2-paths in general graphs which might be of independent interest. At the end of the paper we discuss lower bounds on the space complexity of triangle counting algorithms that make no assumptions on the structure of the graph.Comment: New version of a SWAT 2014 paper with improved result

    On the k-Independence Required by Linear Probing and Minwise Independence

    Full text link

    Efficiently Correcting Matrix Products

    Get PDF
    We study the problem of efficiently correcting an erroneous product of two n×nn\times n matrices over a ring. Among other things, we provide a randomized algorithm for correcting a matrix product with at most kk erroneous entries running in O~(n2+kn)\tilde{O}(n^2+kn) time and a deterministic O~(kn2)\tilde{O}(kn^2)-time algorithm for this problem (where the notation O~\tilde{O} suppresses polylogarithmic terms in nn and kk).Comment: Fixed invalid reference to figure in v

    Wear Minimization for Cuckoo Hashing: How Not to Throw a Lot of Eggs into One Basket

    Full text link
    We study wear-leveling techniques for cuckoo hashing, showing that it is possible to achieve a memory wear bound of loglogn+O(1)\log\log n+O(1) after the insertion of nn items into a table of size CnCn for a suitable constant CC using cuckoo hashing. Moreover, we study our cuckoo hashing method empirically, showing that it significantly improves on the memory wear performance for classic cuckoo hashing and linear probing in practice.Comment: 13 pages, 1 table, 7 figures; to appear at the 13th Symposium on Experimental Algorithms (SEA 2014

    Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

    Full text link
    Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.Comment: ECCV 2020. The first two authors contributed equally to this work. Project page: http://spvnas.mit.edu

    Dynamic Compressed Strings with Random Access

    Full text link
    We consider the problem of storing a string S in dynamic compressed form, while permitting operations directly on the compressed representation of S: access a substring of S; replace, insert or delete a symbol in S; count how many occurrences of a given symbol appear in any given prefix of S (called rank operation) and locate the position of the ith occurrence of a symbol inside S (called select operation). We discuss the time complexity of several combinations of these operations along with the entropy space bounds of the corresponding compressed indexes. In this way, we extend or improve the bounds of previous work by Ferragina and Venturini [TCS, 2007], Jansson et al. [ICALP, 2012], and Nekrich and Navarro [SODA, 2013]

    Interactive Learning for Multimedia at Large

    Get PDF
    International audienceInteractive learning has been suggested as a key method for addressing analytic multimedia tasks arising in several domains. Until recently, however, methods to maintain interactive performance at the scale of today's media collections have not been addressed. We propose an interactive learning approach that builds on and extends the state of the art in user relevance feedback systems and high-dimensional indexing for multimedia. We report on a detailed experimental study using the ImageNet and YFCC100M collections, containing 14 million and 100 million images respectively. The proposed approach outperforms the relevant state-of-the-art approaches in terms of interactive performance, while improving suggestion relevance in some cases. In particular, even on YFCC100M, our approach requires less than 0.3 s per interaction round to generate suggestions, using a single computing core and less than 7 GB of main memory

    PSI from PaXoS: Fast, Malicious Private Set Intersection

    Get PDF
    We present a 2-party private set intersection (PSI) protocol which provides security against malicious participants, yet is almost as fast as the fastest known semi-honest PSI protocol of Kolesnikov et al. (CCS 2016). Our protocol is based on a new approach for two-party PSI, which can be instantiated to provide security against either malicious or semi-honest adversaries. The protocol is unique in that the only difference between the semi-honest and malicious versions is an instantiation with different parameters for a linear error-correction code. It is also the first PSI protocol which is concretely efficient while having linear communication and security against malicious adversaries, while running in the OT-hybrid model (assuming a non-programmable random oracle). State of the art semi-honest PSI protocols take advantage of cuckoo hashing, but it has proven a challenge to use cuckoo hashing for malicious security. Our protocol is the first to use cuckoo hashing for malicious-secure PSI. We do so via a new data structure, called a probe-and-XOR of strings (PaXoS), which may be of independent interest. This abstraction captures important properties of previous data structures, most notably garbled Bloom filters. While an encoding by a garbled Bloom filter is larger by a factor of O(λ)O(\lambda) than the original data, we describe a significantly improved PaXoS based on cuckoo hashing that achieves constant rate while being no worse in other relevant efficiency measures
    corecore