Approximate Range Emptiness in Constant Time and Optimal Space
This paper studies the \emph{$\varepsilon$-approximate range emptiness} problem, where the task is to represent a set $S$ of $n$ points from $\{0, \ldots, U-1\}$ and answer emptiness queries of the form "$[a;b] \cap S \neq \emptyset$?" with a probability of \emph{false positives} allowed. This generalizes the functionality of \emph{Bloom filters} from single point queries to any interval length $L$. Setting the false positive rate to $\varepsilon/L$ and performing $L$ queries, Bloom filters yield a solution to this problem with space $O(n \lg(L/\varepsilon))$ bits, false positive probability bounded by $\varepsilon$ for intervals of length up to $L$, using query time $O(L \lg(L/\varepsilon))$. Our first contribution is to show that the space/error trade-off cannot be improved asymptotically: any data structure for answering approximate range emptiness queries on intervals of length up to $L$ with false positive probability $\varepsilon$ must use space $\Omega(n \lg(L/\varepsilon)) - O(n)$ bits. On the positive side, we show that the query time can be improved greatly, to constant time, while matching our space lower bound up to a lower-order additive term. This result is achieved through a succinct data structure for (non-approximate 1d) range emptiness/reporting queries, which may be of independent interest.
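The Bloom-filter reduction described above is simple enough to sketch. The following Python snippet is a minimal illustration, assuming a textbook Bloom filter (the BloomFilter class, its standard sizing formulas, and the SHA-256 hashing are illustrative choices, not the paper's succinct structure): each emptiness query on $[a,b]$ probes every point of the interval against a filter whose per-probe false positive rate is set to $\varepsilon/L$, so a union bound keeps the per-query error below $\varepsilon$.

```python
import hashlib
import math

class BloomFilter:
    """Textbook Bloom filter; not the paper's succinct structure."""
    def __init__(self, n, fp_rate):
        # Standard sizing: m = -n ln(p) / (ln 2)^2, k = (m/n) ln 2.
        self.m = max(1, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _hashes(self, x):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, x):
        for h in self._hashes(x):
            self.bits[h // 8] |= 1 << (h % 8)

    def might_contain(self, x):
        return all(self.bits[h // 8] >> (h % 8) & 1 for h in self._hashes(x))

def range_emptiness_filter(points, eps, L):
    """Bloom filter tuned so a union bound over <= L probes gives error <= eps."""
    bf = BloomFilter(len(points), eps / L)
    for p in points:
        bf.add(p)
    return bf

def maybe_nonempty(bf, a, b):
    """Query [a, b] (length <= L) one point at a time; O(L) probes."""
    return any(bf.might_contain(x) for x in range(a, b + 1))

S = {3, 17, 42}
bf = range_emptiness_filter(S, eps=0.01, L=8)
print(maybe_nonempty(bf, 15, 20))  # True: 17 is in S
print(maybe_nonempty(bf, 30, 37))  # False with high probability
```

The $O(L)$ probes per query are exactly the cost the paper's constant-time structure removes.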
Triangle Counting in Dynamic Graph Streams
Estimating the number of triangles in graph streams using a limited amount of
memory has become a popular topic in the last decade. Different variations of
the problem have been studied, depending on whether the graph edges are
provided in an arbitrary order or as incidence lists. However, with a few
exceptions, the algorithms have considered {\em insert-only} streams. We
present a new algorithm estimating the number of triangles in {\em dynamic}
graph streams where edges can be both inserted and deleted. We show that our
algorithm achieves better time and space complexity than previous solutions for
various graph classes, for example sparse graphs with a relatively small number
of triangles. Also, for graphs with constant transitivity coefficient, a common
situation in real graphs, this is the first algorithm achieving constant
processing time per edge. The result is achieved by a novel approach combining
sampling of vertex triples and sparsification of the input graph. In the course
of the analysis of the algorithm we present a lower bound on the number of
pairwise independent 2-paths in general graphs which might be of independent
interest. At the end of the paper we discuss lower bounds on the space
complexity of triangle counting algorithms that make no assumptions on the
structure of the graph.
Comment: New version of a SWAT 2014 paper with an improved result.
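The first ingredient of the abstract's approach, sampling vertex triples, can be conveyed with a minimal Monte Carlo sketch. This is a plain static-graph illustration of triple sampling, not the paper's dynamic-stream algorithm with sparsification:

```python
import random
from itertools import combinations
from math import comb

def estimate_triangles(adj, num_samples=100_000, seed=0):
    """Estimate triangle count by sampling uniform vertex triples.

    adj: dict mapping each vertex to a set of neighbors.
    A uniform triple is a triangle with probability T / C(n, 3), so the
    fraction of sampled triples forming triangles, times C(n, 3), is an
    unbiased estimator of the triangle count T.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    n = len(vertices)
    hits = 0
    for _ in range(num_samples):
        u, v, w = rng.sample(vertices, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]:
            hits += 1
    return hits / num_samples * comb(n, 3)

# Small sanity check: K4 has exactly 4 triangles.
adj = {i: {j for j in range(4) if j != i} for i in range(4)}
print(estimate_triangles(adj))  # 4.0
```

The variance of such an estimator degrades when triangles are rare, which is why the paper combines triple sampling with sparsification rather than using it alone.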
Efficient Dynamic Approximate Distance Oracles for Vertex-Labeled Planar Graphs
Let $G$ be a graph where each vertex is associated with a label. A
Vertex-Labeled Approximate Distance Oracle is a data structure that, given a
vertex $v$ and a label $\lambda$, returns a $(1+\varepsilon)$-approximation of
the distance from $v$ to the closest vertex with label $\lambda$ in $G$. Such
an oracle is dynamic if it also supports label changes. In this paper we
present three different dynamic approximate vertex-labeled distance oracles for
planar graphs, all with polylogarithmic query and update times, and nearly
linear space requirements.
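As a hedged baseline for what such an oracle computes (exact rather than approximate, and with none of the paper's planar-graph machinery), a Dijkstra search that stops at the first settled vertex carrying the query label answers the same query the slow way, and label changes are trivially supported by mutating the label map:

```python
import heapq

def nearest_labeled_distance(adj, labels, src, target_label):
    """Exact distance from src to the closest vertex with target_label.

    adj: dict vertex -> list of (neighbor, weight); labels: dict vertex -> label.
    Returns float('inf') if no such vertex is reachable.
    """
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if labels.get(u) == target_label:
            return d  # first settled vertex with the label is the closest
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

adj = {0: [(1, 1.0), (2, 4.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(0, 4.0), (1, 1.0)]}
labels = {0: "a", 1: "b", 2: "c"}
print(nearest_labeled_distance(adj, labels, 0, "c"))  # 2.0 via vertex 1
labels[1] = "c"  # a dynamic label change
print(nearest_labeled_distance(adj, labels, 0, "c"))  # now 1.0
```

The point of the oracles in the paper is to replace this per-query search with polylogarithmic query and update times.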
On Counting Triangles through Edge Sampling in Large Dynamic Graphs
Traditional frameworks for dynamic graphs have relied on processing only the
stream of edges added into or deleted from an evolving graph, but not any
additional related information such as the degrees or neighbor lists of nodes
incident to the edges. In this paper, we propose a new edge sampling framework
for big-graph analytics in dynamic graphs which enhances the traditional model
by enabling the use of additional related information. To demonstrate the
advantages of this framework, we present a new sampling algorithm, called Edge
Sample and Discard (ESD). It generates an unbiased estimate of the total number
of triangles, which can be continuously updated in response to both edge
additions and deletions. We provide a comparative analysis of the performance
of ESD against two current state-of-the-art algorithms in terms of accuracy and
complexity. The results of the experiments performed on real graphs show that,
with the help of the neighborhood information of the sampled edges, the
accuracy achieved by our algorithm is substantially better. We also
characterize the impact of properties of the graph on the performance of our
algorithm by testing on several Barabási-Albert graphs.
Comment: A short version of this article appeared in Proceedings of the 2017
IEEE/ACM International Conference on Advances in Social Networks Analysis and
Mining (ASONAM 2017).
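The generic idea behind edge sampling with neighborhood information can be sketched as follows; this is an illustration of the estimator family, not ESD itself (the exact ESD procedure, including its discard step, is in the paper). Each edge is kept with probability $p$, and for a kept edge the full neighbor lists of its endpoints, the "additional related information" of the framework, are used to count common neighbors:

```python
import random

def estimate_triangles_edge_sampling(adj, p=0.2, seed=1):
    """Unbiased triangle estimate from an edge sample plus neighbor lists.

    adj: dict vertex -> set of neighbors (undirected simple graph).
    Each edge is kept with probability p; for a kept edge (u, v) we count
    the common neighbors |N(u) & N(v)| from the full neighbor lists.
    Every triangle has 3 edges, so E[total] = 3 * p * T, and total / (3 p)
    is an unbiased estimator of the triangle count T.
    """
    rng = random.Random(seed)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v and rng.random() < p:  # visit each undirected edge once
                total += len(adj[u] & adj[v])
    return total / (3 * p)

# Sanity check on K5, which has C(5,3) = 10 triangles.
adj = {i: {j for j in range(5) if j != i} for i in range(5)}
print(estimate_triangles_edge_sampling(adj, p=0.5))  # roughly 10
```

Deletions can be handled in the same spirit by subtracting the common-neighbor count of a sampled deleted edge, which is what makes estimators of this family attractive for dynamic graphs.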
Wear Minimization for Cuckoo Hashing: How Not to Throw a Lot of Eggs into One Basket
We study wear-leveling techniques for cuckoo hashing, showing that it is
possible to achieve a memory wear bound of $\log\log n + O(1)$ after the
insertion of $n$ items into a table of size $Cn$ for a suitable constant $C$
using cuckoo hashing. Moreover, we study our cuckoo hashing method empirically,
showing that it significantly improves on the memory wear performance of
classic cuckoo hashing and linear probing in practice.
Comment: 13 pages, 1 table, 7 figures; to appear at the 13th Symposium on
Experimental Algorithms (SEA 2014).
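To make the quantity being bounded concrete, here is a sketch of classic two-table cuckoo hashing instrumented with per-cell write counters: the maximum counter value after $n$ insertions is exactly the memory wear. The wear-leveling techniques of the paper are not reproduced here; this is the baseline they improve on.

```python
import random

class CuckooHash:
    """Classic two-table cuckoo hashing with per-cell wear counters."""
    def __init__(self, size, seed=0):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        self.wear = [[0] * size, [0] * size]  # writes per cell
        rng = random.Random(seed)
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))

    def _slot(self, which, key):
        return hash((self.salts[which], key)) % self.size

    def insert(self, key, max_kicks=100):
        which = 0
        for _ in range(max_kicks):
            i = self._slot(which, key)
            self.wear[which][i] += 1  # every write wears the cell
            key, self.tables[which][i] = self.tables[which][i], key
            if key is None:
                return True
            which = 1 - which  # evicted key tries its slot in the other table
        return False  # would normally trigger a rehash

    def max_wear(self):
        return max(max(self.wear[0]), max(self.wear[1]))

n = 1000
t = CuckooHash(size=2 * n)
for k in range(n):
    t.insert(k)
print("max cell wear:", t.max_wear())
```

Running this shows that some cells absorb many more writes than others under classic cuckoo hashing, which is precisely the imbalance the paper's method flattens to $\log\log n + O(1)$.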
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
Self-driving cars need to understand 3D scenes efficiently and accurately in
order to drive safely. Given the limited hardware resources, existing 3D
perception models are not able to recognize small instances (e.g., pedestrians,
cyclists) very well due to the low-resolution voxelization and aggressive
downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv),
a lightweight 3D module that equips the vanilla Sparse Convolution with the
high-resolution point-based branch. With negligible overhead, this point-based
branch is able to preserve the fine details even from large outdoor scenes. To
explore the spectrum of efficient 3D models, we first define a flexible
architecture design space based on SPVConv, and we then present 3D Neural
Architecture Search (3D-NAS) to search the optimal network architecture over
this diverse design space efficiently and effectively. Experimental results
validate that the resulting SPVNAS model is fast and accurate: it outperforms
the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive
SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x
measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer
our method to 3D object detection, and it achieves consistent improvements over
the one-stage detection baseline on KITTI.
Comment: ECCV 2020. The first two authors contributed equally to this work.
Project page: http://spvnas.mit.edu
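The point-voxel idea can be sketched without any deep-learning framework. The numpy-only toy below is an assumption-laden stand-in, not the paper's implementation: a per-voxel mean stands in for the sparse-convolution branch, a random linear map stands in for the point-wise MLP, and the two branches are fused by addition so per-point detail survives the coarse voxelization.

```python
import numpy as np

def point_voxel_fusion(points, feats, voxel_size=0.5):
    """Fuse a coarse voxel branch with a high-resolution point branch.

    points: (N, 3) coordinates; feats: (N, C) per-point features.
    The voxel branch averages features per voxel (standing in for a
    sparse convolution); the point branch is a per-point linear map.
    """
    n, c = feats.shape
    keys = np.floor(points / voxel_size).astype(np.int64)
    # Map each point to a dense voxel index.
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    voxel_feats = np.zeros((len(uniq), c))
    np.add.at(voxel_feats, inverse, feats)
    counts = np.bincount(inverse, minlength=len(uniq))[:, None]
    voxel_feats /= counts  # mean feature per voxel

    rng = np.random.default_rng(0)
    w_point = rng.standard_normal((c, c)) / np.sqrt(c)  # point-wise "MLP"
    point_branch = feats @ w_point

    # Devoxelize: broadcast each voxel's feature back to its points, then fuse.
    return point_branch + voxel_feats[inverse]

pts = np.random.default_rng(1).uniform(0, 10, size=(1000, 3))
fts = np.random.default_rng(2).standard_normal((1000, 16))
print(point_voxel_fusion(pts, fts).shape)  # (1000, 16)
```

The fusion step is why small instances survive: points in the same coarse voxel still carry distinct features through the point branch.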
Dynamic Compressed Strings with Random Access
We consider the problem of storing a string S in dynamic compressed form, while permitting operations directly on the compressed representation of S: access a substring of S; replace, insert or delete a symbol in S; count how many occurrences of a given symbol appear in any given prefix of S (called rank operation) and locate the position of the i-th occurrence of a symbol inside S (called select operation). We discuss the time complexity of several combinations of these operations along with the entropy space bounds of the corresponding compressed indexes. In this way, we extend or improve the bounds of previous work by Ferragina and Venturini [TCS, 2007], Jansson et al. [ICALP, 2012], and Nekrich and Navarro [SODA, 2013].
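The five operations the abstract lists are easy to pin down with a naive, uncompressed baseline; the sketch below fixes their semantics only (no entropy-bounded space, linear-time rank/select), whereas the paper supports them directly on the compressed representation:

```python
class NaiveDynamicString:
    """Uncompressed baseline fixing the semantics of access, insert,
    delete, rank, and select; the paper supports these in compressed space."""
    def __init__(self, s):
        self.chars = list(s)

    def access(self, i, j):            # substring S[i..j)
        return "".join(self.chars[i:j])

    def insert(self, i, c):            # insert symbol c before position i
        self.chars.insert(i, c)

    def delete(self, i):               # delete the symbol at position i
        del self.chars[i]

    def rank(self, c, i):              # occurrences of c in the prefix S[0..i)
        return self.chars[:i].count(c)

    def select(self, c, k):            # position of the k-th occurrence of c
        seen = 0
        for pos, ch in enumerate(self.chars):
            if ch == c:
                seen += 1
                if seen == k:
                    return pos
        raise ValueError("fewer than k occurrences")

s = NaiveDynamicString("abracadabra")
print(s.rank("a", 5))    # 2 -> 'a' occurs twice in "abrac"
print(s.select("a", 3))  # 5 -> the third 'a' is at index 5
```

Rank and select are inverse-like operations, and supporting both under insertions and deletions is what makes the dynamic compressed setting hard.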
Efficiently Correcting Matrix Products
We study the problem of efficiently correcting an erroneous product of two
$n \times n$ matrices over a ring. Among other things, we provide a randomized
algorithm for correcting a matrix product with at most $k$ erroneous entries
running in $\tilde{O}(n^2 + kn)$ time and a deterministic $\tilde{O}(kn^2)$-time
algorithm for this problem (where the $\tilde{O}$ notation suppresses
polylogarithmic terms in $n$ and $k$).
Sub-logarithmic Distributed Oblivious RAM with Small Block Size
Oblivious RAM (ORAM) is a cryptographic primitive that allows a client to
securely execute RAM programs over data that is stored in an untrusted server.
Distributed Oblivious RAM is a variant of ORAM, where the data is stored in
$m > 1$ servers. Extensive research over the last few decades has succeeded in
reducing the bandwidth overhead of ORAM schemes, both in the single-server and
the multi-server setting, from $O(\sqrt{N})$ to $O(1)$. However, all known
protocols that achieve a sub-logarithmic overhead either require heavy
server-side computation (e.g. homomorphic encryption), or a large block size of
at least $\Omega(\log^3 N)$.
In this paper, we present a family of distributed ORAM constructions that
follow the hierarchical approach of Goldreich and Ostrovsky [GO96]. We enhance
known techniques, and develop new ones, to take better advantage of the
existence of multiple servers. By plugging efficient known hashing schemes into
our constructions, we get the following results:
1. For any $m \geq 2$, we show an $m$-server ORAM scheme with
$O(\log N / \log\log N)$ overhead and block size $\Omega(\log^2 N)$. This
scheme is private even against an $(m-1)$-server collusion.
2. A 3-server ORAM construction with
$O(\omega(1) \cdot \log N / \log\log N)$ overhead and a block size almost
logarithmic, i.e. $\Omega(\log^{1+\varepsilon} N)$.
We also investigate a model where the servers are allowed to perform a linear
amount of light local computations, and show that constant overhead is
achievable in this model, through a simple four-server ORAM protocol.
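The security goal being optimized is easiest to see in the trivial, maximally expensive scheme: touch every block on every access, so the server's view is independent of the logical address. The single-server toy below has $O(N)$ bandwidth overhead and is shown only to fix the definition; the XOR stream cipher is an illustrative stand-in, not production cryptography, and none of the paper's hierarchical machinery appears here.

```python
import os, hashlib

def _xor_stream(key, nonce, data):
    """Toy stream cipher (illustrative only; use real AEAD in practice)."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

class LinearScanORAM:
    """Trivial ORAM: every access reads and rewrites all N blocks, so the
    server sees N reads plus N fresh-looking writes regardless of which
    logical address was touched.  Bandwidth overhead is O(N)."""
    def __init__(self, n, block_size=16):
        self.key = os.urandom(32)
        self.block_size = block_size
        self.server = [self._enc(bytes(block_size)) for _ in range(n)]

    def _enc(self, plain):
        nonce = os.urandom(16)
        return nonce + _xor_stream(self.key, nonce, plain)

    def _dec(self, cipher):
        return _xor_stream(self.key, cipher[:16], cipher[16:])

    def access(self, index, new_value=None):
        result = None
        for i in range(len(self.server)):      # touch every block
            plain = self._dec(self.server[i])
            if i == index:
                result = plain
                if new_value is not None:
                    plain = new_value.ljust(self.block_size, b"\0")
            self.server[i] = self._enc(plain)  # re-encrypt with a fresh nonce
        return result

oram = LinearScanORAM(n=8)
oram.access(3, b"secret")
print(oram.access(3))  # b'secret' padded with zero bytes
```

Everything from Goldreich-Ostrovsky onward is about driving this overhead down; the constructions above get it to $O(\log N / \log\log N)$ with small blocks, and to constant with light server computation.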
Interactive Learning for Multimedia at Large
Interactive learning has been suggested as a key method for addressing analytic multimedia tasks arising in several domains. Until recently, however, methods to maintain interactive performance at the scale of today's media collections have not been addressed. We propose an interactive learning approach that builds on and extends the state of the art in user relevance feedback systems and high-dimensional indexing for multimedia. We report on a detailed experimental study using the ImageNet and YFCC100M collections, containing 14 million and 100 million images respectively. The proposed approach outperforms the relevant state-of-the-art approaches in terms of interactive performance, while improving suggestion relevance in some cases. In particular, even on YFCC100M, our approach requires less than 0.3 s per interaction round to generate suggestions, using a single computing core and less than 7 GB of main memory.
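The interaction loop the abstract describes (the user marks suggestions positive or negative, the model re-ranks, new suggestions come back) can be sketched with a linear relevance model over precomputed features. The nearest-centroid scorer below is a hypothetical stand-in for the paper's relevance-feedback classifier, and the brute-force scan stands in for its high-dimensional index:

```python
import numpy as np

def suggest(features, pos_ids, neg_ids, k=5, exclude=()):
    """One round of relevance feedback over precomputed image features.

    Scores every item along the direction from the negative centroid to
    the positive centroid (a stand-in for a trained relevance model); a
    real system replaces the full scan with a high-dimensional index to
    stay interactive at 100M items.
    """
    pos = features[list(pos_ids)].mean(axis=0)
    neg = features[list(neg_ids)].mean(axis=0)
    w = pos - neg                       # linear relevance direction
    scores = features @ w
    scores[list(exclude)] = -np.inf     # don't re-suggest judged items
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
features = rng.standard_normal((10_000, 64)).astype(np.float32)
features /= np.linalg.norm(features, axis=1, keepdims=True)

pos, neg = {0, 1}, {2, 3}
for round_no in range(3):               # simulated feedback rounds
    batch = suggest(features, pos, neg, k=5, exclude=pos | neg)
    pos.add(int(batch[0]))              # pretend the user likes the top hit
    neg.update(int(i) for i in batch[1:])
    print("round", round_no, "suggested", batch.tolist())
```

The engineering challenge addressed by the paper is keeping each such round under a fraction of a second at a scale of 100 million images, where the full scan above is no longer feasible.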