153 research outputs found
Approximate Range Emptiness in Constant Time and Optimal Space
This paper studies the \emph{$\varepsilon$-approximate range emptiness} problem, where the task is to represent a set $S$ of $n$ points from $\{0, \ldots, U-1\}$ and answer emptiness queries of the form "$[a; b] \cap S \neq \emptyset$?" with a probability of \emph{false positives} allowed. This generalizes the functionality of \emph{Bloom filters} from single point queries to any interval length $L$. Setting the false positive rate to $\varepsilon/L$ and performing $L$ queries, Bloom filters yield a solution to this problem with space $O(n \lg(L/\varepsilon))$ bits, false positive probability bounded by $\varepsilon$ for intervals of length up to $L$, using query time $O(L \lg(L/\varepsilon))$. Our first contribution is to show that the space/error trade-off cannot be improved asymptotically: any data structure for answering approximate range emptiness queries on intervals of length up to $L$ with false positive probability $\varepsilon$ must use space $n \lg(L/\varepsilon) - O(n)$ bits. On the positive side, we show that the query time can be improved greatly, to constant time, while matching our space lower bound up to a lower-order additive term. This result is achieved through a succinct data structure for (non-approximate 1d) range emptiness/reporting queries, which may be of independent interest.
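The Bloom-filter baseline can be sketched in a few lines (an illustrative toy, not the paper's constant-time structure; the filter parameters and hash scheme are arbitrary demo choices):

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter; sizes and hash scheme are arbitrary demo choices."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, x):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, x):
        for p in self._positions(x):
            self.bits[p] = 1

    def __contains__(self, x):
        return all(self.bits[p] for p in self._positions(x))

def range_empty(bf, a, b):
    """Answer "[a; b] empty?" with one filter lookup per point: an
    interval of length L costs L point queries -- the query time the
    paper improves to constant."""
    return not any(x in bf for x in range(a, b + 1))

bf = BloomFilter(num_bits=1024, num_hashes=5)
for p in (3, 17, 42):
    bf.add(p)

print(range_empty(bf, 20, 30))  # True, barring a (rare) false positive
print(range_empty(bf, 40, 45))  # False is guaranteed: 42 is stored
```

Note that "non-empty" answers are always trustworthy in one direction only: a Bloom filter has no false negatives, so a stored point is never missed, while an "empty" answer can be wrong with the configured false-positive probability.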
Efficiently Correcting Matrix Products
We study the problem of efficiently correcting an erroneous product of two $n \times n$ matrices over a ring. Among other things, we provide a randomized algorithm for correcting a matrix product with at most $k$ erroneous entries running in $\tilde{O}(n^2 + kn)$ time and a deterministic $\tilde{O}(kn^2)$-time algorithm for this problem (where the $\tilde{O}$ notation suppresses polylogarithmic terms in $n$ and $k$).
Comment: Fixed invalid reference to figure in v
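The flavor of such correction can be illustrated with a Freivalds-style check (a simplified sketch, not the paper's algorithm; `trials` and the choice of random-vector entries are illustrative): multiplying by a random vector exposes rows of the claimed product that contain errors, and only those rows are recomputed.

```python
import numpy as np

def correct_product(A, B, C, trials=3, rng=None):
    """Correct a claimed product C of A and B containing few bad entries.

    Freivalds-style check: for a random vector r with nonzero entries,
    A @ (B @ r) - C @ r is nonzero on every row of C whose error is a
    single entry (and, with good probability, on rows with several).
    Only flagged rows are recomputed, so the cost is O(n^2) per trial
    plus one row-times-matrix product per bad row.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    bad_rows = np.zeros(C.shape[0], dtype=bool)
    for _ in range(trials):
        r = rng.integers(1, 3, size=B.shape[1])  # entries in {1, 2}, never 0
        bad_rows |= (A @ (B @ r) - C @ r) != 0
    fixed = C.copy()
    fixed[bad_rows] = A[bad_rows] @ B  # recompute only the suspect rows
    return fixed

rng = np.random.default_rng(1)
A = rng.integers(-5, 5, size=(4, 4))
B = rng.integers(-5, 5, size=(4, 4))
C_err = A @ B
C_err[0, 1] += 5   # plant two single-entry errors
C_err[3, 2] -= 7
assert np.array_equal(correct_product(A, B, C_err), A @ B)
```

Because error-free rows produce an exactly zero residual, they are never flagged; a row with a single bad entry always yields a nonzero residual since the random vector has no zero entries.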
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
Self-driving cars need to understand 3D scenes efficiently and accurately in
order to drive safely. Given the limited hardware resources, existing 3D
perception models are not able to recognize small instances (e.g., pedestrians,
cyclists) very well due to the low-resolution voxelization and aggressive
downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv),
a lightweight 3D module that equips the vanilla Sparse Convolution with the
high-resolution point-based branch. With negligible overhead, this point-based
branch is able to preserve the fine details even from large outdoor scenes. To
explore the spectrum of efficient 3D models, we first define a flexible
architecture design space based on SPVConv, and we then present 3D Neural
Architecture Search (3D-NAS) to search the optimal network architecture over
this diverse design space efficiently and effectively. Experimental results
validate that the resulting SPVNAS model is fast and accurate: it outperforms
the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive
SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x
measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer
our method to 3D object detection, and it achieves consistent improvements over
the one-stage detection baseline on KITTI.
Comment: ECCV 2020. The first two authors contributed equally to this work.
Project page: http://spvnas.mit.edu
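The resolution problem motivating the point-based branch is easy to see with plain voxelization (a toy illustration, not the SPVConv implementation; sizes and coordinates are made up):

```python
import numpy as np

def voxelize(points, voxel_size):
    """Quantize 3D points to a voxel grid; points landing in the same
    voxel collapse into one feature site, which is where fine detail
    (e.g. a distant pedestrian) is lost at coarse resolutions."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(coords, axis=0)

rng = np.random.default_rng(0)
# A 0.4 m-wide "pedestrian" of 200 points placed inside a large scene.
pedestrian = rng.uniform(0.0, 0.4, size=(200, 3)) + 50.0

fine = voxelize(pedestrian, voxel_size=0.05)    # many occupied voxels
coarse = voxelize(pedestrian, voxel_size=1.0)   # the whole object collapses
print(len(fine), len(coarse))                   # coarse grid keeps exactly 1 voxel
```

At a 1 m grid the entire object occupies a single voxel, so any convolution on the coarse grid sees one feature vector for the whole instance; the high-resolution point branch in SPVConv is there to retain what this quantization discards.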
Interactive Learning for Multimedia at Large
Interactive learning has been suggested as a key method for addressing analytic multimedia tasks arising in several domains. Until recently, however, methods to maintain interactive performance at the scale of today's media collections have not been addressed. We propose an interactive learning approach that builds on and extends the state of the art in user relevance feedback systems and high-dimensional indexing for multimedia. We report on a detailed experimental study using the ImageNet and YFCC100M collections, containing 14 million and 100 million images respectively. The proposed approach outperforms the relevant state-of-the-art approaches in terms of interactive performance, while improving suggestion relevance in some cases. In particular, even on YFCC100M, our approach requires less than 0.3 s per interaction round to generate suggestions, using a single computing core and less than 7 GB of main memory.
Sub-logarithmic Distributed Oblivious RAM with Small Block Size
Oblivious RAM (ORAM) is a cryptographic primitive that allows a client to
securely execute RAM programs over data that is stored in an untrusted server.
Distributed Oblivious RAM is a variant of ORAM, where the data is stored in
$m > 1$ servers. Extensive research over the last few decades has succeeded in
reducing the bandwidth overhead of ORAM schemes, both in the single-server and
the multi-server setting, from $O(\sqrt{N})$ to $O(1)$. However, all known
protocols that achieve a sub-logarithmic overhead either require heavy
server-side computation (e.g. homomorphic encryption), or a large block size of
at least $\Omega(\log^3 N)$.
In this paper, we present a family of distributed ORAM constructions that
follow the hierarchical approach of Goldreich and Ostrovsky [GO96]. We enhance
known techniques, and develop new ones, to take better advantage of the
existence of multiple servers. By plugging efficient known hashing schemes in
our constructions, we get the following results:
1. For any number $m \geq 2$ of servers, we show an $m$-server ORAM scheme with $O(\log N / \log\log N)$ overhead and block size $\Omega(\log^2 N)$. This scheme is private even against an $(m-1)$-server collusion.
2. A 3-server ORAM construction with $O(\omega(1) \cdot \log N / \log\log N)$ overhead and a block size almost logarithmic, i.e. $\Omega(\log^{1+\varepsilon} N)$.
We also investigate a model where the servers are allowed to perform a linear
amount of light local computations, and show that constant overhead is
achievable in this model, through a simple four-server ORAM protocol.
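As a minimal illustration of why multiple non-colluding servers help hide access patterns, here is the classic two-server XOR trick (textbook two-server private information retrieval, not a construction from this paper):

```python
import os
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, subset):
    """A server XORs together the blocks indexed by `subset`. Each subset
    on its own is a uniformly random set of indices, so a single server
    learns nothing about which block is being read."""
    ans = bytes(len(db[0]))
    for j in subset:
        ans = xor_bytes(ans, db[j])
    return ans

def pir_read(db, i):
    """Read db[i] without revealing i to either (non-colluding) server."""
    s1 = {j for j in range(len(db)) if secrets.randbits(1)}
    s2 = s1 ^ {i}  # symmetric difference: flip i's membership
    # Every index in both subsets cancels under XOR; only block i survives.
    return xor_bytes(server_answer(db, s1), server_answer(db, s2))

db = [os.urandom(16) for _ in range(8)]
assert all(pir_read(db, i) == db[i] for i in range(len(db)))
```

The two queries differ only at index $i$, yet each is marginally uniform, which is the kind of leverage multi-server ORAM constructions exploit; the paper's hierarchical schemes build far more machinery on top of this basic idea.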
Efficient counting of k-mers in DNA sequences using a bloom filter
Background: Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including for genome and transcriptome assembly, for metagenomic sequencing, and for error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers. In current data sets, a large fraction, often more than 50%, of the storage capacity may be spent on storing k-mers that contain sequencing errors and which are typically observed only a single time in the data. These singleton k-mers are uninformative for many algorithms without some kind of error correction.
Results: We present a new method that identifies all the k-mers that occur more than once in a DNA sequence data set. Our method does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly in memory with greatly reduced memory requirements. We then make a second sweep through the data to provide exact counts of all nonunique k-mers. For example data sets, we report up to 50% savings in memory usage compared to current software, with modest costs in computational speed. This approach may reduce memory requirements for any algorithm that starts by counting k-mers in sequence data with errors.
Conclusions: A reference implementation for this methodology, BFCounter, is written in C++ and is GPL licensed. It is available for free download at http://pritch.bsd.uchicago.edu/bfcounter.html
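The two-pass idea can be sketched as follows (an illustrative Python sketch, not the C++ BFCounter implementation; filter sizing and hashing are arbitrary demo choices):

```python
import hashlib
from collections import Counter

class BloomFilter:
    """Minimal Bloom filter; sizing and hashing are arbitrary demo choices."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Insert item; return True iff it was (probably) seen before."""
        seen = True
        for p in self._positions(item):
            if not self.bits[p]:
                seen = False
                self.bits[p] = 1
        return seen

def count_nonunique_kmers(reads, k):
    """Pass 1: record every k-mer in the filter, keeping those sighted at
    least twice (plus rare false positives). Pass 2: exact counts for that
    small set only, so singleton error k-mers never enter the hash table."""
    bf, candidates = BloomFilter(), set()
    for read in reads:
        for j in range(len(read) - k + 1):
            if bf.add(read[j:j + k]):
                candidates.add(read[j:j + k])
    counts = Counter()
    for read in reads:
        for j in range(len(read) - k + 1):
            if read[j:j + k] in candidates:
                counts[read[j:j + k]] += 1
    return counts

print(count_nonunique_kmers(["ACGTACGT", "TACGTA"], 4))
# ACGT occurs 3 times, CGTA and TACG twice; singleton GTAC is excluded
```

The memory saving comes from the asymmetry the abstract describes: the Bloom filter stores the full (error-dominated) k-mer set implicitly in a bit array, while the exact hash table only ever holds the much smaller nonunique set. A filter false positive can let an occasional singleton through, but counts for true nonunique k-mers are always exact.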
Extracellular vesicles and intercellular communication within the nervous system
Extracellular vesicles (EVs, including exosomes) are implicated in many aspects of nervous system development and function, including regulation of synaptic communication, synaptic strength, and nerve regeneration. They mediate the transfer of packets of information in the form of nonsecreted proteins and DNA/RNA protected within a membrane compartment. EVs are essential for the packaging and transport of many cell-fate proteins during development as well as many neurotoxic misfolded proteins during pathogenesis. This form of communication provides another dimension of cellular crosstalk, with the ability to assemble a "kit" of directional instructions made up of different molecular entities and address it to specific recipient cells. This multidimensional form of communication has special significance in the nervous system. How EVs help to orchestrate the wiring of the brain while allowing for plasticity associated with learning and memory, and how they contribute to regeneration and degeneration, are all under investigation. Because they carry specific disease-related RNAs and proteins, practical applications of EVs include potential uses as biomarkers and therapeutics. This Review describes our current understanding of EVs and serves as a springboard for future advances, which may reveal new important mechanisms by which EVs coordinate brain and body function and dysfunction.
- …