Retouched Bloom Filters: Allowing Networked Applications to Flexibly Trade Off False Positives Against False Negatives
Where distributed agents must share voluminous set membership information,
Bloom filters provide a compact, though lossy, way for them to do so. Numerous
recent networking papers have examined the trade-offs between the bandwidth
consumed by the transmission of Bloom filters, and the error rate, which takes
the form of false positives, and which rises the more the filters are
compressed. In this paper, we introduce the retouched Bloom filter (RBF), an
extension that makes the Bloom filter more flexible by permitting the removal
of selected false positives at the expense of generating random false
negatives. We analytically show that RBFs created through a random process
maintain an overall error rate, expressed as a combination of the false
positive rate and the false negative rate, that is equal to the false positive
rate of the corresponding Bloom filters. We further provide some simple
heuristics and improved algorithms that decrease the false positive rate more
than the corresponding increase in the false negative rate, when creating
RBFs. Finally, we demonstrate the advantages of an RBF over a Bloom filter in a
distributed network topology measurement application, where information about
large stop sets must be shared among route tracing monitors.
Comment: This is a new version of the technical reports with improved
algorithms and theoretical analysis of algorithms
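The retouching idea in the abstract above can be illustrated with a minimal sketch. It assumes a toy filter whose bit positions come from salted SHA-256 digests, and it clears the first bit of a chosen false positive; the class name, the method names, and the choice of which bit to clear are all hypothetical and are not the selection algorithms of the paper.

```python
import hashlib


class RetouchedBloomFilter:
    """Toy Bloom filter with a 'retouch' step that clears one bit (illustrative only)."""

    def __init__(self, m, k):
        self.m = m                  # number of bits in the filter
        self.k = k                  # number of hash functions
        self.bits = [False] * m

    def _positions(self, item):
        # Assumption: k bit positions derived from salted SHA-256 digests.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

    def remove_false_positive(self, item):
        # Clearing any one of the item's bits makes the filter reject it.
        # Stored keys that also map to that bit turn into false negatives.
        if item in self:
            self.bits[self._positions(item)[0]] = False


rbf = RetouchedBloomFilter(m=64, k=3)
members = [f"key{i}" for i in range(20)]
for key in members:
    rbf.add(key)

# Non-members that the deliberately small filter wrongly accepts.
false_positives = [f"other{i}" for i in range(1000) if f"other{i}" in rbf]
target = false_positives[0]
rbf.remove_false_positive(target)

false_negatives = [key for key in members if key not in rbf]
print(target in rbf, len(false_negatives))
```

The sketch makes the trade-off visible: the targeted false positive is gone, and any member sharing the cleared bit is now reported as absent.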
An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters
Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes-no Bloom filter, which is a data structure consisting of two parts: the yes-filter, which is a standard Bloom filter, and the no-filter, which is another Bloom filter whose purpose is to represent those objects that were recognised incorrectly by the yes-filter (that is, to recognise the false positives of the yes-filter). By querying the no-filter after an object has been recognised by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with the standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses objects to include in the no-filter so that the no-filter recognises as many false positives as possible but no true positives, thus producing the most accurate yes-no Bloom filter among all yes-no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognised by the no-filter, with the constraint being that it should recognise no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large, which makes solving it to optimality intractable. Considering the similarity of the ILP with the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed, making use of a reduced ILP for the value function approximation. Numerical results show that the ADP model works best compared with a number of heuristics as well as the CPLEX built-in solver (B&B), and this is what can be recommended for use in yes-no Bloom filters. In a wider context of the study of lossy compression algorithms, our research is an example showing how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.
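A minimal sketch of the yes-no structure described above follows. It uses a simple greedy rule in place of the ILP and ADP models of the paper: a known false positive is added to the no-filter only if doing so does not cause the no-filter to recognise any true member. The class names, filter sizes, and the salted SHA-256 hashing are assumptions made for illustration.

```python
import hashlib


class Bloom:
    """Plain Bloom filter storing the indices of its set bits."""

    def __init__(self, m, k, salt):
        self.m, self.k, self.salt = m, k, salt
        self.bits = set()  # indices of bits that are on

    def positions(self, item):
        # Assumption: positions come from salted SHA-256 digests.
        return [
            int.from_bytes(
                hashlib.sha256(f"{self.salt}:{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        self.bits.update(self.positions(item))

    def __contains__(self, item):
        return all(p in self.bits for p in self.positions(item))


class YesNoBloom:
    """Yes-filter for members plus a no-filter for selected false positives."""

    def __init__(self, members, known_non_members, m_yes=128, m_no=64, k=3):
        self.yes = Bloom(m_yes, k, "yes")
        self.no = Bloom(m_no, k, "no")
        for x in members:
            self.yes.add(x)
        # Greedy stand-in for the optimisation step (not the paper's ADP model).
        for x in known_non_members:
            if x not in self.yes:
                continue  # the yes-filter already rejects x
            trial = self.no.bits | set(self.no.positions(x))
            hits_member = any(
                all(p in trial for p in self.no.positions(mem)) for mem in members
            )
            if not hits_member:
                self.no.bits = trial  # safe: no true positive is recognised

    def __contains__(self, item):
        return item in self.yes and item not in self.no


members = [f"key{i}" for i in range(30)]
non_members = [f"other{i}" for i in range(500)]
yn = YesNoBloom(members, non_members)

fp_yes = sum(x in yn.yes for x in non_members)
fp_yes_no = sum(x in yn for x in non_members)
print(fp_yes, fp_yes_no)
```

Because the greedy rule never lets the no-filter recognise a member, every member is still accepted, while some false positives of the yes-filter are rejected. The sketch does not compare against a single Bloom filter of the same total length, which is the comparison the abstract makes.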
Weightless: Lossy Weight Encoding For Deep Neural Network Compression
The large memory requirements of deep neural networks limit their deployment
and adoption on many devices. Model compression methods effectively reduce the
memory requirements of these models, usually through applying transformations
such as weight pruning or quantization. In this paper, we present a novel
scheme for lossy weight encoding which complements conventional compression
techniques. The encoding is based on the Bloomier filter, a probabilistic data
structure that can save space at the cost of introducing random errors.
Leveraging the ability of neural networks to tolerate these imperfections and
by re-training around the errors, the proposed technique, Weightless, can
compress DNN weights by up to 496x with the same model accuracy. This results
in up to a 1.51x improvement over the state-of-the-art
Opportunistic linked data querying through approximate membership metadata
Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can be achieved, among other means, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple, rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions make it possible to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
Design of a multiple bloom filter for distributed navigation routing
Unmanned navigation of vehicles and mobile robots can be greatly simplified by providing environmental intelligence with dispersed wireless sensors. The wireless sensors can work as active landmarks for vehicle localization and routing. However, wireless sensors are often resource scarce and require a resource-saving design. In this paper, a multiple Bloom-filter scheme is proposed to compress a global routing table for a wireless sensor. It is used as a lookup table for routing a vehicle to any destination but requires significantly less memory space and search effort. An error-expectation-based design for a multiple Bloom filter is proposed as an improvement to the conventional false-positive-rate-based design. The new design is shown to provide an equal relative error expectation for all branched paths, which ensures a better network load balance and uses less memory space. The scheme is implemented in a project for wheelchair navigation using wireless camera motes. © 2013 IEEE
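One way to picture a multiple Bloom filter used as a compressed routing table is sketched below. It assumes one small filter per outgoing branch of a node, with a lookup that returns every branch whose filter matches the destination. The layout, names, and parameters are hypothetical, and the error-expectation-based sizing proposed in the paper is not reproduced here.

```python
import hashlib


def bit_positions(item, m, k):
    """Map an item to k bit indices via salted SHA-256 (an illustrative choice)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


class MultipleBloomFilter:
    """One small Bloom filter per outgoing branch of a node (hypothetical layout)."""

    def __init__(self, routing_table, m=64, k=3):
        self.m, self.k = m, k
        self.filters = {}  # branch -> set of bit indices that are on
        for destination, branch in routing_table.items():
            bits = self.filters.setdefault(branch, set())
            bits.update(bit_positions(destination, m, k))

    def candidate_branches(self, destination):
        # The correct branch always matches; any extra match is a false positive.
        wanted = bit_positions(destination, self.m, self.k)
        return [
            branch
            for branch, bits in self.filters.items()
            if all(p in bits for p in wanted)
        ]


# Hypothetical routing table: destination -> outgoing branch at this node.
table = {f"room{i}": ("north", "east", "south")[i % 3] for i in range(30)}
node = MultipleBloomFilter(table)
print(node.candidate_branches("room7"))
```

The node stores only the per-branch bit sets rather than the full table. A false positive shows up as an additional candidate branch, which is the kind of routing error the design in the abstract aims to balance across branches.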
Delta bloom filter compression using stochastic learning-based weak estimation
Substantial research has been done, and still continues, on reducing bandwidth requirements and on providing reliable access to stored and transmitted data in a space-efficient manner. Bloom filters and their variants have achieved widespread acceptance in various fields due to their ability to satisfy these requirements.
This need has grown, especially for applications that make heavy use of transmission bandwidth, for distributed computing environments for databases or proxy servers, and for applications that are sensitive to access to frequently modified information. In response, this thesis proposes a solution in the form of a compressed delta Bloom filter.
This thesis proposes delta Bloom filter compression, using stochastic learning-based weak estimation and prediction with partial matching, to achieve lossless compression with high compression gain and thereby reduce the large volume of data that is transferred frequently.
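The general idea of shipping a compressed delta instead of a whole filter can be sketched as follows. The sketch XORs the old and new bit arrays and compresses the result with `zlib`, which is a stand-in chosen for brevity and is not the stochastic learning-based weak estimation or prediction with partial matching used in the thesis. The filter size, hash scheme, and function names are assumptions.

```python
import hashlib
import zlib

M_BITS = 8192  # filter size in bits (assumed)
K = 4          # number of hash functions (assumed)


def positions(item):
    # Assumption: bit positions come from salted SHA-256 digests.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % M_BITS
        for i in range(K)
    ]


def make_filter(items):
    """Build a Bloom filter as an immutable byte string."""
    bits = bytearray(M_BITS // 8)
    for item in items:
        for p in positions(item):
            bits[p // 8] |= 1 << (p % 8)
    return bytes(bits)


def encode_delta(old, new):
    # XOR marks exactly the bits that changed; the result is mostly zeros
    # when few items were added, so it compresses well.
    delta = bytes(a ^ b for a, b in zip(old, new))
    return zlib.compress(delta, 9)


def apply_delta(old, payload):
    # XOR is its own inverse, so the receiver recovers the new filter losslessly.
    delta = zlib.decompress(payload)
    return bytes(a ^ b for a, b in zip(old, delta))


old_items = [f"url{i}" for i in range(500)]
new_items = old_items + [f"fresh{i}" for i in range(5)]

old_filter = make_filter(old_items)
new_filter = make_filter(new_items)

payload = encode_delta(old_filter, new_filter)
rebuilt = apply_delta(old_filter, payload)
print(len(new_filter), len(payload), rebuilt == new_filter)
```

A receiver that already holds the old filter needs only the payload to obtain the new one, which is the bandwidth saving the thesis targets for frequently updated filters.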