6,079 research outputs found

    Retouched Bloom Filters: Allowing Networked Applications to Flexibly Trade Off False Positives Against False Negatives

    Where distributed agents must share voluminous set membership information, Bloom filters provide a compact, though lossy, way for them to do so. Numerous recent networking papers have examined the trade-off between the bandwidth consumed by the transmission of Bloom filters and the error rate, which takes the form of false positives and which rises the more the filters are compressed. In this paper, we introduce the retouched Bloom filter (RBF), an extension that makes the Bloom filter more flexible by permitting the removal of selected false positives at the expense of generating random false negatives. We analytically show that RBFs created through a random process maintain an overall error rate, expressed as a combination of the false positive rate and the false negative rate, that is equal to the false positive rate of the corresponding Bloom filters. We further provide simple heuristics and improved algorithms that, when creating RBFs, decrease the false positive rate by more than the corresponding increase in the false negative rate. Finally, we demonstrate the advantages of an RBF over a Bloom filter in a distributed network topology measurement application, where information about large stop sets must be shared among route tracing monitors. Comment: This is a new version of the technical report, with improved algorithms and a theoretical analysis of the algorithms.
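
    The core mechanism lends itself to a short sketch: a plain Bloom filter gains a retouch step that clears one of a reported false positive's bits, removing that false positive but possibly turning stored items that share the cleared bit into false negatives. The hash construction and the random choice of which bit to clear below are illustrative assumptions, not the paper's exact algorithms.

```python
import hashlib
import random

class RetouchedBloomFilter:
    """Bloom filter with a 'retouch' step that trades selected false
    positives for random false negatives (a minimal sketch)."""

    def __init__(self, m, k):
        self.m = m              # filter length in bits
        self.k = k              # number of hash functions
        self.bits = [0] * m

    def _positions(self, item):
        # k bit positions from salted SHA-256 digests (an illustrative
        # choice; the paper does not prescribe this hash scheme)
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item):
        return all(self.bits[p] for p in self._positions(item))

    def retouch(self, false_positive):
        # Clear one randomly chosen bit of the false positive: it no
        # longer matches, but any stored item sharing that bit becomes
        # a false negative -- the trade-off the paper analyses.
        p = random.choice(list(self._positions(false_positive)))
        self.bits[p] = 0
```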

    An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters

    Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents the yes-no Bloom filter, a data structure consisting of two parts: the yes-filter, which is a standard Bloom filter, and the no-filter, another Bloom filter whose purpose is to represent those objects that were recognised incorrectly by the yes-filter (that is, to recognise the false positives of the yes-filter). By querying the no-filter after an object has been recognised by the yes-filter, we get a chance to reject it, which improves the accuracy of data recognition in comparison with a standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses the objects to include in the no-filter so that it recognises as many false positives as possible but no true positives, thus producing the most accurate yes-no Bloom filter of all. This paper studies how optimization techniques can be used to maximize the number of false positives recognised by the no-filter, under the constraint that it recognise no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large, making the optimal solution intractable. Given the similarity of the ILP to the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed that uses a reduced ILP for the value function approximation. Numerical results show that the ADP model performs best in comparison with a number of heuristics as well as the CPLEX built-in branch-and-bound solver, and it is the approach we recommend for constructing yes-no Bloom filters. In the wider context of the study of lossy compression algorithms, our research is an example of how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.
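
    The two-filter structure itself is compact enough to sketch; the ILP/ADP machinery for choosing which false positives to store is not reproduced here, and the class and parameter names below are my own.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter over salted SHA-256 hashes (illustrative)."""

    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, [0] * m

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

class YesNoBloomFilter:
    """Yes-filter stores the set; no-filter stores selected false
    positives of the yes-filter, letting queries reject them."""

    def __init__(self, m_yes, m_no, k):
        self.yes = BloomFilter(m_yes, k)
        self.no = BloomFilter(m_no, k)

    def add(self, item):
        self.yes.add(item)

    def add_false_positive(self, item):
        # The caller must check that the no-filter still rejects every
        # true positive after this insertion -- the constraint under
        # which the paper's ILP/ADP selects items to include.
        self.no.add(item)

    def __contains__(self, item):
        return item in self.yes and item not in self.no
```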

    Weightless: Lossy Weight Encoding For Deep Neural Network Compression

    The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding which complements conventional compression techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that can save space at the cost of introducing random errors. By leveraging the ability of neural networks to tolerate these imperfections and re-training around the errors, the proposed technique, Weightless, can compress DNN weights by up to 496x with the same model accuracy. This results in up to a 1.51x improvement over the state-of-the-art.
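
    A full Bloomier filter is involved, but its character shows up in a simplified relative: a peeling-based static function (in the MWHC/XOR-filter family) that returns the exact stored value for every inserted key, e.g. a non-zero quantized weight indexed by position, and an arbitrary value for any other key -- the "random errors" that retraining absorbs. The sketch below is a hedged stand-in, not the paper's encoder.

```python
import random

def build_static_function(pairs, m, k=3, seed=0):
    """pairs: dict from hashable keys (e.g. weight indices) to int
    values. m: table size; roughly 1.3x len(pairs) is needed for k=3,
    otherwise the peeling below may fail."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(k)]
    block = m // k

    def positions(key):
        # one cell per disjoint block, so the k positions are distinct
        return [i * block + hash((salts[i], key)) % block for i in range(k)]

    cell_keys = [set() for _ in range(m)]
    for key in pairs:
        for p in positions(key):
            cell_keys[p].add(key)

    # Peel: repeatedly take a cell touched by exactly one remaining key.
    stack = []
    single = [p for p in range(m) if len(cell_keys[p]) == 1]
    while single:
        p = single.pop()
        if len(cell_keys[p]) != 1:
            continue
        (key,) = cell_keys[p]
        stack.append((key, p))
        for q in positions(key):
            cell_keys[q].discard(key)
            if len(cell_keys[q]) == 1:
                single.append(q)
    if len(stack) != len(pairs):
        raise ValueError("peeling failed; increase m or change the seed")

    # Assign in reverse peel order so each key's XOR resolves exactly.
    table = [0] * m
    for key, p in reversed(stack):
        acc = pairs[key]
        for q in positions(key):
            if q != p:
                acc ^= table[q]
        table[p] = acc

    def lookup(key):
        acc = 0
        for q in positions(key):
            acc ^= table[q]
        return acc  # exact for stored keys; arbitrary for other keys

    return table, lookup
```

    The space saving comes from storing only m small cells instead of explicit key-value pairs; the price, as in the paper, is that unstored keys silently return garbage rather than "not found".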

    Opportunistic Linked Data querying through approximate membership metadata

    Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can, among other means, be achieved through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments involve a significant portion of membership subqueries, which check for the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions make it possible to achieve full result recall earlier by temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests, at only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
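
    On the client side the mechanism reduces, roughly, to consulting the filter shipped as metadata before issuing a membership subquery over HTTP. The names below, and the optimistic low-precision mode, are illustrative assumptions rather than the actual Triple Pattern Fragments vocabulary.

```python
import hashlib

class BloomFilterMetadata:
    """Client-side view of an approximate membership function received
    as response metadata (a sketch; field names are assumptions)."""

    def __init__(self, bits, k):
        self.bits, self.k, self.m = bits, k, len(bits)

    def might_contain(self, triple):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{triple}".encode()).hexdigest()
            if not self.bits[int(digest, 16) % self.m]:
                return False   # definitely absent: no HTTP request needed
        return True            # possibly present

def resolve_membership(triple, amf, fetch, strict=True):
    """Resolve a membership subquery. With strict=False the client
    accepts the filter's 'maybe' directly, temporarily trading
    precision for fewer requests, as the paper describes."""
    if not amf.might_contain(triple):
        return False           # saved one HTTP request
    if strict:
        return fetch(triple)   # confirm with a single HTTP request
    return True                # optimistic mode: treat 'maybe' as yes
```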

    Design of a multiple Bloom filter for distributed navigation routing

    Unmanned navigation of vehicles and mobile robots can be greatly simplified by providing environmental intelligence through dispersed wireless sensors. The wireless sensors can act as active landmarks for vehicle localization and routing. However, wireless sensors are often resource-scarce and require a resource-saving design. In this paper, a multiple Bloom filter scheme is proposed to compress a global routing table for a wireless sensor. It is used as a lookup table for routing a vehicle to any destination, but requires significantly less memory space and search effort. An error-expectation-based design for a multiple Bloom filter is proposed as an improvement to the conventional false-positive-rate-based design. The new design is shown to provide an equal relative error expectation for all branched paths, which ensures better network load balance and uses less memory space. The scheme is implemented in a project for wheelchair navigation using wireless camera motes.
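
    In sketch form, the scheme keeps one Bloom filter per outgoing branch and routes a destination along the branch whose filter claims it. Per-branch filter sizes would be chosen by the paper's error-expectation rule, which is not reproduced here; all names below are illustrative.

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, [0] * m

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

class MultiBloomRoutingTable:
    """One filter per branch at a sensor node; a destination is routed
    along the branch(es) whose filter matches it."""

    def __init__(self, branch_sizes, k=4):
        # branch_sizes: {branch_id: filter length in bits}; unequal
        # lengths allow equalising error expectation across branches
        self.filters = {b: BloomFilter(m, k) for b, m in branch_sizes.items()}

    def learn(self, branch, destination):
        self.filters[branch].add(destination)

    def next_hops(self, destination):
        # False positives may return more than one candidate branch.
        return [b for b, f in self.filters.items() if destination in f]
```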

    Delta Bloom filter compression using stochastic learning-based weak estimation

    Substantial research has been done, and still continues, on reducing bandwidth requirements and on reliable, space-efficient access to stored and transmitted data. Bloom filters and their variants have achieved widespread acceptance in various fields due to their ability to satisfy these requirements. As this need has grown, especially for applications that make heavy use of transmission bandwidth, for distributed computing environments such as databases or proxy servers, and for applications that are sensitive to accessing frequently modified information, this thesis proposes a solution in the form of a compressed delta Bloom filter. The thesis proposes delta Bloom filter compression using stochastic learning-based weak estimation and prediction by partial matching, achieving lossless compression with high compression gain and thereby reducing the volume of large, frequently transferred data.
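
    The two building blocks named in the abstract can be sketched directly: the delta of two filter versions is a sparse XOR of their bit arrays, and a stochastic learning-based weak estimator tracks the probability of a 1 bit with a multiplicative update rather than a frequency count. Feeding these estimates into a prediction-by-partial-matching coder, as the thesis does, is omitted; parameter names are illustrative.

```python
def bloom_delta(old_bits, new_bits):
    """XOR delta between two versions of a Bloom filter's bit array.
    Only flipped positions are set, so the delta is typically sparse
    and compresses far better than either full filter."""
    return [a ^ b for a, b in zip(old_bits, new_bits)]

def apply_delta(bits, delta):
    """Reconstruct the new filter from the old one plus the delta."""
    return [b ^ d for b, d in zip(bits, delta)]

class SLWEBitEstimator:
    """Stochastic learning-based weak estimator for the probability
    that the next delta bit is 1: p <- lam * p + (1 - lam) * bit.
    Values of lam close to 1 adapt slowly; the estimate would drive an
    adaptive entropy coder (not shown here)."""

    def __init__(self, lam=0.99, p1=0.5):
        self.lam, self.p1 = lam, p1

    def update(self, bit):
        self.p1 = self.lam * self.p1 + (1 - self.lam) * bit
        return self.p1
```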