Pay for a Sliding Bloom Filter and Get Counting, Distinct Elements, and Entropy for Free
For many networking applications, recent data is more significant than older
data, motivating the need for sliding window solutions. Various capabilities,
such as DDoS detection and load balancing, require insights about multiple
metrics including Bloom filters, per-flow counting, count distinct and entropy
estimation.
In this work, we present a unified construction that solves all the above
problems in the sliding window model. Our single solution offers a better space
to accuracy tradeoff than the state-of-the-art for each of these individual
problems! We show this both analytically and by running multiple real Internet
backbone and datacenter packet traces.
Comment: To appear in IEEE INFOCOM 201
Straggler Identification in Round-Trip Data Streams via Newton's Identities and Invertible Bloom Filters
We introduce the straggler identification problem, in which an algorithm must
determine the identities of the remaining members of a set after it has had a
large number of insertion and deletion operations performed on it, and now has
relatively few remaining members. The goal is to do this in o(n) space, where n
is the total number of identities. The straggler identification problem has
applications, for example, in determining the set of unacknowledged packets in
a high-bandwidth multicast data stream. We provide a deterministic solution to
the straggler identification problem that uses only O(d log n) bits and is
based on a novel application of Newton's identities for symmetric polynomials.
This solution can identify any subset of d stragglers from a set of n O(log
n)-bit identifiers, assuming that there are no false deletions of identities
not already in the set. Indeed, we give a lower bound argument that shows that
any small-space deterministic solution to the straggler identification problem
cannot be guaranteed to handle false deletions. Nevertheless, we show that
there is a simple randomized solution using O(d log n log(1/epsilon)) bits that
can maintain a multiset and solve the straggler identification problem,
tolerating false deletions, where epsilon>0 is a user-defined parameter
bounding the probability of an incorrect response. This randomized solution is
based on a new type of Bloom filter, which we call the invertible Bloom filter.
Comment: Fuller version of paper appearing in 10th Worksh. Algorithms and Data
Structures, Halifax, Nova Scotia, 200
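The deterministic idea in the abstract above can be illustrated for the smallest cases. The following is a minimal sketch, not the authors' construction: it assumes at most two stragglers, no false deletions, and Python's unbounded integers in place of whatever arithmetic the paper uses. The class name TwoStragglerSketch and its methods are hypothetical. It keeps only a count and two power sums, then uses Newton's identities (e1 = p1, e2 = (p1^2 - p2) / 2) to recover the remaining identifiers as roots of x^2 - e1*x + e2.

```python
from math import isqrt


class TwoStragglerSketch:
    """Hypothetical sketch: recovers up to two remaining IDs from power sums."""

    def __init__(self):
        self.count = 0  # number of IDs currently in the set
        self.p1 = 0     # sum of the IDs currently in the set
        self.p2 = 0     # sum of squares of the IDs currently in the set

    def insert(self, x):
        self.count += 1
        self.p1 += x
        self.p2 += x * x

    def delete(self, x):
        # Assumption: x was inserted earlier (no false deletions).
        self.count -= 1
        self.p1 -= x
        self.p2 -= x * x

    def stragglers(self):
        if self.count == 0:
            return []
        if self.count == 1:
            return [self.p1]
        if self.count == 2:
            # Newton's identities: e1 = p1, e2 = (p1^2 - p2) / 2.
            e1 = self.p1
            e2 = (self.p1 * self.p1 - self.p2) // 2
            # The two IDs are the roots of x^2 - e1*x + e2 = 0.
            root = isqrt(e1 * e1 - 4 * e2)
            return [(e1 - root) // 2, (e1 + root) // 2]
        raise ValueError("this sketch only handles up to two stragglers")


sketch = TwoStragglerSketch()
for packet_id in range(1, 101):
    sketch.insert(packet_id)
for packet_id in range(1, 101):
    if packet_id not in (17, 42):
        sketch.delete(packet_id)
remaining = sketch.stragglers()
```

Handling d stragglers in general would need d power sums and a degree-d polynomial to factor; that part is not shown here.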
Preventing DDoS using Bloom Filter: A Survey
Distributed Denial-of-Service (DDoS) is a menace for service providers and a
prominent issue in network security. Defeating or defending against DDoS is a
prime challenge. A DDoS attack makes a service unavailable for a certain time.
This harms service providers and causes loss of business revenue. Therefore,
DDoS is a grand challenge to defeat. There are numerous mechanisms to defend
against DDoS; however, this paper surveys the deployment of the Bloom Filter in
defending against a DDoS attack. The Bloom Filter is a probabilistic data
structure for membership queries that returns either true or false. A Bloom
Filter uses little memory to store information about large data sets.
Therefore, packet information is stored in a Bloom Filter to defend against and
defeat DDoS. This paper presents a survey of DDoS defense techniques using the
Bloom Filter.
Comment: 9 pages, 1 figure. This article is accepted for publication in EAI
Endorsed Transactions on Scalable Information Systems
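For readers unfamiliar with the structure the survey above relies on, here is a minimal Bloom filter sketch. It is an illustration only and is not taken from the surveyed paper: the class name, the default sizes, and the choice of salted SHA-256 as the hash family are all assumptions. A query answers "possibly present" or "definitely absent"; false positives can occur, false negatives cannot.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter over string items."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive the hash functions by salting SHA-256 with an index.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen_sources = BloomFilter()
empty_lookup = "10.0.0.1" in seen_sources  # nothing has been added yet
seen_sources.add("10.0.0.1")
known_lookup = "10.0.0.1" in seen_sources
```

In a packet-filtering setting the items would be whatever packet attributes the defense tracks (source addresses in this toy example); the bit-array size and number of hashes trade memory against the false positive rate.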
TinyLFU: A Highly Efficient Cache Admission Policy
This paper proposes to use a frequency based cache admission policy in order
to boost the effectiveness of caches subject to skewed access distributions.
Given a newly accessed item and an eviction candidate from the cache, our
scheme decides, based on the recent access history, whether it is worth
admitting the new item into the cache at the expense of the eviction candidate.
Realizing this concept is enabled through a novel approximate LFU structure
called TinyLFU, which maintains an approximate representation of the access
frequency of a large sample of recently accessed items. TinyLFU is very compact
and light-weight as it builds upon Bloom filter theory.
We study the properties of TinyLFU through simulations of both synthetic
workloads as well as multiple real traces from several sources. These
simulations demonstrate the performance boost obtained by enhancing various
replacement policies with the TinyLFU eviction policy. Also, a new combined
replacement and eviction policy scheme nicknamed W-TinyLFU is presented.
W-TinyLFU is demonstrated to obtain equal or better hit-ratios than other state
of the art replacement policies on these traces. It is the only scheme to
obtain such good results on all traces.
Comment: A much earlier and shorter version of this work appeared in the
Euromicro PDP 2014 conference
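The admission decision described in the abstract above (compare a newly accessed item against the eviction candidate using recent access history) can be sketched as follows. This is a simplified illustration, not TinyLFU itself: it uses an exact Counter where the paper uses a compact approximate structure, and the periodic halving used to keep the history recent, the class name, and the sample_size parameter are assumptions made for this example.

```python
from collections import Counter


class FrequencyAdmission:
    """Hypothetical frequency-based admission filter (exact counts, not a sketch)."""

    def __init__(self, sample_size=1000):
        self.sample_size = sample_size
        self.counts = Counter()
        self.seen = 0

    def record(self, key):
        # Call on every access, whether or not the key is cached.
        self.counts[key] += 1
        self.seen += 1
        if self.seen >= self.sample_size:
            self._age()

    def _age(self):
        # Assumption: halve all counters so that old accesses fade out.
        self.counts = Counter(
            {k: c // 2 for k, c in self.counts.items() if c // 2 > 0}
        )
        self.seen //= 2

    def admit(self, candidate, victim):
        # Admit only if the new item has been accessed more often than the victim.
        return self.counts[candidate] > self.counts[victim]


policy = FrequencyAdmission(sample_size=100)
for key in ["a", "a", "a", "b"]:
    policy.record(key)
admit_hot = policy.admit("a", "b")
admit_cold = policy.admit("b", "a")
```

A cache would call record on each access and consult admit only when it is full and its replacement policy has already chosen a victim.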
Accelerating K-mer Frequency Counting with GPU and Non-Volatile Memory
The emergence of Next Generation Sequencing (NGS) platforms has increased the
throughput of genomic sequencing and in turn the amount of data that needs to
be processed, requiring highly efficient computation for its analysis. In this
context, modern architectures including accelerators and non-volatile memory
are essential to enable the mass exploitation of these bioinformatics
workloads. This paper presents a redesign of the main component of a
state-of-the-art reference-free method for variant calling, SMUFIN, which has
been adapted to make the most of GPUs and NVM devices. SMUFIN relies on
counting the frequency of k-mers (substrings of length k) in DNA
sequences, which also constitutes a well-known problem for many bioinformatics
workloads, such as genome assembly. We propose techniques to improve the
efficiency of k-mer counting and to scale up workloads like SMUFIN that used to
require 16 nodes of \mn to a single machine with a GPU and NVM drives. Results
show that although the single machine is not able to improve the time to
solution of 16 nodes, its CPU time is 7.5x shorter than the aggregate CPU time
of the 16 nodes, with a reduction in energy consumption of 5.5x.
Comment: Submitted to the 19th IEEE International Conference on High
Performance Computing and Communication (HPC 2017). Partially funded by
European Research Council (ERC) under the European Union's Horizon 2020
research and innovation programme (grant agreement No 639595) - HiEST Project
Approximate Discovery of Service Nodes by Duplicate Detection in Flows
Knowledge about which nodes provide services is of critical importance for
network administrators. Discovery of service nodes can be done by making full
use of duplicate element detection in flows. Because the amount of traffic
across the network is massive, especially in large ISPs or campus networks, we
propose an approximate algorithm with Round-robin Buddy Bloom Filters (RBBF) for
service detection using only NetFlow data. The properties and analysis of
the RBBF data structure are also given. Our method has better time/space
efficiency than the conventional algorithm, with a small false positive rate.
We also demonstrate the contributions through a prototype system and
real-world case studies.
Comment: 15 pages
Set-Difference Range Queries
We introduce the problem of performing set-difference range queries, where
answers to queries are set-theoretic symmetric differences between sets of
items in two geometric ranges. We describe a general framework for answering
such queries based on a novel use of data-streaming sketches we call signed
symmetric-difference sketches. We show that such sketches can be realized using
invertible Bloom filters (IBFs), which can be composed, differenced, and
searched so as to solve set-difference range queries in a wide range of
scenarios.
Advanced Bloom Filter Based Algorithms for Efficient Approximate Data De-Duplication in Streams
Applications involving telecommunication call data records, web pages, online
transactions, medical records, stock markets, climate warning systems, etc.,
necessitate efficient management and processing of massive
amounts of data from diverse sources. De-duplication or Intelligent Compression
in streaming scenarios, for approximate identification and elimination of
duplicates from such unbounded data streams, is a greater challenge given the
real-time nature of data arrival. Stable Bloom Filters (SBF) address this
problem to a certain extent.
In this work, we present several novel algorithms for the problem of
approximate detection of duplicates in data streams. We propose the Reservoir
Sampling based Bloom Filter (RSBF) combining the working principle of reservoir
sampling and Bloom Filters. We also present variants of the novel Biased
Sampling based Bloom Filter (BSBF) based on biased sampling concepts. We also
propose a randomized load balanced variant of the sampling Bloom Filter
approach to efficiently tackle the duplicate detection. In this work, we thus
provide a generic framework for de-duplication using Bloom Filters. Using
detailed theoretical analysis we prove analytical bounds on the false positive
rate, false negative rate and convergence rate of the proposed structures. We
show that our models clearly outperform the existing methods. We also
present an empirical analysis of the structures using real-world datasets (3
million records) and synthetic datasets (1 billion records) capturing
various input distributions.
Comment: 41 pages
Distributed Collaborative Monitoring in Software Defined Networks
We propose a Distributed and Collaborative Monitoring system, DCM, with the
following properties. First, DCM allows switches to collaboratively achieve flow
monitoring tasks and balance measurement load. Second, DCM is able to perform
per-flow monitoring, by which different groups of flows are monitored using
different actions. Third, DCM is a memory-efficient solution for switch data
plane and guarantees system scalability. DCM uses novel two-stage Bloom
filters to represent monitoring rules using small memory space. It utilizes the
centralized SDN control to install, update, and reconstruct the two-stage Bloom
filters in the switch data plane. We study how DCM performs two representative
monitoring tasks, namely flow size counting and packet sampling, and evaluate
its performance. Experiments using real data center and ISP traffic data on
real network topologies show that DCM achieves the highest measurement accuracy
among existing solutions given the same switch memory budget.
Don't Thrash: How to Cache Your Hash on Flash
This paper presents new alternatives to the well-known Bloom filter data
structure. The Bloom filter, a compact data structure supporting set insertion
and membership queries, has found wide application in databases, storage
systems, and networks. Because the Bloom filter performs frequent random reads
and writes, it is used almost exclusively in RAM, limiting the size of the sets
it can represent. This paper first describes the quotient filter, which
supports the basic operations of the Bloom filter, achieving roughly comparable
performance in terms of space and time, but with better data locality.
Operations on the quotient filter require only a small number of contiguous
accesses. The quotient filter has other advantages over the Bloom filter: it
supports deletions, it can be dynamically resized, and two quotient filters can
be efficiently merged. The paper then gives two data structures, the buffered
quotient filter and the cascade filter, which exploit the quotient filter
advantages and thus serve as SSD-optimized alternatives to the Bloom filter.
The cascade filter has better asymptotic I/O performance than the buffered
quotient filter, but the buffered quotient filter outperforms the cascade
filter on small to medium data sets. Both data structures significantly
outperform recently-proposed SSD-optimized Bloom filter variants, such as the
elevator Bloom filter, buffered Bloom filter, and forest-structured Bloom
filter. In experiments, the cascade filter and buffered quotient filter
performed insertions 8.6-11 times faster than the fastest Bloom filter variant
and performed lookups 0.94-2.56 times faster.
Comment: VLDB201