7 research outputs found

    Accurate Counting Bloom Filters for Large-Scale Data Processing

    Bloom filters are space-efficient randomized data structures for fast membership queries, allowing false positives. Counting Bloom Filters (CBFs) perform the same operations on dynamic sets that can be updated via insertions and deletions. CBFs have been extensively used in MapReduce to accelerate large-scale data processing on large clusters by reducing the volume of datasets. The false positive probability of a CBF should be made as low as possible so that more redundant data is filtered out. In this paper, we propose a multilevel optimization approach to building an Accurate Counting Bloom Filter (ACBF) that reduces the false positive probability. An ACBF is constructed by partitioning the counter vector into multiple levels. We propose an optimized ACBF that maximizes the first level size in order to minimize the false positive probability while maintaining the same functionality as a CBF. Simulation results show that the optimized ACBF reduces the false positive probability by up to 98.4% at the same memory consumption compared to CBF. We also implement ACBFs in MapReduce to speed up the reduce-side join. Experiments on realistic datasets show that, compared to CBF, ACBF reduces the false positive probability by 72.3% and the map outputs by 33.9%, and improves join execution time by 20%.
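
The insert, delete, and query operations of a plain CBF mentioned in this abstract can be sketched as follows. This is a minimal illustration of the baseline structure, not the multilevel ACBF the paper proposes. The class name, the double-hashing scheme over SHA-256, and the saturating 4-bit counter limit are all assumptions made for the example.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: m small counters, k hash positions."""

    def __init__(self, m: int, k: int, counter_max: int = 15):
        self.m = m                      # number of counters
        self.k = k                      # number of hash positions per item
        self.counter_max = counter_max  # assumed 4-bit counters (saturate at 15)
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumed hashing scheme: derive k indices by double hashing
        # from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.counters[p] > 0 for p in self._positions(item))

    def add(self, item: str) -> None:
        for p in self._positions(item):
            if self.counters[p] < self.counter_max:
                self.counters[p] += 1

    def remove(self, item: str) -> bool:
        # Refuse to delete items the filter does not report as present.
        if item not in self:
            return False
        for p in self._positions(item):
            # Saturated counters are left untouched to avoid false negatives.
            if 0 < self.counters[p] < self.counter_max:
                self.counters[p] -= 1
        return True


cbf = CountingBloomFilter(m=1024, k=3)
cbf.add("record-42")
print("record-42" in cbf)
cbf.remove("record-42")
print("record-42" in cbf)
```

Replacing each bit with a counter is what makes deletion possible; the price is several bits per position instead of one.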

    Hardware acceleration for power efficient deep packet inspection

    The rapid growth of the Internet has led to a massive spread of malicious attacks such as viruses and malware, making the safety of online activity a major concern. The use of Network Intrusion Detection Systems (NIDS) is an effective method to safeguard the Internet. One key procedure in NIDS is Deep Packet Inspection (DPI). DPI can examine the contents of a packet and take actions on the packet based on predefined rules. In this thesis, DPI is mainly discussed in the context of security applications. However, DPI can also be used for bandwidth management and network surveillance. DPI inspects the whole packet payload, and due to this and the complexity of the inspection rules, DPI algorithms consume significant amounts of resources including time, memory and energy. The aim of this thesis is to design hardware accelerated methods for memory and energy efficient high-speed DPI. The patterns in packet payloads, especially complex patterns, can be efficiently represented by regular expressions, which can be translated into Deterministic Finite Automata (DFAs). DFA algorithms are fast but consume very large amounts of memory with certain kinds of regular expressions. In this thesis, memory efficient algorithms are proposed based on transition compression of the DFAs. In this work, Bloom filters are used to implement DPI on an FPGA for hardware acceleration with the design of a parallel architecture. Furthermore, aiming at a balance between power and performance, an energy efficient adaptive Bloom filter is designed with the capability of adjusting the number of active hash functions according to the current workload. In addition, a method is given for implementation on both two-stage and multi-stage platforms. Nevertheless, false positives still prevent the Bloom filter from being used extensively; a cache-based counting Bloom filter is presented in this work to eliminate false positives for fast and precise matching.
Finally, in future work, in order to estimate the effect of power savings, models will be built for routers and DPI, which will also analyze the latency impact of dynamic frequency adaptation to current traffic. In addition, a low power DPI system will be designed with a single or multiple DPI engines. Results and evaluation of the low power DPI model and system will be produced in future work.
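
The idea of adjusting the number of active hash functions to the workload can be illustrated in software. The sketch below is not the thesis's FPGA design, which the abstract does not detail. It assumes that insertion always sets all k positions while a query checks only the first `active` of them. Under that assumption, lowering `active` saves hash computations and memory reads at the cost of more false positives, and never introduces false negatives. All names are hypothetical.

```python
import hashlib


class AdaptiveBloomFilter:
    """Bloom filter whose queries use a tunable number of hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m               # number of bits
        self.k = k               # hash functions used on insertion
        self.active = k          # hash functions currently used on queries
        self.bits = bytearray(m)

    def _positions(self, item: bytes, count: int):
        # Assumed hashing scheme: double hashing from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(count)]

    def add(self, item: bytes) -> None:
        # Always set all k positions so any query prefix stays valid.
        for p in self._positions(item, self.k):
            self.bits[p] = 1

    def set_active(self, n: int) -> None:
        # Clamp to [1, k]; a hypothetical controller would call this
        # with a smaller n under heavy load.
        self.active = max(1, min(self.k, n))

    def query(self, item: bytes) -> bool:
        # Checks only the first `active` positions.
        return all(self.bits[p] for p in self._positions(item, self.active))


abf = AdaptiveBloomFilter(m=4096, k=6)
abf.add(b"/etc/passwd")
abf.set_active(2)                 # cheaper queries, higher false positive rate
print(abf.query(b"/etc/passwd"))
```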

    MultiLayer compressed counting bloom filters

    Bloom filters are efficient randomized data structures for membership queries on a set with a certain known false positive probability. Counting Bloom filters (CBFs) allow the same operation on dynamic sets that can be updated via insertions and deletions, at the cost of larger memory requirements. This paper first presents a new upper bound for the counter overflow probability in CBFs. This bound is much tighter than that usually adopted in the literature and it allows for designing more efficient CBFs. Three novel data structures are proposed, which introduce the idea of a hierarchical structure as well as the use of Huffman coding. Our algorithms improve standard CBFs in terms of fast access and limited memory consumption (up to 50% memory savings): the target could be the implementation of the compressed data structures in the small (but fast) local memory or "on-chip SRAM" of devices such as network processors.
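
For context, the overflow bound commonly used in the CBF literature can be written as below. This is the classical union-bound estimate, not the tighter bound proposed in this paper, which the abstract does not state. The symbols are assumptions introduced here: n is the number of inserted items, m the number of counters, k the number of hash functions, c(i) the value of counter i, and j the overflow threshold.

```latex
\Pr\Bigl[\max_{i} c(i) \ge j\Bigr]
  \;\le\; m \binom{nk}{j} \frac{1}{m^{j}}
  \;\le\; m \left(\frac{e\,n\,k}{j\,m}\right)^{j},
\qquad\text{and for } k = \frac{m}{n}\ln 2:\quad
\Pr\Bigl[\max_{i} c(i) \ge j\Bigr] \;\le\; m \left(\frac{e \ln 2}{j}\right)^{j}.
```

With 4-bit counters (j = 16), the last expression evaluates to roughly 1.37 × 10⁻¹⁵ · m, which is why four bits per counter are often considered sufficient in practice.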

    Fast Packet Processing on High Performance Architectures

    The rapid growth of the Internet and the fast emergence of new network applications have brought great challenges and complex issues in deploying high-speed, QoS-guaranteed IP networks. For this reason, packet classification and network intrusion detection have assumed a key role in modern communication networks in order to provide QoS and security. In this thesis we describe a number of the most advanced solutions to these tasks. We introduce NetFPGA and Network Processors as reference platforms both for the design and the implementation of the solutions and algorithms described in this thesis. The rise in link capacity reduces the time available to network devices for packet processing. For this reason, we show different solutions which, either through heuristics and randomization or through smart construction of state machines, allow IP lookup, packet classification and deep packet inspection to be fast in real devices based on high-speed platforms such as NetFPGA or Network Processors.

    Improving Group Integrity of Tags in RFID Systems

    Checking the integrity of groups containing radio frequency identification (RFID) tagged objects or recovering the tag identifiers of missing objects is important in many activities. Several autonomous checking methods have been proposed for increasing the capability of recovering missing tag identifiers without external systems. This has been achieved by treating a group of tag identifiers (IDs) as packet symbols encoded and decoded in a way similar to that in binary erasure channels (BECs). Redundant data are required to be written into the limited memory space of RFID tags in order to enable the decoding process. In this thesis, the group integrity of passive tags in RFID systems is specifically targeted, with novel mechanisms being proposed to improve upon the current state of the art. Because low density parity check (LDPC) codes are sparse and the progressive edge-growth (PEG) method mitigates short cycles, the research begins with the use of the PEG method in RFID systems to construct the parity check matrix of LDPC codes in order to increase the recovery capabilities with reduced memory consumption. It is shown that the PEG-based method achieves significant recovery enhancements compared to other methods with the same or less memory overhead. The decoding complexity of the PEG-based LDPC codes is optimised using an improved hybrid iterative/Gaussian decoding algorithm which includes an early stopping criterion. The relative complexities of the improved algorithm are extensively analysed and evaluated, both in terms of decoding time and the number of operations required. It is demonstrated that the improved algorithm considerably reduces the operational complexity and thus the time of the full Gaussian decoding algorithm for small to medium amounts of missing tags. The joint use of the two decoding components is also adapted in order to avoid the iterative decoding when the missing amount is larger than a threshold.
The optimum threshold value is investigated through empirical analysis. It is shown that the adaptive algorithm is very efficient in decreasing the average decoding time of the improved algorithm for large amounts of missing tags where the iterative decoding fails to recover any missing tag. The recovery performance of various short-length irregular PEG-based LDPC codes constructed with different variable degree sequences is analysed and evaluated. It is demonstrated that the irregular codes exhibit significant recovery enhancements compared to the regular ones in the region where the iterative decoding is successful. However, their performance is degraded in the region where the iterative decoding can recover some missing tags. Finally, a novel protocol called the Redundant Information Collection (RIC) protocol is designed to filter and collect redundant tag information. It is based on a Bloom filter (BF) that efficiently filters the redundant tag information at the tag’s side, thereby considerably decreasing the communication cost and, consequently, the collection time. It is shown that the novel protocol outperforms existing possible solutions by saving from 37% to 84% of the collection time, which is nearly four times the lower bound. This characteristic makes the RIC protocol a promising candidate for collecting redundant tag information in the group integrity of tags in RFID systems and other similar systems.