10 research outputs found

    PaREM: A Novel Approach for Parallel Regular Expression Matching

    Full text link
    Regular expression matching is essential for many applications, such as finding patterns in text, exploring substrings in large DNA sequences, or lexical analysis. However, sequential regular expression matching may be time-prohibitive for large problem sizes. In this paper, we describe a novel algorithm for parallel regular expression matching via deterministic finite automata. Furthermore, we present our tool PaREM that accepts regular expressions and finite automata as input and automatically generates the corresponding code for our algorithm that is amenable for parallel execution on shared-memory systems. We evaluate our parallel algorithm empirically by comparing it with a commonly used algorithm for sequential regular expression matching. Experiments on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 21x for 48 threads.Comment: CSE-2014, Dec. 19th - 21st, 2014, Chengdu, Sichuan, Chin

    Techniques for efficient regular expression matching across hardware architectures

    Get PDF
    Regular expression matching is a central task for many networking and bioinformatics applications. For example, network intrusion detection systems, which perform deep packet inspection to detect malicious network activities, often encode signatures of malicious traffic through regular expressions. Similarly, several bioinformatics applications perform regular expression matching to find common patterns, called motifs, across multiple gene or protein sequences. Hardware implementations of regular expression matching engines fall into two categories: memory-based and logic-based solutions. In both cases, the design aims to maximize the processing throughput and minimize the resources requirements, either in terms of memory or of logic cells. Graphical Processing Units (GPUs) offer a highly parallel platform for memory-based implementations, while Field Programmable Gate Arrays (FPGAs) support reconfigurable, logic-based solutions. In addition, Micron Technology has recently announced its Automata Processor, a memory-based, reprogrammable hardware device. From an algorithmic standpoint, regular expression matching engines are based on finite automata, either in their non-deterministic or in their deterministic form (NFA and DFA, respectively). Micron's Automata Processor is based on a proprietary Automata Network, which extends classical NFA with counters and boolean elements. In this work, we aim to implement highly parallel memory-based and logic-based regular expression matching solutions. Our contributions are summarized as follows. First, we implemented regular expression matching on GPU. In this process, we explored compression techniques and regular expression clustering algorithms to alleviate the memory pressure of DFA-based GPU implementations. Second, we developed a parser for Automata Networks defined through Micron's Automata Network Markup Language (ANML), a XML-based high-level language designed to program the Automata Processor. Specifically, our ANML parser first maps the Automata Networks to an

    Curing Regular Expressions Matching Algorithms from Insomnia, Amnesia, and Acalculia

    Get PDF
    The importance of network security has grown tremendously and a collection of devices have been introduced, which can improve the security of a network. Network intrusion detection systems (NIDS) are among the most widely deployed such system; popular NIDS use a collection of signatures of known security threats and viruses, which are used to scan each packet’s payload. Today, signatures are often specified as regular expressions; thus the core of the NIDS comprises of a regular expressions parser; such parsers are traditionally implemented as finite automata. Deterministic Finite Automata (DFA) are fast, therefore they are often desirable at high network link rates. DFA for the signatures, which are used in the current security devices, however require prohibitive amounts of memory, which limits their practical use. In this paper, we argue that the traditional DFA based NIDS has three main limitations: first they fail to exploit the fact that normal data streams rarely match any virus signature; second, DFAs are extremely inefficient in following multiple partially matching signatures and explodes in size, and third, finite automaton are incapable of efficiently keeping track of counts. We propose mechanisms to solve each of these drawbacks and demonstrate that our solutions can implement a NIDS much more securely and economically, and at the same time substantially improve the packet throughput

    Design of Pattern Matching Systems: Pattern, Algorithm, and Scanner

    Get PDF
    Pattern matching is at the core of many computational problems, e.g., search engine, data mining, network security and information retrieval. In this dissertation, we target at the more complex patterns of regular expression and time series, and proposed a general modular structure, named character class with constraint repetition (CCR), as the building block for the pattern matching algorithm. An exact matching algorithm named MIN-MAX is developed to support overlapped matching of CCR based regexps, and an approximate matching algorithm named Elastic Matching Algorithm is designed to support overlapped matching of CCR based time series, i.e., music melody. Both algorithms are parallelized to run on FPGA to achieve high performance, and the FPGA-based scanners are designed as a modular architecture which is parameterizable and can be reconfigured by simple memory writes, achieving a perfect balance between performance and deployment time

    High performance stride-based network payload inspection

    Get PDF
    There are two main drivers for network payload inspection: malicious data, attacks, virus detection in Network Intrusion Detection System (NIDS) and content detection in Data Leakage Prevention System (DLPS) or Copyright Infringement Detection System (CIDS). Network attacks are getting more and more prevalent. Traditional network firewalls can only check the packet header, but fail to detect attacks hidden in the packet payload. Therefore, the NIDS with Deep Packet Inspection (DPI) function has been developed and widely deployed. By checking each byte of a packet against the pattern set, which is called pattern matching, NIDS is able to detect the attack codes hidden in the payload. The pattern set is usually organized as a Deterministic Finite Automata (DFA). The processing time of DFA is proportional to the length of the input string, but the memory cost of a DFA is quite large. Meanwhile, the link bandwidth and the traffic of the Internet are rapidly increasing, the size of the attack signature database is also growing larger and larger due to the diversification of the attacks. Consequently, there is a strong demand for high performance and low storage cost NIDS. Traditional softwarebased and hardware-based pattern matching algorithms are have difficulty satisfying the processing speed requirement, thus high performance network payload inspection methods are needed to enable deep packet inspection at line rate. In this thesis, Stride Finite Automata (StriFA), a novel finite automata family to accelerate both string matching and regular expression matching, is presented. Compared with the conventional finite automata, which scan the entire traffic stream to locate malicious information, the StriFA only needs to scan samples of the traffic stream to find the suspicious information, thus increasing the matching speed and reducing memory requirements. Technologies such as instant messaging software (Skype, MSN) or BitTorrent file sharing methods, allow convenient sharing of information between managers, employees, customers, and partners. This, however, leads to two kinds of major security risks when exchanging data between different people: firstly, leakage of sensitive data from a company and, secondly, distribution of copyright infringing products in Peer to Peer (P2P) networks. Traditional DFA-based DPI solutions cannot be used for inspection of file distribution in P2P networks due to the potential out-of-order manner of the data delivery. To address this problem, a hybrid finite automaton called Skip-Stride-Neighbor Finite Automaton (S2NFA) is proposed to solve this problem. It combines benefits of the following three structures: 1) Skip-FA, which is used to solve the out-of-order data scanning problem; 2) Stride-DFA, which is introduced to reduce the memory usage of Skip-FA; 3) Neighbor-DFA which is based on the characteristics of Stride-DFA to get a low false positive rate at the additional cost of a small increase in memory consumption

    Hardware acceleration for power efficient deep packet inspection

    Get PDF
    The rapid growth of the Internet leads to a massive spread of malicious attacks like viruses and malwares, making the safety of online activity a major concern. The use of Network Intrusion Detection Systems (NIDS) is an effective method to safeguard the Internet. One key procedure in NIDS is Deep Packet Inspection (DPI). DPI can examine the contents of a packet and take actions on the packets based on predefined rules. In this thesis, DPI is mainly discussed in the context of security applications. However, DPI can also be used for bandwidth management and network surveillance. DPI inspects the whole packet payload, and due to this and the complexity of the inspection rules, DPI algorithms consume significant amounts of resources including time, memory and energy. The aim of this thesis is to design hardware accelerated methods for memory and energy efficient high-speed DPI. The patterns in packet payloads, especially complex patterns, can be efficiently represented by regular expressions, which can be translated by the use of Deterministic Finite Automata (DFA). DFA algorithms are fast but consume very large amounts of memory with certain kinds of regular expressions. In this thesis, memory efficient algorithms are proposed based on the transition compressions of the DFAs. In this work, Bloom filters are used to implement DPI on an FPGA for hardware acceleration with the design of a parallel architecture. Furthermore, devoted at a balance of power and performance, an energy efficient adaptive Bloom filter is designed with the capability of adjusting the number of active hash functions according to current workload. In addition, a method is given for implementation on both two-stage and multi-stage platforms. Nevertheless, false positive rates still prevents the Bloom filter from extensive utilization; a cache-based counting Bloom filter is presented in this work to get rid of the false positives for fast and precise matching. Finally, in future work, in order to estimate the effect of power savings, models will be built for routers and DPI, which will also analyze the latency impact of dynamic frequency adaption to current traffic. Besides, a low power DPI system will be designed with a single or multiple DPI engines. Results and evaluation of the low power DPI model and system will be produced in future

    Energy Efficient Hardware Accelerators for Packet Classification and String Matching

    Get PDF
    This thesis focuses on the design of new algorithms and energy efficient high throughput hardware accelerators that implement packet classification and fixed string matching. These computationally heavy and memory intensive tasks are used by networking equipment to inspect all packets at wire speed. The constant growth in Internet usage has made them increasingly difficult to implement at core network line speeds. Packet classification is used to sort packets into different flows by comparing their headers to a list of rules. A flow is used to decide a packet’s priority and the manner in which it is processed. Fixed string matching is used to inspect a packet’s payload to check if it contains any strings associated with known viruses, attacks or other harmful activities. The contributions of this thesis towards the area of packet classification are hardware accelerators that allow packet classification to be implemented at core network line speeds when classifying packets using rulesets containing tens of thousands of rules. The hardware accelerators use modified versions of the HyperCuts packet classification algorithm. An adaptive clocking unit is also presented that dynamically adjusts the clock speed of a packet classification hardware accelerator so that its processing capacity matches the processing needs of the network traffic. This keeps dynamic power consumption to a minimum. Contributions made towards the area of fixed string matching include a new algorithm that builds a state machine that is used to search for strings with the aid of default transition pointers. The use of default transition pointers keep memory consumption low, allowing state machines capable of searching for thousands of strings to be small enough to fit in the on-chip memory of devices such as FPGAs. A hardware accelerator is also presented that uses these state machines to search through the payloads of packets for strings at core network line speeds

    Fast Packet Processing on High Performance Architectures

    Get PDF
    The rapid growth of Internet and the fast emergence of new network applications have brought great challenges and complex issues in deploying high-speed and QoS guaranteed IP network. For this reason packet classication and network intrusion detection have assumed a key role in modern communication networks in order to provide Qos and security. In this thesis we describe a number of the most advanced solutions to these tasks. We introduce NetFPGA and Network Processors as reference platforms both for the design and the implementation of the solutions and algorithms described in this thesis. The rise in links capacity reduces the time available to network devices for packet processing. For this reason, we show different solutions which, either by heuristic and randomization or by smart construction of state machine, allow IP lookup, packet classification and deep packet inspection to be fast in real devices based on high speed platforms such as NetFPGA or Network Processors
    corecore