2,569 research outputs found

    Boosting XML Filtering with a Scalable FPGA-based Architecture

    Full text link
    The growing amount of XML encoded data exchanged over the Internet increases the importance of XML based publish-subscribe (pub-sub) and content based routing systems. The input in such systems typically consists of a stream of XML documents and a set of user subscriptions expressed as XML queries. The pub-sub system then filters the published documents and passes them to the subscribers. Pub-sub systems are characterized by very high input ratios, therefore the processing time is critical. In this paper we propose a "pure hardware" based solution, which utilizes XPath query blocks on FPGA to solve the filtering problem. By utilizing the high throughput that an FPGA provides for parallel processing, our approach achieves drastically better throughput than the existing software or mixed (hardware/software) architectures. The XPath queries (subscriptions) are translated to regular expressions which are then mapped to FPGA devices. By introducing stacks within the FPGA we are able to express and process a wide range of path queries very efficiently, on a scalable environment. Moreover, the fact that the parser and the filter processing are performed on the same FPGA chip, eliminates expensive communication costs (that a multi-core system would need) thus enabling very fast and efficient pipelining. Our experimental evaluation reveals more than one order of magnitude improvement compared to traditional pub/sub systems.Comment: CIDR 200

    Multi-Level Pre-Correlation RFI Flagging for Real-Time Implementation on UniBoard

    Get PDF
    Because of the denser active use of the spectrum, and because of radio telescopes higher sensitivity, radio frequency interference (RFI) mitigation has become a sensitive topic for current and future radio telescope designs. Even if quite sophisticated approaches have been proposed in the recent years, the majority of RFI mitigation operational procedures are based on post-correlation corrupted data flagging. Moreover, given the huge amount of data delivered by current and next generation radio telescopes, all these RFI detection procedures have to be at least automatic and, if possible, real-time. In this paper, the implementation of a real-time pre-correlation RFI detection and flagging procedure into generic high-performance computing platforms based on Field Programmable Gate Arrays (FPGA) is described, simulated and tested. One of these boards, UniBoard, developed under a Joint Research Activity in the RadioNet FP7 European programme is based on eight FPGAs interconnected by a high speed transceiver mesh. It provides up to ~4 TMACs with Altera Stratix IV FPGA and 160 Gbps data rate for the input data stream. Considering the high in-out data rate in the pre-correlation stages, only real-time and go-through detectors (i.e. no iterative processing) can be implemented. In this paper, a real-time and adaptive detection scheme is described. An ongoing case study has been set up with the Electronic Multi-Beam Radio Astronomy Concept (EMBRACE) radio telescope facility at Nan\c{c}ay Observatory. The objective is to evaluate the performances of this concept in term of hardware complexity, detection efficiency and additional RFI metadata rate cost. The UniBoard implementation scheme is described.Comment: 16 pages, 13 figure

    MULTI-GIGABIT PATTERN FOR DATA IN NETWORK SECURITY

    Get PDF
    In the current scenario network security is emerging the world. Matching large sets of patterns against an incoming stream of data is a fundamental task in several fields such as network security or computational biology. High-speed network intrusion detection systems (IDS) rely on efficient pattern matching techniques to analyze the packet payload and make decisions on the significance of the packet body. However, matching the streaming payload bytes against thousands of patterns at multi-gigabit rates is computationally intensive. Various techniques have been proposed in past but the performance of the system is reducing because of multi-gigabit rates.Pattern matching is a significant issue in intrusion detection systems, but by no means the only one. Handling multi-content rules, reordering, and reassembling incoming packets are also significant for system performance. We present two pattern matching techniques to compare incoming packets against intrusion detection search patterns. The first approach, decoded partial CAM (DpCAM), pre-decodes incoming characters, aligns the decoded data, and performs logical AND on them to produce the match signal for each pattern. The second approach, perfect hashing memory (PHmem), uses perfect hashing to determine a unique memory location that contains the search pattern and a comparison between incoming data and memory output to determine the match. The suggested methods have implemented in vhdl coding and we use Xilinx for synthesis

    Programmable Hardware Accelerator For Regular Expression Queries

    Get PDF
    Regular expression (regex) queries are used extensively in data analytics applications. Hardware-based regex searches can make searches efficient across a variety of application domains. However, hardware accelerators that can support arbitrary regular expressions are currently infeasible due to the very large number of possible states and state transitions. This disclosure describes techniques to map an input regular expression to a non-deterministic finite automaton and hardware such as FPGA or ASIC that can be programmed to filter an input data stream to search for arbitrary (customer-given) regular expressions

    Techniques for low-overhead dynamic partial reconfiguration of FPGAs

    Get PDF

    Techniques for efficient regular expression matching across hardware architectures

    Get PDF
    Regular expression matching is a central task for many networking and bioinformatics applications. For example, network intrusion detection systems, which perform deep packet inspection to detect malicious network activities, often encode signatures of malicious traffic through regular expressions. Similarly, several bioinformatics applications perform regular expression matching to find common patterns, called motifs, across multiple gene or protein sequences. Hardware implementations of regular expression matching engines fall into two categories: memory-based and logic-based solutions. In both cases, the design aims to maximize the processing throughput and minimize the resources requirements, either in terms of memory or of logic cells. Graphical Processing Units (GPUs) offer a highly parallel platform for memory-based implementations, while Field Programmable Gate Arrays (FPGAs) support reconfigurable, logic-based solutions. In addition, Micron Technology has recently announced its Automata Processor, a memory-based, reprogrammable hardware device. From an algorithmic standpoint, regular expression matching engines are based on finite automata, either in their non-deterministic or in their deterministic form (NFA and DFA, respectively). Micron's Automata Processor is based on a proprietary Automata Network, which extends classical NFA with counters and boolean elements. In this work, we aim to implement highly parallel memory-based and logic-based regular expression matching solutions. Our contributions are summarized as follows. First, we implemented regular expression matching on GPU. In this process, we explored compression techniques and regular expression clustering algorithms to alleviate the memory pressure of DFA-based GPU implementations. Second, we developed a parser for Automata Networks defined through Micron's Automata Network Markup Language (ANML), a XML-based high-level language designed to program the Automata Processor. Specifically, our ANML parser first maps the Automata Networks to an

    A fully parameterized virtual coarse grained reconfigurable array for high performance computing applications

    Get PDF
    Field Programmable Gate Arrays (FPGAs) have proven their potential in accelerating High Performance Computing (HPC) Applications. Conventionally such accelerators predominantly use, FPGAs that contain fine-grained elements such as LookUp Tables (LUTs), Switch Blocks (SB) and Connection Blocks (CB) as basic programmable logic blocks. However, the conventional implementation suffers from high reconfiguration and development costs. In order to solve this problem, programmable logic components are defined at a virtual higher abstraction level. These components are called Processing Elements (PEs) and the group of PEs along with the inter-connection network form an architecture called a Virtual Coarse-Grained Reconfigurable Array (VCGRA). The abstraction helps to reconfigure the PEs faster at the intermediate level than at the lower-level of an FPGA. Conventional VCGRA implementations (built on top of the lower levels of the FPGA) use functional resources such as LUTs to establish required connections (intra-connect) within a PE. In this paper, we propose to use the parameterized reconfiguration technique to implement the intra-connections of each PE with the aim to reduce the FPGA resource utilization (LUTs). The technique is used to parameterize the intra-connections with parameters that only change their value infrequently (whenever a new VCGRA function has to be reconfigured) and that are implemented as constants. Since the design is optimized for these constants at every moment in time, this reduces the resource utilization. Further, interconnections (network between the multiple PEs) of the VCGRA grid can also be parameterized so that both the inter- and intraconnect network of the VCGRA grid can be mapped onto the physical switch blocks of the FPGA. For every change in parameter values a specialized bitstream is generated on the fly and the FPGA is reconfigured using the parameterized run-time reconfiguration technique. Our results show a drastic reduction in FPGA LUT resource utilization in the PE by at least 30% and in the intra-network of the PE by 31% when implementing an HPC application
    • …
    corecore