Search CORE

18 research outputs found

FPGA-based High Throughput Regular Expression Pattern Matching for Network Intrusion Detection Systems

Author: Modi Bala
Publication venue
Publication date: 01/02/2015
Field of study

Network speeds and bandwidths have improved over time. However, the frequency of network attacks and illegal accesses have also increased as the network speeds and bandwidths improved over time. Such attacks are capable of compromising the privacy and confidentiality of network resources belonging to even the most secure networks. Currently, general-purpose processor based software solutions used for detecting network attacks have become inadequate in coping with the current network speeds. Hardware-based platforms are designed to cope with the rising network speeds measured in several gigabits per seconds (Gbps). Such hardware-based platforms are capable of detecting several attacks at once, and a good candidate is the Field-programmable Gate Array (FPGA). The FPGA is a hardware platform that can be used to perform deep packet inspection of network packet contents at high speed. As such, this thesis focused on studying designs that were implemented with Field-programmable Gate Arrays (FPGAs). Furthermore, all the FPGA-based designs studied in this thesis have attempted to sustain a more steady growth in throughput and throughput efficiency. Throughput efficiency is defined as the concurrent throughput of a regular expression matching engine circuit divided by the average number of look up tables (LUTs) utilised by each state of the engine"s automata. The implemented FPGA-based design was built upon the concept of equivalence classification. The concept helped to reduce the overall table size of the inputs needed to drive the various Nondeterministic Finite Automata (NFA) matching engines. Compared with other approaches, the design sustained a throughput of up to 11.48 Gbps, and recorded an overall reduction in the number of pattern matching engines required by up to 75%. Also, the overall memory required by the design was reduced by about 90% when synthesised on the target FPGA platform

Kent Academic Repository

Accelerating Regular-Expression Matching on FPGAs with High-Level Synthesis

Author: Callanan Devon
Publication venue
Publication date: 13/06/2021
Field of study

The importance of security infrastructures for high-throughput networks has rapidly grown as a result of expanding internet traffic and increasingly high-bandwidth connections. Intrusion-detection systems (IDSs), such as SNORT, rely upon rule sets designed to alert system administrators of malicious packets. Methods for deep-packet inspection, which often depend upon regular-expression searches, can be accelerated on programmable-logic (PL) architectures using non-deterministic finite automata (NFAs). Prior designs have relied upon register-transfer level (RTL) design descriptions and have achieved efficient resource utilization through fine-grained optimizations. New advances made by field-programmable gate array (FPGA) vendors have led to powerful compiler toolchains for OpenCL and SYCL that allow for rapid development on PL architectures while generating competitive designs in terms of performance. The goal of this research is to evaluate performance differences between a custom, SYCL- and OpenCL-based, acceleration architecture for regular expressions and comparable RTL-based designs. The simplicity of the application, which requires only basic hardware building blocks, adds to the novelty of the comparison. In contrast to prior RTL-based solutions, which show frequency degradation with bandwidth scaling, this approach is able to maintain stable and high operating frequencies at the cost of resource usage. By scaling input bandwidth with multi-character transformations, high-throughput designs can be realized.Using Intel's OpenCL compiler, throughputs in excess of 17 Gbps can be achieved on Intel’s Arria 10 Programmable Acceleration Card and 19.4 Gbps with Intel's Stratix 10 Programmable Acceleration Card, outperforming similar designs with RTL, as reported in the literature. SYCL-based designs, synthesized with Intel's oneAPI compiler show performance degradation but still achieve higher throughput, up to 15.6 Gbps, than past RTL-based implementations. Overall, OpenCL and SYCL development yields both competitive results, when compared to the fine-grained RTL development process, and many ease-of-use improvements and design abstractions

D-Scholarship@Pitt

Techniques for efficient regular expression matching across hardware architectures

Author: Wang Xiang (Computer engineer)
Publication venue: University of Missouri--Columbia
Publication date
Field of study

Regular expression matching is a central task for many networking and bioinformatics applications. For example, network intrusion detection systems, which perform deep packet inspection to detect malicious network activities, often encode signatures of malicious traffic through regular expressions. Similarly, several bioinformatics applications perform regular expression matching to find common patterns, called motifs, across multiple gene or protein sequences. Hardware implementations of regular expression matching engines fall into two categories: memory-based and logic-based solutions. In both cases, the design aims to maximize the processing throughput and minimize the resources requirements, either in terms of memory or of logic cells. Graphical Processing Units (GPUs) offer a highly parallel platform for memory-based implementations, while Field Programmable Gate Arrays (FPGAs) support reconfigurable, logic-based solutions. In addition, Micron Technology has recently announced its Automata Processor, a memory-based, reprogrammable hardware device. From an algorithmic standpoint, regular expression matching engines are based on finite automata, either in their non-deterministic or in their deterministic form (NFA and DFA, respectively). Micron's Automata Processor is based on a proprietary Automata Network, which extends classical NFA with counters and boolean elements. In this work, we aim to implement highly parallel memory-based and logic-based regular expression matching solutions. Our contributions are summarized as follows. First, we implemented regular expression matching on GPU. In this process, we explored compression techniques and regular expression clustering algorithms to alleviate the memory pressure of DFA-based GPU implementations. Second, we developed a parser for Automata Networks defined through Micron's Automata Network Markup Language (ANML), a XML-based high-level language designed to program the Automata Processor. Specifically, our ANML parser first maps the Automata Networks to an

University of Missouri: MOspace

Recommended from our members

Securing Network Processors with Hardware Monitors

Author: Hu Kekai
Publication venue: ScholarWorks@UMass Amherst
Publication date: 09/11/2015
Field of study

As an essential part of modern society, the Internet has fundamentally changed our lives during the last decade. Novel applications and technologies, such as online shopping, social networking, cloud computing, mobile networking, etc, have sprung up at an astonishing pace. These technologies not only influence modern life styles but also impact Internet infrastructure. Numerous new network applications and services require better programmability and flexibility for network devices, such as routers and switches. Since traditional fixed function network routers based on application specific integrated circuits (ASICs) have difficulty keeping pace with the growing demands of next-generation Internet applications, there is an ongoing shift in the industry toward implementing network devices using programmable network processors (NPs). While network processors offer great benefits in terms of flexibility, their reprogrammable nature exposes potential security risks. Similar to network end-systems, such as general-purpose computers, software-based network processors have security vulnerabilities that can be attacked remotely. Recent research has shown that a new type of data plane attack is able to modify the functionality of a network processor and cause a denial-of-service (DoS) attack by sending a single malformed UDP packet. Since this attack relies solely on data plane access and does not need access to the control plane, it can be particularly difficult to control. Hardware security monitors have been introduced to identify and eliminate these malicious packets before they can propagate and cause devastating effects in the network. However, previous work on hardware monitors only focus on single core systems with static (or very slowly changing) workloads. In network processors that use up to hundreds of parallel processor cores and have processing workloads that can change dynamically based on the network traffic, the realization of a complete multicore hardware monitoring system remains a critical challenge. Our research work in this thesis provides a comprehensive solution to this problem. Our first contribution is the design and prototype implementation of a Scalable Hardware Monitoring Grid (SHMG). This scalable architecture balances area cost and performance overhead by using a clustered approach for multicore NP systems. In order to adapt to dynamically changing network traffic, a resource reallocation algorithm is designed to reassign the processing resources in SHMG to different network applications at runtime. An evaluation of the prototype SHMG on an Altera DE4 board demonstrates low resource and performance overheads. The functionality and performance of a runtime resource reallocation algorithm are tested using a simulation environment. A second significant contribution of this work is a network system-level security solution for multicore network processors with hardware monitors. It addresses two key problems: (1) how to securely manage and reprogram processor cores and monitors in a deployed router in the network, and (2) how to prevent the large number of identical router devices in the network from an attack that can circumvent one specific monitoring system and lead to Internet-scale failures. A Secure Dynamic Multicore Hardware Monitoring System (SDMMon) is designed based on cryptographic principles and suitable key management to ensure the secure installation of processor binaries and monitor graphs. We present a Merkle tree based parameterizable high performance hash function that can be configured to perform a variety of functions in different devices via a 32-bit configuration parameter. A prototype system composed of both the SDMMon and the parameterizable hash is implemented and evaluated on an Altera DE4 board. Finally, a fully-functional, comprehensive Multicore NP Security Platform, which integrates both the SHMG and the SDMMon security features, has been implemented on an Altera DE5 board

ScholarWorks@UMass Amherst

Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

Author: Angstadt Kevin
Publication venue
Publication date
Field of study

The adoption of hardware accelerators, such as Field-Programmable Gate Arrays, into general-purpose computation pipelines continues to rise, driven by recent trends in data collection and analysis as well as pressure from challenging physical design constraints in hardware. The architectural designs of many of these accelerators stand in stark contrast to the traditional von Neumann model of CPUs. Consequently, existing programming languages, maintenance tools, and techniques are not directly applicable to these devices, meaning that additional architectural knowledge is required for effective programming and configuration. Current programming models and techniques are akin to assembly-level programming on a CPU, thus placing significant burden on developers tasked with using these architectures. Because programming is currently performed at such low levels of abstraction, the software development process is tedious and challenging and hinders the adoption of hardware accelerators. This dissertation explores the thesis that theoretical finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level hardware representations that enable high-performance execution on hardware accelerators. We adopt a principled hardware/software co-design methodology to develop a programming model providing the key properties that we observe are necessary for success, namely performance and scalability, ease of use, expressive power, and legacy support. First, we develop a framework that allows developers to port existing, legacy code to run on hardware accelerators by leveraging automata learning algorithms in a novel composition with software verification, string solvers, and high-performance automata architectures. Next, we design a domain-specific programming language to aid programmers writing pattern-searching algorithms and develop compilation algorithms to produce finite automata, which supports efficient execution on a wide variety of processing architectures. Then, we develop an interactive debugger for our new language, which allows developers to accurately identify the locations of bugs in software while maintaining support for high-throughput data processing. Finally, we develop two new automata-derived accelerator architectures to support additional applications, including the detection of security attacks and the parsing of recursive and tree-structured data. Using empirical studies, logical reasoning, and statistical analyses, we demonstrate that our prototype artifacts scale to real-world applications, maintain manageable overheads, and support developers' use of hardware accelerators. Collectively, the research efforts detailed in this dissertation help ease the adoption and use of hardware accelerators for data analysis applications, while supporting high-performance computation.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd

Deep Blue Documents at the University of Michigan

Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

Author: Becker Jürgen
Hübner Michael
Lagadec Loïc
Sander Oliver
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2010
Field of study

ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

KITopen

Tools and Algorithms for the Construction and Analysis of Systems:24th International Conference, TACAS 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, Part II

Author
Publication venue: Springer
Publication date: 01/01/2018
Field of study

University of Twente Research Information

Approche générative conjointe logicielle-matérielle au développement du support protocolaire d’applications réseaux

Author: Solanki Jigar
Publication venue: HAL CCSD
Publication date: 27/11/2014
Field of study

Communications between network applications is achieved by using rulesets known as protocols. Protocol messages are managed by the application layer known as the protocol parsing layer or protocol handling layer. Protocol parsers are coded in software, in hardware or based on a co-design approach. They represent the interface between the application logic and the outside world. Thus, they are critical components of network applications. Global performances of network applications are directly linked to the performances of their protocol parser layers.Developping protocol parsers consists of translating protocol specifications, written in a high level language such as ABNF towards low level software or hardware code. As the use of embedded systems is growing, hardware ressources become more and more available to applications on systems on chip (SoC). Nonetheless, developping a network application that uses hardware ressources is challenging, requiring not only expertise in hardware design, but also a knowledge of the protocols involved and an understanding of low-level network programming.This thesis proposes a generative hardware-software co-design based approach to the developpement of network protocol message parsers, to improve their performances without increasing the expertise the developper may need. Our approach is based on a dedicated language, called Zebra, that generates both hardware and software elements that compose protocol parsers. The necessary expertise is deported in the use of the Zebra language and the generated hardware components permit to improve global performances.The contributions of this thesis are as follows : We provide an analysis of network protocols and applications. This analysis allows us to detect the elements which performances can be improved using hardware ressources. We present the domain specific language Zebra to describe protocol handling layers. Software and hardware components are then generated according to Zebra specifications. We have built a SoC running a Linux operating system to assess our approach.We have designed hardware accelerators for different network protocols that are deployed and driven by applications. To increase sharing of parsing units between several tasks, we have developped a middleware that seamlessly manages all the accesses to the hardware components. The Zebra middleware allows several clients to access the ressources of a hardware accelerator. We have conducted several set of experiments in real conditions. We have compared the performances of our approach with the performances of well-knownprotocol handling layers. We observe that protocol handling layers baded on our approach are more efficient that existing approaches.Les communications entre les applications réseaux sont régies par un ensemble de règles regroupées sous forme de protocoles. Les messages protocolaires sont gérés par une couche de l’application réseau connue comme étant la couche de support protocolaire. Cette couche peut être de nature logicielle, matérielle ou conjointe. Cette couche se trouve à la frontière entre le coeur de l’application et le monde extérieur. A ce titre, elle représente un composant névralgique de l’application. Les performances globales de l’application sont ainsi directement liées aux performances de la couche de support protocolaire associée.Le processus de développement de ces couches consiste à traduire une spécification du protocole, écrite dans un langage de haut niveau tel que ABNF dans un langage bas niveau, logiciel ou matériel. Avec l’avènement des systèmes embarqués, de plus en plus de systèmes sur puce proposent l’utilisation de ressources matérielles afin d’accroître les performances des applicatifs. Néanmoins, peu de processus de développement de couches de support protocolaire tirent parti de ces ressources, en raison notamment de l’expertise nécessaire dans ce domaine.Cette thèse propose une approche générative conjointe logicielle-matérielle au développement du support protocolaire d’applications réseaux, pour améliorer leur performance tout en restant ergonomique pour le développeur de l’application. Notre approche est basée sur l’exploitation d’un langage dédié, appellé Zebra pour générer les différents composants logiciels et matériels formant la couche de support. L’expertise nécessaire est déportée dans l’utilisation du langage Zebra et les composants matériels générés permettent d’accroître les performances de l’application.Les contributions de cette thèse sont les suivantes : Nous avons effectué une analyse des protocoles et applications réseaux. Cette analyse nous a permis d’identifier les composants pour lesquels il est possible d’obtenir des gains de performances.Nous avons conçu et exploité un langage dédié, Zebra, permettant de décrire les différentes entités de la couche de support protocolaire et générant les éléments logiciels et matériels la composant. Nous avons construit un système sur puce exécutant un système d’exploitation Linux afin d’étayer notre approche. Nous avons conçu des accélérateurs matériels déployables pour différents protocoles réseaux sur ce système et pilotables par les applicatifs. Afin de rendre l’accès aux accélérateurs matériels transparent pour les applications réseaux, nous avons développé un intergiciel gérant l’ensemble de ces accès. Cet intergiciel permet à plusieurs applications et/ou à plusieurs clients d’une même application d’utiliser les accélérateurs pour le traitement des messages protocolaires. Nous avons évalué les performances de notre approche dans des conditions réelles. Nous avons comparé ces performances à celles de couches de supports faisant référence dans le domaine. Nous avons constaté un gain de performance conséquent pour l’approche que nous proposons

Thèses en Ligne

SPL: An extensible language for distributed stream processing

Author: Gedik B.
Hirzel M.
Schneider S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

Big data is revolutionizing how all sectors of our economy do business, including telecommunication, transportation, medical, and finance. Big data comes in two flavors: data at rest and data in motion. Processing data in motion is stream processing. Stream processing for big data analytics often requires scale that can only be delivered by a distributed system, exploiting parallelism on many hosts and many cores. One such distributed stream processing system is IBM Streams. Early customer experience with IBM Streams uncovered that another core requirement is extensibility, since customers want to build high-performance domain-specific operators for use in their streaming applications. Based on these two core requirements of distribution and extensibility, we designed and implemented the Streams Processing Language (SPL). This article describes SPL with an emphasis on the language design, distributed runtime, and extensibility mechanism. SPL is now the gateway for the IBM Streams platform, used by our customers for stream processing in a broad range of application domains. © 2017 ACM

Bilkent University Institutional Repository