
    On the bit-parallel simulation of the nondeterministic Aho-Corasick and suffix automata for a set of patterns

    In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced by the trie and by the Directed Acyclic Word Graph for a set of patterns, respectively. When the prefix redundancy is nonnegligible, this method yields, compared to the original bit-parallel encoding with no prefix factorization, a representation that requires smaller bit-vectors and, correspondingly, fewer words. In particular, if we restrict to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on this technique, for searching a set P of patterns in a text T of length n over an alphabet Σ of size σ. Our algorithms, named Log-And and Backward-Log-And, require O((m + σ)⌈m/w⌉) space and work in O(n⌈m/w⌉) and O(n⌈m/w⌉ l_min) worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and l_min is the length of the shortest pattern in P.
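
    For readers unfamiliar with the underlying technique, the following is a minimal sketch of the classic Shift-And bit-parallel simulation for a single pattern, which the paper generalizes to pattern sets; it is not the Log-And encoding itself, and the function name and example strings are illustrative only.

        # Minimal Shift-And sketch: simulates the nondeterministic automaton of a
        # single pattern with one bit per state packed into an integer (bit-vector).
        def shift_and(text: str, pattern: str):
            m = len(pattern)
            # B[c] has bit i set iff pattern[i] == c (the occurrence mask of c).
            B = {}
            for i, c in enumerate(pattern):
                B[c] = B.get(c, 0) | (1 << i)
            D = 0                      # bit i set iff pattern[0..i] is a suffix of the scanned text
            accept = 1 << (m - 1)      # bit of the final state
            matches = []
            for j, c in enumerate(text):
                D = ((D << 1) | 1) & B.get(c, 0)
                if D & accept:
                    matches.append(j - m + 1)   # starting position of an occurrence
            return matches

        print(shift_and("abracadabra", "abra"))   # -> [0, 7]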

    A Survey of String Matching Algorithms

    String matching algorithms play an important role in finding the places where one or several strings (patterns) occur in a larger body of text (e.g., a data stream, a sentence, a paragraph, a book). Their applications cover a wide range, including intrusion detection systems (IDS) in computer networks, bioinformatics, plagiarism detection, information security, pattern recognition, document matching and text mining. In this paper we present a short survey of well-known, recently updated and hybrid string matching algorithms. These algorithms can be divided into two major categories, known as exact string matching and approximate string matching. The classification criteria were selected to highlight the important features of the matching strategies, in order to identify challenges and vulnerabilities.
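
    To make the two categories concrete, here is a small illustrative sketch (not drawn from the survey) contrasting exact matching with a naive approximate matcher that tolerates up to k mismatching characters; the function names and example text are invented for illustration.

        # Illustrative contrast of the two categories the survey distinguishes:
        # exact matching reports only identical occurrences, approximate matching
        # (here: at most k mismatching characters) also tolerates errors.
        def exact_matches(text: str, pattern: str):
            m = len(pattern)
            return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

        def approx_matches(text: str, pattern: str, k: int):
            m = len(pattern)
            hits = []
            for i in range(len(text) - m + 1):
                mismatches = sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
                if mismatches <= k:
                    hits.append(i)
            return hits

        text = "the cat sat on the mat"
        print(exact_matches(text, "cat"))        # -> [4]
        print(approx_matches(text, "cat", 1))    # -> [4, 8, 19]  ("sat" and "mat" differ in one character)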

    Real-time Regular Expression Matching

    This paper is devoted to finite state automata, regular expression matching, pattern recognition, and the exponential blow-up problem, i.e., the exponential growth of automaton size with the length of the regular expression. It presents a theoretical and hardware solution to the exponential blow-up problem for some complicated classes of regular languages, a problem that has severely limited the operation of Network Intrusion Detection Systems. The article supports the solution with theorems on correctness and complexity.
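
    The blow-up itself is easy to reproduce. The sketch below is only an illustration of the general phenomenon, not the paper's construction: it applies the subset construction to the textbook NFA for strings over {a, b} whose k-th symbol from the end is an 'a' (regular expression (a|b)*a(a|b)^(k-1)) and counts the reachable DFA states, which grow as 2^k.

        # Subset construction on the NFA for "strings over {a,b} whose k-th symbol
        # from the end is 'a'"; the number of reachable DFA states grows as 2**k.
        def nfa_delta(states, ch, k):
            nxt = {0}                                  # state 0 loops on both symbols
            if ch == 'a':
                nxt.add(1)                             # ...and guesses the marked 'a'
            nxt |= {s + 1 for s in states if 1 <= s < k}
            return frozenset(nxt)

        def dfa_state_count(k):
            start = frozenset({0})
            seen, stack = {start}, [start]
            while stack:
                cur = stack.pop()
                for ch in 'ab':
                    nxt = nfa_delta(cur, ch, k)
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            return len(seen)

        for k in range(1, 11):
            print(k, dfa_state_count(k))               # prints k, 2**k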

    High performance stride-based network payload inspection

    There are two main drivers for network payload inspection: the detection of malicious data, attacks and viruses in a Network Intrusion Detection System (NIDS), and content detection in a Data Leakage Prevention System (DLPS) or Copyright Infringement Detection System (CIDS). Network attacks are becoming more and more prevalent. Traditional network firewalls can only check the packet header and fail to detect attacks hidden in the packet payload. Therefore, NIDSs with a Deep Packet Inspection (DPI) function have been developed and widely deployed. By checking each byte of a packet against the pattern set, an operation called pattern matching, an NIDS is able to detect attack code hidden in the payload. The pattern set is usually organized as a Deterministic Finite Automaton (DFA). The processing time of a DFA is proportional to the length of the input string, but its memory cost is quite large. Meanwhile, the link bandwidth and traffic of the Internet are rapidly increasing, and the size of the attack signature database is also growing due to the diversification of attacks. Consequently, there is a strong demand for NIDSs with high performance and low storage cost. Traditional software-based and hardware-based pattern matching algorithms have difficulty satisfying the processing speed requirement, so high performance network payload inspection methods are needed to enable deep packet inspection at line rate. In this thesis, Stride Finite Automata (StriFA), a novel finite automata family that accelerates both string matching and regular expression matching, is presented. Compared with conventional finite automata, which scan the entire traffic stream to locate malicious information, a StriFA only needs to scan samples of the traffic stream to find suspicious information, thus increasing the matching speed and reducing memory requirements. Technologies such as instant messaging software (Skype, MSN) or BitTorrent file sharing allow convenient sharing of information between managers, employees, customers, and partners. This, however, leads to two major security risks when exchanging data between different people: firstly, leakage of sensitive data from a company and, secondly, distribution of copyright-infringing products in Peer-to-Peer (P2P) networks. Traditional DFA-based DPI solutions cannot be used to inspect file distribution in P2P networks because the data may be delivered out of order. To address this problem, a hybrid finite automaton called the Skip-Stride-Neighbor Finite Automaton (S2NFA) is proposed. It combines the benefits of three structures: 1) the Skip-FA, which is used to solve the out-of-order data scanning problem; 2) the Stride-DFA, which is introduced to reduce the memory usage of the Skip-FA; and 3) the Neighbor-DFA, which builds on the characteristics of the Stride-DFA to achieve a low false positive rate at the cost of a small increase in memory consumption.
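
    For context, the conventional DFA-based inspection that StriFA and S2NFA improve on performs one transition per payload byte. Below is a minimal sketch of that baseline with a naively built multi-pattern DFA; it is illustrative only (the pattern set, payload and function names are invented) and does not reflect the thesis's automata.

        # Baseline DFA-based payload scan (illustrative sketch, not StriFA/S2NFA):
        # every payload byte triggers exactly one transition-table lookup.
        def build_dfa(patterns):
            # States are the prefixes of the patterns; "" is the start state.
            states = {p[:i] for p in patterns for i in range(len(p) + 1)}
            alphabet = {c for p in patterns for c in p}
            delta = {}
            for q in states:
                for c in alphabet:
                    s = q + c
                    while s not in states:   # longest suffix of q+c that is a state
                        s = s[1:]
                    delta[(q, c)] = s
            return delta

        def scan(payload, patterns):
            delta = build_dfa(patterns)
            q, hits = "", []
            for i, c in enumerate(payload):
                q = delta.get((q, c), "")    # bytes outside the alphabet reset the DFA
                hits += [(i - len(p) + 1, p) for p in patterns if q.endswith(p)]
            return hits

        print(scan("GET /index.php?cmd=/bin/sh", ["/bin/sh", "cmd="]))
        # -> [(15, 'cmd='), (19, '/bin/sh')]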

    Parallelization of a software based intrusion detection system - Snort

    Computer networks are already ubiquitous in people’s lives and work, and network security is becoming critical. A simple firewall, which can only scan the bottom four OSI layers, cannot satisfy all security requirements. An intrusion detection system (IDS) with deep packet inspection, which can filter all seven OSI layers, is becoming necessary for more and more networks. However, the processing throughput of IDSs lags far behind current network speeds. People have begun to improve the performance of IDSs by implementing them on different hardware platforms, such as Field-Programmable Gate Arrays (FPGAs) or special-purpose network processors. Nevertheless, all of these options are either less flexible or more expensive to deploy. This research focuses on possibilities for implementing a parallelized IDS in a general computing environment based on Snort, which is currently the most popular open-source IDS. In this thesis, several possible methods for parallelizing the pattern-matching engine on a multicore computer are analyzed. However, owing to the small granularity of network packets, the pattern-matching engine of Snort turns out to be unsuitable for parallelization. In addition, a pipelined structure of Snort has been implemented and analyzed. The universal packet capture API, LibPCAP, has been modified to add a new feature that captures a packet directly into an external buffer. With this, the performance of the pipelined Snort improves by up to 60% on an Intel i7 multicore computer for jumbo frames. The primary limitation is memory bandwidth; with a higher bandwidth, the performance of the parallelization could be further improved.
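
    The pipelined arrangement described above can be pictured as a two-stage producer/consumer pipeline: one thread captures packets into buffers and another runs the detection engine on them. The sketch below is a generic Python illustration under that assumption, not Snort's or LibPCAP's actual C code; the packet data and pattern list are invented.

        import queue, threading

        # Generic two-stage pipeline sketch: a capture thread fills buffers and a
        # detection thread scans them, so capture and pattern matching overlap.
        # (Illustration only; the real modification copies packets into an
        # external ring buffer in C rather than a Python queue.)
        PATTERNS = [b"/bin/sh", b"cmd.exe"]

        def capture_stage(packets, q):
            for pkt in packets:          # stands in for the packet capture callbacks
                q.put(pkt)
            q.put(None)                  # end-of-stream marker

        def detection_stage(q, alerts):
            while True:
                pkt = q.get()
                if pkt is None:
                    break
                for p in PATTERNS:
                    if p in pkt:         # stands in for the pattern-matching engine
                        alerts.append((p, pkt))

        packets = [b"GET /index.html", b"POST ?x=/bin/sh", b"ping"]
        q, alerts = queue.Queue(maxsize=64), []
        t1 = threading.Thread(target=capture_stage, args=(packets, q))
        t2 = threading.Thread(target=detection_stage, args=(q, alerts))
        t1.start(); t2.start(); t1.join(); t2.join()
        print(alerts)                    # -> [(b'/bin/sh', b'POST ?x=/bin/sh')]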

    Improved algorithms for string searching problems

    We present improved, practically efficient algorithms for several string searching problems, where we search for a short string called the pattern in a longer string called the text. We are mainly interested in the online problem, where the text is not preprocessed, but we also present a light indexing approach to speed up exact searching of a single pattern. The new algorithms can be applied, e.g., to many problems in bioinformatics and other content scanning and filtering problems. In addition to exact string matching, we develop algorithms for several other variations of the string matching problem. We study algorithms for approximate string matching, where a limited number of errors is allowed in the occurrences of the pattern, and for parameterized string matching, where a substring of the text matches the pattern if the characters of the substring can be renamed in such a way that the renamed substring matches the pattern exactly. We also consider searching for multiple patterns simultaneously and for weighted patterns, where the weight of a character at a given position reflects the probability of that character occurring at that position. Many of the new algorithms use the backward matching principle, where the characters of the text that are aligned with the pattern are read backward, i.e. from right to left. Another common characteristic of the new algorithms is the use of q-grams, i.e. q consecutive characters are handled as a single character. Many of the new algorithms are bit-parallel, i.e. they pack several variables into a single computer word and update all of them with a single instruction. We show that the q-gram backward string matching algorithms that solve the exact, approximate, or multiple string matching problems are optimal on average. We also show that the q-gram backward string matching algorithm for the parameterized string matching problem is sublinear on average for a class of moderately repetitive patterns. All the presented algorithms are shown to be fast in practice when compared to earlier algorithms. We also propose an alphabet sampling technique to speed up exact string matching: we choose a subset of the alphabet and select the corresponding subsequence of the text; string matching is then performed on this reduced subsequence, and the matches found are verified in the original text. We show how to choose the sampled alphabet optimally and that the technique speeds up string matching especially for moderate to long patterns.
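
    To illustrate the backward-matching and q-gram ideas, the following is a minimal Horspool-style sketch that reads each alignment window from right to left and shifts by the window's last q-gram; it demonstrates the principle only and is not one of the thesis algorithms (the function name, default q and example strings are illustrative).

        # Horspool-style backward search with q-grams (illustrative sketch):
        # each window is read right to left, and the shift is taken from the
        # window's last q-gram, so unrelated text is skipped in big steps.
        def qgram_horspool(text: str, pattern: str, q: int = 2):
            m, n = len(pattern), len(text)
            assert q <= m
            # The last start position of each q-gram of the pattern (excluding
            # the q-gram that ends at the last character) gives the safe shift.
            shift = {}
            for j in range(m - q):                       # j = 0 .. m-q-1
                shift[pattern[j:j + q]] = m - q - j
            default = m - q + 1
            occ, i = [], 0
            while i + m <= n:
                k = m - 1
                while k >= 0 and text[i + k] == pattern[k]:   # backward scan
                    k -= 1
                if k < 0:
                    occ.append(i)
                i += shift.get(text[i + m - q:i + m], default)
            return occ

        print(qgram_horspool("abracadabra cadabra", "cadabra", q=3))   # -> [4, 12]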

    Implémentations logicielle et matérielle de l'algorithme Aho-Corasick pour la détection d'intrusions

    This work proposes efficient methods and architectures for implementing the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Two versions are proposed, a software version and a hardware version. The first version develops a software implementation in C/C++ for general-purpose processors; for this, new implementations of the algorithm that take into account the memory resources and the processor’s sequential execution are proposed. The second version develops an architecture in VHDL for a specialized processor on FPGA; for this, new architectures of the algorithm that take into account the available computing resources, the memory resources and the fine-grained parallelism inherent to FPGAs are proposed. Furthermore, a comparison with a modified software version is performed. For both cases, we analyze the performance and cost trade-offs of selecting different node data structures in memory. A selection of parameters is used in order to maximize a performance objective function that combines the cycle count, the memory usage and the system’s clock frequency. The parameters determine which of two or three types of node data structures (depending on the version) is selected for each node of the state machine. For the validation phase, test cases with diverse data were used to ensure that the algorithm operates properly. Furthermore, the contents of the Snort 2.9.7 rules were used: the state machine was built from around 26×10³ patterns, all extracted from these rules, and comprises around 381×10³ nodes. The main contribution of this work is to show that it is possible, through architecture exploration, to choose parameters that yield an optimal memory × time product. For the software version, the memory consumption is reduced from 407 MB to 21 MB, a memory improvement of about 20× compared with the worst case using a single node type. For the hardware version, the memory consumption is reduced from 11 MB to 4 MB, a memory improvement of about 3× compared with the modified software version. The throughput increases from 300 Mbps with the modified software version to 400 Mbps with the hardware version.
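
    As a compact reference for the algorithm being implemented, the sketch below builds an Aho-Corasick state machine with one Python dictionary per node and scans a text with it; it only illustrates the algorithm itself and says nothing about the C/C++ or VHDL node structures whose selection the thesis optimizes. All names and the example patterns are illustrative.

        from collections import deque

        # Minimal Aho-Corasick sketch: a trie whose nodes carry goto, failure and
        # output links; here every node is a plain dict, whereas the thesis picks
        # among several node data structures per node to optimize memory x time.
        def build(patterns):
            nodes = [{"next": {}, "fail": 0, "out": []}]          # node 0 is the root
            for p in patterns:
                cur = 0
                for c in p:
                    if c not in nodes[cur]["next"]:
                        nodes.append({"next": {}, "fail": 0, "out": []})
                        nodes[cur]["next"][c] = len(nodes) - 1
                    cur = nodes[cur]["next"][c]
                nodes[cur]["out"].append(p)
            bfs = deque(nodes[0]["next"].values())                # set failure links level by level
            while bfs:
                v = bfs.popleft()
                for c, u in nodes[v]["next"].items():
                    f = nodes[v]["fail"]
                    while f and c not in nodes[f]["next"]:
                        f = nodes[f]["fail"]
                    nodes[u]["fail"] = nodes[f]["next"].get(c, 0)
                    nodes[u]["out"] += nodes[nodes[u]["fail"]]["out"]
                    bfs.append(u)
            return nodes

        def search(text, nodes):
            s, hits = 0, []
            for i, c in enumerate(text):
                while s and c not in nodes[s]["next"]:
                    s = nodes[s]["fail"]
                s = nodes[s]["next"].get(c, 0)
                hits += [(i - len(p) + 1, p) for p in nodes[s]["out"]]
            return hits

        machine = build(["he", "she", "his", "hers"])
        print(search("ushers", machine))    # -> [(1, 'she'), (2, 'he'), (2, 'hers')]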