
    Thread divergence free and space efficient GPU implementation of NFA AC

    The multipattern string matching problem reports all occurrences of a given set, or dictionary, of patterns in a document. Multipattern string matching is used in databases, data mining, DNA and protein sequence analysis, intrusion detection systems (IDS) for applications (APIDS), networks (NIDS), protocols (PIDS), and hosts (HIDS), as well as in antivirus software and machine learning problems. Parallel algorithms for multipattern string matching are useful for these applications because parallel platforms can execute a large number of threads concurrently, and each thread can search for patterns in parallel. Aho-Corasick (AC) is one such multipattern search algorithm. The AC algorithm has two versions, NFA AC and DFA AC, both of which use a matching automaton to perform the multipattern search. The NFA AC automaton takes less memory than the DFA AC automaton. Many parallel implementations of the AC algorithm are available.
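To make the NFA AC / DFA AC distinction concrete, here is a minimal sketch of the NFA-style Aho-Corasick automaton in Python: a sparse trie plus failure links, which is the space-efficient variant the abstract refers to. This is an illustrative CPU baseline, not the paper's GPU implementation.

```python
from collections import deque

def build_ac(patterns):
    """Build an NFA-style Aho-Corasick automaton: a sparse trie plus
    failure links, instead of a fully materialized DFA table."""
    goto = [{}]          # goto[state][char] -> next state (sparse)
    fail = [0]           # failure link per state
    output = [[]]        # patterns ending at each state
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); output.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].append(pat)
    # Breadth-first traversal to compute failure links by depth.
    q = deque(goto[0].values())
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            cand = goto[f].get(ch, 0)
            fail[t] = cand if cand != t else 0
            output[t] += output[fail[t]]
    return goto, fail, output

def search(text, goto, fail, output):
    """Report (start_index, pattern) for every match in text."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in output[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

A DFA AC version would instead precompute a dense next-state table of size (states x alphabet), trading memory for branch-free transitions, which is why the NFA form is smaller.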

    Using multiple GPUs to accelerate string searching for digital forensic analysis

    String searching within a large corpus of data is an important component of digital forensic (DF) analysis techniques such as file carving. The continuing increase in the capacity of consumer storage devices requires corresponding improvements to the performance of string searching techniques. As string searching is a trivially-parallelisable problem, GPGPU approaches are a natural fit, but previous studies have found that local storage presents an insurmountable performance bottleneck. We show that this need not be the case with modern hardware, and demonstrate substantial performance improvements from the use of single and multiple GPUs when searching for strings within a typical forensic disk image.
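The "trivially-parallelisable" decomposition mentioned above can be sketched on the CPU: split the image into chunks, extend each chunk by len(pattern)-1 bytes so matches straddling a boundary are not lost, and search the chunks concurrently. A hypothetical sketch under those assumptions, not the authors' implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def find_in_chunk(args):
    text, start, end, pattern = args
    # Extend the chunk by len(pattern)-1 characters so matches that
    # straddle a chunk boundary are found, and found exactly once:
    # a match is credited to the chunk containing its start index.
    window = text[start:end + len(pattern) - 1]
    hits, i = [], window.find(pattern)
    while i != -1:
        hits.append(start + i)
        i = window.find(pattern, i + 1)
    return hits

def parallel_search(text, pattern, n_chunks=4):
    """Search for pattern in text, n_chunks chunks in parallel."""
    size = max(1, len(text) // n_chunks)
    tasks = [(text, i, min(i + size, len(text)), pattern)
             for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(find_in_chunk, tasks)
    return sorted(h for part in results for h in part)
```

A GPU kernel would apply the same per-chunk logic with one thread (or warp) per chunk; the overlap rule is what keeps the decomposition correct.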

    A New Multi-threaded and Interleaving Approach to Enhance String Matching for Intrusion Detection Systems

    String matching algorithms are computationally intensive operations in computer science. The algorithms find the occurrences of one or more string patterns in a larger string or text. String matching algorithms are important for network security, biomedical applications, Web search, and social networks. Nowadays, high network speeds and large storage capacities place high demands on string matching methods to perform the task in a short time. Traditionally, the Aho-Corasick algorithm, which is used to find the string matches, is executed sequentially. In this paper, a new multi-threaded and interleaving approach to Aho-Corasick using graphics processing units (GPUs) is designed and implemented to achieve high-speed string matching. The Compute Unified Device Architecture (CUDA) programming language is used to implement the proposed parallel version. Experimental results show that our approach achieves more than 5X speedup over the sequential and other parallel implementations. Hence, a wide range of applications can benefit from our solution to perform string matching faster than ever before.

    Parallelizing a network intrusion detection system using a GPU.

    As network speeds continue to increase and attacks get increasingly more complicated, there is a need for improved detection algorithms and improved performance of Network Intrusion Detection Systems (NIDS). Recently, several attempts have been made to use the underutilized parallel processing capabilities of GPUs to offload the costly NIDS pattern matching algorithms. This thesis presents an interface for the NIDS Snort that allows porting of the pattern-matching algorithm to run on a GPU. The analysis shows that this system can achieve up to four times speedup over the existing Snort implementation and that GPUs can be effectively utilized to perform intensive computational processes like pattern matching.

    Network Traffic Anomaly-Detection Framework Using GPUs

    Network security is crucial for the software industry. Deep packet inspection (DPI) is one of the widely used approaches to enforcing network security. Due to the high volume of network traffic, it is challenging to achieve high performance for DPI in real time. In this thesis, a new DPI framework is presented that accelerates packet header checking and payload inspection on graphics processing units (GPUs). Various optimizations were applied to the GPU-version packet inspection, such as thread-level and block-level packet assignment, warp divergence elimination, and memory transfer optimization using pinned memory and shared memory. The performance of the pattern-matching algorithms used for DPI was analyzed using an assorted set of characteristics such as pipeline stalls, shared memory efficiency, warp efficiency, issue slot utilization, and cache hits. The extensive characterization of the algorithms on the GPU architecture and the performance comparison among parallel pattern-matching algorithms on both the GPU and the CPU are the unique contributions of this thesis. Among the GPU-version algorithms, the Aho-Corasick algorithm and the Wu-Manber algorithm outperformed the Rabin-Karp algorithm because the Aho-Corasick and Wu-Manber algorithms were executed only once for multiple signatures, using tables generated before the search phase began. According to my evaluation on an NVIDIA K80 GPU, the GPU-accelerated packet processing achieved at least 60 times better performance than CPU-version processing.

    Real-Time Streaming Multi-Pattern Search for Constant Alphabet

    In the streaming multi-pattern search problem, which is also known as the streaming dictionary matching problem, a set D = {P_1, P_2, . . . , P_d} of d patterns (strings over an alphabet Sigma), called the dictionary, is given to be preprocessed. Then, a text T arrives one character at a time, and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant-size alphabet, there exists a randomized Monte Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte Carlo algorithms with the following complexities: * O(log log |Sigma|) time per character in the worst case and O(d log m) words of space. * O(1/epsilon) time per character in the worst case and O(d |Sigma|^epsilon log m / epsilon) words of space for any 0 < epsilon <= 1. These results improve upon the algorithm of [Clifford et al., ESA'15], which uses O(d log m) words of space and takes O(log log (m+d)) time per character.
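For intuition about what is being reported, here is a naive streaming dictionary matcher in Python. It takes O(d*m) time per character, nowhere near the constant time of the paper's Monte Carlo algorithms; it is a correctness baseline only, and the class and method names are illustrative.

```python
class StreamingMatcher:
    """Naive streaming dictionary matcher: keep the last m characters of
    the text and check every suffix of that buffer against the
    dictionary. Reports the longest dictionary pattern that is a
    current suffix of the stream, or None."""

    def __init__(self, dictionary):
        self.patterns = set(dictionary)
        self.m = max(len(p) for p in dictionary)  # longest pattern
        self.buf = ""

    def feed(self, ch):
        # Only the last m characters can participate in a match.
        self.buf = (self.buf + ch)[-self.m:]
        best = None
        for k in range(1, len(self.buf) + 1):
            suffix = self.buf[-k:]
            if suffix in self.patterns:
                best = suffix  # longer suffixes overwrite shorter ones
        return best
```

The space here is O(m) characters plus the dictionary itself; the point of the paper is achieving the same per-character answer in constant (or near-constant) time with only O(d log m) words of working space.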

    Capturing patterns in log files using regular expressions on GPUs

    The information contained in a system is normally stored in log files. Most of the time, these files store the information as plain text with much unformatted content. It is then necessary to extract parts of this information to be able to understand what is going on in the system. Currently, such information can be extracted using programs that make use of extended regular expressions. Regular expressions allow searching for patterns, but they can also be used to extract data from the matched pattern. Most of the programs that implement regular expressions are based on finite automata, either non-deterministic (NFA) or deterministic (DFA). We aim to explore the use of finite automata to extract data from log files, using a graphics processing unit (GPU) to speed up the process. Moreover, we also explore data parallelism over the lines of the log file. Currently, GPU work with regular expressions is limited to matching tasks only, without any capture feature. We present a solution that addresses this lack of pattern capture in current implementations. Our development builds on an implementation of TNFA and converts it to a TDFA before running the GPU task. We explore the CUDA feature named unified memory, supported since CUDA 6, together with streams, to achieve the best possible performance in our GPU implementation. Using real log files and regular expressions written to extract specific data, our evaluation shows that our implementation can be up to 9 times faster than the sequential implementation.
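As a minimal illustration of the capture feature the thesis adds, named capture groups can extract fields from a log line. This sketch uses CPU-side Python `re`, not the TNFA-to-TDFA GPU pipeline described above, and the log format and group names are assumptions for the example:

```python
import re

# Hypothetical Apache-style access-log line format; the group names
# (ip, ts, req, status) are illustrative, not from the thesis.
LOG_RE = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ '   # client IP, ident, user
    r'\[(?P<ts>[^\]]+)\] '                         # timestamp in brackets
    r'"(?P<req>[^"]*)" '                           # quoted request line
    r'(?P<status>\d{3})'                           # HTTP status code
)

def parse_line(line):
    """Return the captured fields as a dict, or None on no match."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None
```

Matching alone would only tell us the line fits the format; the capture groups are what turn the match into structured data, which is the functionality the GPU implementation aims to provide.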