3 research outputs found
Captura de patrones en archivos de logs mediante el uso de expresiones regulares en GPUs
The information contained in a system is normally stored in log files. Most of
the time, these files store the information as plain text with little or no formatting.
It is then necessary to extract parts of this information to be able to understand what is going
on in such a system. Currently, such information can be extracted using programs that make use
of extended regular expressions. The use of regular expressions allows the search of patterns
but it can also be used to extract data from the matched pattern. Most of the programs that
implement regular expressions are based on finite automata, either non-deterministic (NFA)
or deterministic (DFA). We aim to explore the use of finite automata to extract data from log
files using a Graphics Processing Unit (GPU) to speed up the process. Moreover, we
also explore data parallelism over the lines present in the log file. Currently, the work done on
GPUs with regular expressions is limited to matching tasks only, without any capture feature.
We present a solution that addresses this lack of pattern capture in current implementations. Our
development uses a TNFA implementation as its base and converts it to a TDFA before running
the GPU task. We explore the new CUDA feature named unified memory, supported since CUDA
6, together with streams to achieve the best possible performance in our GPU implementation.
Using real log files and regular expressions written to extract specific data, our evaluation shows
that our implementation can be up to 9 times faster than the sequential implementation.
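The per-line data parallelism this abstract exploits can be sketched in plain Python as a CPU stand-in for the CUDA kernels (the log format, the pattern, and the thread-pool sizing below are invented for illustration and are not the thesis implementation): every log line is independent, so a compiled regular expression with capture groups can be applied to each line in parallel.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical access-log pattern: capture the client IP and the HTTP
# status code from each line. Format and pattern are made-up examples.
LOG_RE = re.compile(r"^(\d+\.\d+\.\d+\.\d+) \S+ \S+ .* (\d{3}) \d+$")

def capture(line):
    """Match a single line and return its captured submatches, or None."""
    m = LOG_RE.match(line)
    return m.groups() if m else None

def capture_all(lines, workers=4):
    # Every line is matched independently of the others: the same
    # structure a GPU implementation exploits by mapping lines to threads.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(capture, lines))
```

For example, `capture_all(['1.2.3.4 - - "GET / HTTP/1.1" 200 123'])` yields `[('1.2.3.4', '200')]`; lines that do not match simply produce `None`, so no coordination between lines is ever needed.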
Stream Processing using Grammars and Regular Expressions
In this dissertation we study regular expression based parsing and the use of
grammatical specifications for the synthesis of fast, streaming
string-processing programs.
In the first part we develop two linear-time algorithms for regular
expression based parsing with Perl-style greedy disambiguation. The first
algorithm operates in two passes in a semi-streaming fashion, using a constant
amount of working memory and an auxiliary tape storage which is written in the
first pass and consumed by the second. The second algorithm is a single-pass,
optimally streaming algorithm which outputs as much of the parse tree as is
semantically possible based on the input prefix read so far, and resorts to
buffering only as many symbols as are required to resolve the next choice. Optimality
is obtained by performing a PSPACE-complete pre-analysis on the regular
expression.
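The Perl-style greedy disambiguation policy that both algorithms implement can be illustrated with Python's backtracking `re` engine, which follows the same leftmost-greedy semantics; this shows only the disambiguation policy, not the dissertation's linear-time algorithms.

```python
import re

# Under greedy disambiguation the first (a*) consumes everything and the
# second is left with the empty string, although many other splits of
# "aaaa" between the two groups would also give a match.
m = re.fullmatch(r"(a*)(a*)", "aaaa")
print(m.group(1), repr(m.group(2)))  # prints: aaaa ''

# Inside a repetition, each iteration greedily prefers the first
# alternative and backtracks only when the overall match would fail:
# here the final iteration must take "ab" so the whole input is consumed.
m2 = re.fullmatch(r"(?:a|ab)*", "aab")
print(m2 is not None)  # prints: True
```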
In the second part we present Kleenex, a language for expressing
high-performance streaming string processing programs as regular grammars with
embedded semantic actions, and its compilation to streaming string transducers
with worst-case linear-time performance. Its underlying theory is based on
transducer decomposition into oracle and action machines, and a finite-state
specialization of the streaming parsing algorithm presented in the first part.
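A streaming string transducer, the compilation target mentioned above, processes the input in one left-to-right pass while maintaining a fixed set of string registers with copyless updates. The toy machine below (an illustration of the machine model, not Kleenex output) moves all digits in front of all other characters, something no register-free one-pass transducer can do.

```python
# Toy streaming string transducer (SST): one pass, two string registers,
# copyless updates (each input symbol extends exactly one register).
class MoveDigitsFirst:
    def __init__(self):
        self.x = []  # register X: digits seen so far
        self.y = []  # register Y: all other characters

    def step(self, c):
        # Copyless register update chosen by the current input symbol.
        (self.x if c.isdigit() else self.y).append(c)

    def finish(self):
        # Output function, applied once at end of input: F = X . Y
        return "".join(self.x) + "".join(self.y)

def run(s):
    t = MoveDigitsFirst()
    for c in s:
        t.step(c)
    return t.finish()
```

Running `run("a1b2c3")` gives `"123abc"`: the output is assembled in constant per-symbol work, with the buffering confined to the registers, which is what yields worst-case linear-time performance.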
In the second part we also develop a new linear-time streaming parsing
algorithm for parsing expression grammars (PEG) which generalizes the regular
grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm
reformulated using least fixed points and evaluated using an instance of the
chaotic iteration scheme by Cousot and Cousot.
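The tabulation-as-least-fixed-point idea can be sketched for a tiny hand-written PEG, S <- '(' S ')' S / eps, for balanced parentheses (an invented example; the dissertation's algorithm is streaming and linear-time, whereas this is the naive iteration): a table of match lengths starts at the bottom element (fail everywhere) and the rule is re-applied until nothing changes.

```python
FAIL = -1  # bottom element of the table lattice

def parse_lengths(s):
    """T[i] = number of characters S consumes starting at position i."""
    n = len(s)
    T = [FAIL] * (n + 1)
    changed = True
    while changed:  # naive chaotic iteration up to the least fixed point
        changed = False
        for i in range(n + 1):
            # first alternative of the ordered choice: '(' S ')' S
            v = FAIL
            if i < n and s[i] == "(":
                m1 = T[i + 1]
                if m1 != FAIL:
                    j = i + 1 + m1
                    if j < n and s[j] == ")":
                        m2 = T[j + 1]
                        if m2 != FAIL:
                            v = (j + 1 + m2) - i
            # second alternative, tried only if the first fails: eps
            if v == FAIL:
                v = 0
            if v != T[i]:
                T[i] = v
                changed = True
    return T
```

On `"()()"` the table converges to `T[0] == 4` (the whole input), while on the unbalanced `"(()"` the ordered choice falls back to the empty alternative and `T[0] == 0`, matching PEG semantics.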
Fast submatch extraction using OBDDs
Network-based intrusion detection systems (NIDS) commonly use pattern languages to identify packets of interest. Similarly, security information and event management (SIEM) systems rely on pattern languages for real-time analysis of security alerts and event logs. Both NIDS and SIEM systems use pattern languages extended from regular expressions. One such extension, the submatch construct, allows the extraction of substrings from a string matching a pattern. Existing solutions for submatch extraction are based on non-deterministic finite automata (NFAs) or recursive backtracking. NFA-based algorithms are time-inefficient. Recursive backtracking algorithms perform poorly on pathological inputs generated by algorithmic complexity attacks. We propose a new approach for submatch extraction that uses ordered binary decision diagrams (OBDDs) to represent and perform pattern matching. Our evaluation using patterns from the Snort HTTP rule set and a commercial SIEM system shows that our approach achieves its ideal performance when patterns are combined. In the best case, our approach is faster than RE2 and PCRE by one to two orders of magnitude.
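The submatch construct and the benefit of combining rules can be illustrated with ordinary named capture groups; the sketch below uses Python's `re` engine rather than OBDDs, and the rule patterns are invented stand-ins, not real Snort signatures. Merging many rules into a single alternation, which is where the paper's combined-pattern performance applies, also identifies which rule fired.

```python
import re

# Invented stand-ins for Snort/SIEM-style rules, each declaring a named
# submatch to extract from a matching event.
RULES = {
    "http_uri":   r"GET (?P<uri>\S+) HTTP",
    "login_fail": r"failed login for (?P<user>\w+)",
}

# Combine all rules into one alternation; the named group that matched
# identifies the firing rule, and its value is the extracted submatch.
COMBINED = re.compile("|".join(f"(?:{p})" for p in RULES.values()))

def extract(event):
    """Return {group_name: submatch} for the rule that fired, or None."""
    m = COMBINED.search(event)
    if not m:
        return None
    return {k: v for k, v in m.groupdict().items() if v is not None}
```

For example, `extract("GET /index.html HTTP/1.1")` yields `{'uri': '/index.html'}` and `extract("failed login for root")` yields `{'user': 'root'}`; events matching no rule return `None`.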