16,320 research outputs found

    Memory-Efficient Regular Expression Search Using State Merging

    Full text link
    Abstract — Pattern matching is a crucial task in several critical network services such as intrusion detection and policy man-agement. As the complexity of rule-sets increases, traditional string matching engines are being replaced by more sophisticated regular expression engines. To keep up with line rates, deal with denial of service attacks and provide predictable resource provisioning, the design of such engines must allow examining payload traffic at several gigabits per second and provide worst case speed guarantees. While regular expression matching using deterministic finite automata (DFA) is a well studied problem in theory, its implementation either in software or specialized hardware is complicated by prohibitive memory requirements. This is especially true for DFAs representing complex regular expressions present in practical rule-sets. In this paper, we introduce a novel method to drastically reduce the DFA memory requirement and still provide worst-case speed guarantees. Specifically, we merge several “non-equivalent” states in a DFA by introducing labels on their input and output transitions. We then propose a data structure to represent the merged states and the transition labels. We show that, with very few assumptions about the original DFA, such a transformation results in significant compression in the DFA representation. We have implemented a state merging and transition labeling algorithm for DFAs, and show that for Snort and Bro security rule-sets, state merging results in memory reductions of an order of magnitude. I

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    Full text link
    False-positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

    Logical Algorithms meets CHR: A meta-complexity result for Constraint Handling Rules with rule priorities

    Full text link
    This paper investigates the relationship between the Logical Algorithms language (LA) of Ganzinger and McAllester and Constraint Handling Rules (CHR). We present a translation schema from LA to CHR-rp: CHR with rule priorities, and show that the meta-complexity theorem for LA can be applied to a subset of CHR-rp via inverse translation. Inspired by the high-level implementation proposal for Logical Algorithm by Ganzinger and McAllester and based on a new scheduling algorithm, we propose an alternative implementation for CHR-rp that gives strong complexity guarantees and results in a new and accurate meta-complexity theorem for CHR-rp. It is furthermore shown that the translation from Logical Algorithms to CHR-rp combined with the new CHR-rp implementation, satisfies the required complexity for the Logical Algorithms meta-complexity result to hold.Comment: To appear in Theory and Practice of Logic Programming (TPLP

    Pushdown Control-Flow Analysis of Higher-Order Programs

    Full text link
    Context-free approaches to static analysis gain precision over classical approaches by perfectly matching returns to call sites---a property that eliminates spurious interprocedural paths. Vardoulakis and Shivers's recent formulation of CFA2 showed that it is possible (if expensive) to apply context-free methods to higher-order languages and gain the same boost in precision achieved over first-order programs. To this young body of work on context-free analysis of higher-order programs, we contribute a pushdown control-flow analysis framework, which we derive as an abstract interpretation of a CESK machine with an unbounded stack. One instantiation of this framework marks the first polyvariant pushdown analysis of higher-order programs; another marks the first polynomial-time analysis. In the end, we arrive at a framework for control-flow analysis that can efficiently compute pushdown generalizations of classical control-flow analyses.Comment: The 2010 Workshop on Scheme and Functional Programmin

    Efficient Multistriding of Large Non-deterministic Finite State Automata for Deep Packet Inspection

    Get PDF
    Multistride automata speed up input matching because each multistriding transformation halves the size of the input string, leading to a potential 2x speedup. However, up to now little effort has been spent in optimizing the building process of multistride automata, with the result that current algorithms cannot be applied to real-life, large automata such as the ones used in commercial IDSs, because the time and the memory space needed to create the new automaton quickly becomes unfeasible. In this paper, new algorithms for efficient building of multistride NFAs for packet inspection are presented, explaining how these new techniques can outperform the previous algorithms in terms of required time and memory usag

    From Finite Automata to Regular Expressions and Back--A Summary on Descriptional Complexity

    Full text link
    The equivalence of finite automata and regular expressions dates back to the seminal paper of Kleene on events in nerve nets and finite automata from 1956. In the present paper we tour a fragment of the literature and summarize results on upper and lower bounds on the conversion of finite automata to regular expressions and vice versa. We also briefly recall the known bounds for the removal of spontaneous transitions (epsilon-transitions) on non-epsilon-free nondeterministic devices. Moreover, we report on recent results on the average case descriptional complexity bounds for the conversion of regular expressions to finite automata and brand new developments on the state elimination algorithm that converts finite automata to regular expressions.Comment: In Proceedings AFL 2014, arXiv:1405.527

    Matrix Code

    Full text link
    Matrix Code gives imperative programming a mathematical semantics and heuristic power comparable in quality to functional and logic programming. A program in Matrix Code is developed incrementally from a specification in pre/post-condition form. The computations of a code matrix are characterized by powers of the matrix when it is interpreted as a transformation in a space of vectors of logical conditions. Correctness of a code matrix is expressed in terms of a fixpoint of the transformation. The abstract machine for Matrix Code is the dual-state machine, which we present as a variant of the classical finite-state machine.Comment: 39 pages, 19 figures; extensions and minor correction
    corecore