
    The Effect of Normalization on Intrusion Detection Classifiers (Naïve Bayes and J48)

    Intrusion detection has become an indispensable area for commercial applications and academic research. Network traffic is typically very high in volume and consists of both qualitative and quantitative data with different ranges of values. Raw data must be pre-processed before being fed into any learning model, and the most widely used technique is normalization [1]. Attribute normalization eliminates the dominance of attributes with extreme values by scaling them to a common range. However, many intrusion detection methods do not normalize attributes before training and detection [2]. Network traffic data contains features that are qualitative or quantitative in nature, and these must be treated differently [3]. This work studies the effect of normalization on the Naïve Bayes and J48 decision tree classifiers with the corrected KDDCUP99 and Kyoto 2006+ datasets. A comprehensive approach to normalizing network traffic attributes is proposed.
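To make the distinction between quantitative and qualitative attributes concrete, here is a minimal sketch, assuming min-max scaling for numeric attributes and one-hot encoding for categorical ones. The abstract does not say which normalization schemes the paper evaluates, so both function names and the sample values are hypothetical illustrations.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale numeric values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def one_hot(values):
    """Encode a qualitative attribute as 0/1 indicator vectors.

    Categories are ordered alphabetically so the encoding is deterministic.
    """
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical connection records: a duration in seconds and a protocol name.
durations = [0, 5, 10, 20]
protocols = ["tcp", "udp", "tcp", "icmp"]

scaled_durations = min_max_normalize(durations)
encoded_protocols = one_hot(protocols)
```

Scaling keeps a large-valued attribute such as a byte count from dominating a small-valued one, while the indicator encoding avoids imposing an artificial order on protocol names.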

    Protecting Systems From Exploits Using Language-Theoretic Security

    Any computer program that processes input from the user or the network must validate that input. Input-handling vulnerabilities occur when the software component responsible for filtering malicious input (the parser) does not perform validation adequately. Consequently, parsers are among the most targeted components, since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers represent the data format, programmers often rely on parser generators or parser combinator tools to build the parsers. This thesis advances several sub-fields in LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, this thesis comprises five parts that tackle various tenets of LangSec. First, I categorize various input-handling vulnerabilities and exploits using two frameworks. I use the mismorphisms framework to reason about the root causes leading to various vulnerabilities. Next, we built a categorization framework using various LangSec anti-patterns, such as parser differentials and insufficient input validation. Finally, we built a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks. Second, I built parsers for various Internet of Things and power grid network protocols and the iccMAX file format using parser combinator libraries.
The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool I built that generates Rust code that type checks Portable Document Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and various open-source PDF tools. In addition to our checker, we also built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool to build verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for their correctness. I built a verified parsing tool in Dafny that builds on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle various constructs commonly seen in network formats. I prove that our parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the first systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. I conducted an expert elicitation qualitative study to derive various metrics that I use to compare the DDLs. I also systematically compare these DDLs based on the sample data descriptions available with them, checking for correctness and resilience.
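The LangSec idea of recognizing input against a formal grammar before any other code touches it can be sketched as follows. This is a hand-written recursive-descent recognizer for a toy grammar invented for illustration; it is not one of the thesis's tools, and the names `recognize` and `handle` are hypothetical.

```python
DIGITS = "0123456789"  # ASCII digits only; str.isdigit() would admit others


def recognize(text: str) -> bool:
    """Return True iff the whole of `text` is in the toy language:

        value := INT | "[" [ value ("," value)* ] "]"
        INT   := DIGIT+
    """

    def value(i: int) -> int:
        # Returns the index just past one `value` starting at i, or -1.
        if i < len(text) and text[i] in DIGITS:
            while i < len(text) and text[i] in DIGITS:
                i += 1
            return i
        if i < len(text) and text[i] == "[":
            i += 1
            if i < len(text) and text[i] == "]":
                return i + 1  # empty list
            i = value(i)
            while i != -1 and i < len(text) and text[i] == ",":
                i = value(i + 1)
            if i != -1 and i < len(text) and text[i] == "]":
                return i + 1
        return -1

    # Accept only if a single value consumes the entire input.
    return value(0) == len(text)


def handle(text: str) -> int:
    """Process input only after the recognizer has accepted it."""
    if not recognize(text):
        raise ValueError("rejected: input is not in the expected language")
    return text.count("[")  # placeholder for the real processing step
```

For example, `recognize("[1,[2,3],[]]")` accepts, while `recognize("[1,]")` and `recognize("12]")` reject. A production recognizer would also bound nesting depth, since this sketch recurses once per nested list.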

    Modular quantum signal processing in many variables

    Despite significant advances in quantum algorithms, quantum programs in practice are often expressed at the circuit level, forgoing helpful structural abstractions common to their classical counterparts. Consequently, as many quantum algorithms have been unified with the advent of quantum signal processing (QSP) and quantum singular value transformation (QSVT), an opportunity has appeared to cast these algorithms as modules that can be combined to constitute complex programs. Complicating this, however, is that while QSP/QSVT are often described by the polynomial transforms they apply to the singular values of large linear operators, and the algebraic manipulation of polynomials is simple, the QSP/QSVT protocols realizing analogous manipulations of their embedded polynomials are non-obvious. Here we provide a theory of modular multi-input-output QSP-based superoperators, the basic unit of which we call a gadget, and show they can be snapped together with LEGO-like ease at the level of the functions they apply. To demonstrate this ease, we also provide a Python package for assembling gadgets and compiling them to circuits. Viewed alternately, gadgets both enable the efficient block encoding of large families of useful multivariable functions, and substantiate a functional-programming approach to quantum algorithm design in recasting QSP and QSVT as monadic types. Comment: 15 pages + 9 figures + 4 tables + 45 pages supplement. For codebase, see https://github.com/ichuang/pyqsp/tree/bet

    Security Applications of Formal Language Theory

    We present an approach to improving the security of complex, composed systems based on formal language theory, and show how this approach leads to advances in input validation, security modeling, attack surface reduction, and ultimately, software design and programming methodology. We cite examples based on real-world security flaws in common protocols representing different classes of protocol complexity. We also introduce a formalization of an exploit development technique, the parse tree differential attack, made possible by our conception of the role of formal grammars in security. These insights make possible future advances in software auditing techniques applicable to static and dynamic binary analysis, fuzzing, and general reverse-engineering and exploit development. Our work provides a foundation for verifying critical implementation components with considerably less burden to developers than is offered by the current state of the art. It additionally offers a rich basis for further exploration in the areas of offensive analysis and, conversely, automated defense tools and techniques. This report is divided into two parts. In Part I we address the formalisms and their applications; in Part II we discuss the general implications and recommendations for protocol and software design that follow from our formal analysis.
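The intuition behind a parse differential can be shown with a deliberately simplified sketch: two parsers that both accept the same input but disagree on what it means. This illustrates the general idea only, not the report's formalization. Both parsers and the `differential` helper are hypothetical, and duplicate-key handling is an assumed source of disagreement.

```python
def parse_first_wins(query: str) -> dict:
    """Parse key=value pairs; the first occurrence of a key is kept."""
    result = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        result.setdefault(key, value)
    return result


def parse_last_wins(query: str) -> dict:
    """Parse key=value pairs; the last occurrence of a key is kept."""
    result = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        result[key] = value
    return result


def differential(query: str) -> dict:
    """Return the keys on which the two parsers disagree, with both readings."""
    a, b = parse_first_wins(query), parse_last_wins(query)
    # Both parsers see the same set of keys, so indexing b by a's keys is safe.
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


# A component using the first parser sees role=guest; one using the second
# sees role=admin, although both were handed the same bytes.
disagreement = differential("user=alice&role=guest&role=admin")
```

When a validating component and an acting component parse the same message differently, an attacker can craft input that passes the first while meaning something else to the second.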

    Description of the Core of the Spelling Control Software Package in the Sphinx Framework Environment

    The task was to build a software package for detecting and correcting spelling errors in natural-language texts within the "SPHINX" framework. Mechanisms for structured n-gram grammar models and identification mechanisms were developed, and an XML-based description of the package's core operation is given. Methods were implemented for describing the main components of the n-gram grammar: grammar import, description of the lexicon, random numbers of n-grams, and delay weights.

    Acta Cybernetica : Volume 17. Number 2.


    A language-theoretic view on network protocols

    Input validation is the first line of defense against malformed or malicious inputs. It is therefore critical that the validator (which is often part of the parser) is free of bugs. To build dependable input validators, we propose using parser generators for context-free languages. In the context of network protocols, various works have argued that context-free languages fall short of specifying, precisely or concisely, common idioms found in protocols. We review those assessments and perform a rigorous, language-theoretic analysis of several common protocol idioms. We then demonstrate the practical value of our findings by developing a modular, robust, and efficient input validator for HTTP relying on context-free grammars and regular expressions.

    Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars

    The discovery of neural architectures from simple building blocks is a long-standing goal of Neural Architecture Search (NAS). Hierarchical search spaces are a promising step towards this goal but lack a unifying search space design framework and typically only search over some limited aspect of architectures. In this work, we introduce a unifying search space design framework based on context-free grammars that can naturally and compactly generate expressive hierarchical search spaces that are 100s of orders of magnitude larger than common spaces from the literature. By enhancing and using their properties, we effectively enable search over the complete architecture and can foster regularity. Further, we propose an efficient hierarchical kernel design for a Bayesian Optimization search strategy to efficiently search over such huge spaces. We demonstrate the versatility of our search space design framework and show that our search strategy can be superior to existing NAS approaches. Code is available at https://github.com/automl/hierarchical_nas_construction
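How a context-free grammar compactly generates a hierarchical search space can be sketched with a toy example. The grammar below and the `expand` function are hypothetical illustrations and are not taken from the linked repository. The sketch enumerates every architecture string derivable within a given derivation depth.

```python
from itertools import product

# Toy grammar: an architecture is a single block or a sequence of two
# sub-architectures. Keys are nonterminals; all other strings are terminals.
GRAMMAR = {
    "ARCH": [["BLOCK"], ["seq(", "ARCH", ",", "ARCH", ")"]],
    "BLOCK": [["conv"], ["pool"], ["id"]],
}


def expand(symbol: str, depth: int) -> list:
    """Return all terminal strings derivable from `symbol` in <= depth steps."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminals expand to themselves
    if depth == 0:
        return []  # derivation budget exhausted
    results = []
    for production in GRAMMAR[symbol]:
        # Expand each symbol of the production, then combine the choices.
        parts = [expand(s, depth - 1) for s in production]
        for combo in product(*parts):
            results.append("".join(combo))
    return results


# The space grows quickly with depth: 3, 12, then 147 architectures.
sizes = [len(expand("ARCH", d)) for d in (2, 3, 4)]
```

Each extra level of depth roughly squares the number of architectures, which gives a feel for how a few grammar rules can describe very large hierarchical spaces.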