30 research outputs found

    Symbolic Lookaheads for Bottom-up Parsing

    Get PDF
    We present algorithms for the construction of LALR(1) parsing tables, and of LR(1) parsing tables of reduced size. We first define specialized characteristic automata whose states are parametric w.r.t. variables symbolically representing lookahead-sets. The propagation flow of lookaheads is kept in the form of a system of recursive equations, which is resolved to obtain the concrete LALR(1) table. By inspection of the LALR(1) automaton and of its lookahead ropagation flow, we decide whether the grammar is LR(1) or not. In the positive case, an LR(1) parsing table of reduced size is computed by refinement of the LALR(1) table

    Practical Dynamic Symbolic Execution for JavaScript

    Get PDF

    音声翻訳における文解析技法について

    Get PDF
    本文データは平成22年度国立国会図書館の学位論文(博士)のデジタル化実施により作成された画像ファイルを基にpdf変換したものである京都大学0048新制・論文博士博士(工学)乙第8652号論工博第2893号新制||工||968(附属図書館)UT51-94-R411(主査)教授 長尾 真, 教授 堂下 修司, 教授 池田 克夫学位規則第4条第2項該当Doctor of EngineeringKyoto UniversityDFA

    Efficient algorithms for hard problems in nondeterministic tree automata

    Get PDF
    We present PTIME language-preserving techniques for the reduction of non-deterministic tree automata, both for the case of finite trees and for infinite trees. Our techniques are based on new transition removing and state merging results, which rely on binary relations that compare the downward and upward behaviours of states in the automaton. We use downward/upward simulation preorders and the more general but EXPTIME-complete trace inclusion relations, for which we introduce good under-approximations computable in polynomial time. We provide a complete picture of combinations of downward and upward simulation/trace inclusions which can be used in our reduction techniques. We define an algorithm that puts together all the reduction results found for finite trees, and implemented it under the name minotaut, a tool built on top of the well-known tree automata library libvata. We tested minotaut on large collections of automata from program verification provenience, as well as on different classes of randomly generated automata. Our algorithm yields substantially smaller and sparser automata than all previously known reduction techniques, and it is still fast enough to handle large instances. Taking reduction of automata on finite trees one step further, we then introduce saturation, a technique that consists of adding new transitions to an automaton while preserving its language. We implemented this technique on minotaut and we show how it can make subsequent state-merge and transition-removal operations more effective. Thus we obtain a PTIME algorithm that reduces the number of states of tree automata even more than before. Additionally, we explore how minotaut alone can play an important role when performing hard operations like complementation, allowing to obtain smaller complement automata and at lower computation times overall. We then show how saturation can extend this contribution even further. An overview of the tool, highlighting some of its implementation features, is presented as well

    Protecting Systems From Exploits Using Language-Theoretic Security

    Get PDF
    Any computer program processing input from the user or network must validate the input. Input-handling vulnerabilities occur in programs when the software component responsible for filtering malicious input---the parser---does not perform validation adequately. Consequently, parsers are among the most targeted components since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers represent the data format, programmers often rely on parser generators or parser combinators tools to build the parsers. This thesis propels several sub-fields in LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, this thesis comprises five parts that tackle various tenets of LangSec. First, I categorize various input-handling vulnerabilities and exploits using two frameworks. First, I use the mismorphisms framework to reason about vulnerabilities. This framework helps us reason about the root causes leading to various vulnerabilities. Next, we built a categorization framework using various LangSec anti-patterns, such as parser differentials and insufficient input validation. Finally, we built a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks. Second, I built parsers for various Internet of Things and power grid network protocols and the iccMAX file format using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool I built that generates Rust code that type checks Portable Data Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and various open-source PDF tools. In addition to our checker, we also built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool to build verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for their correctness. I built a verified parsing tool in Dafny that builds on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle various constructs commonly seen in network formats. I prove that our parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the earliest systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. Next, I conducted an expert elicitation qualitative study to derive various metrics that I use to compare the DDLs. I also systematically compare these DDLs based on sample data descriptions available with the DDLs---checking for correctness and resilience

    Verified Propagation Redundancy and Compositional UNSAT Checking in CakeML

    Get PDF
    Modern SAT solvers can emit independently-checkable proof certificates to validate their results. The state-of-the-art proof system that allows for compact proof certificates is propagation redundancy (PR). However, the only existing method to validate proofs in this system with a formally verified tool requires a transformation to a weaker proof system, which can result in a significant blowup in the size of the proof and increased proof validation time. This article describes the first approach to formally verify PR proofs on a succinct representation. We present (i) a new Linear PR (LPR) proof format, (ii) an extension of the DPR-trim tool to efficiently convert PR proofs into LPR format, and (iii) cake_lpr, a verified LPR proof checker developed in CakeML. We also enhance these tools with (iv) a new compositional proof format designed to enable separate (parallel) proof checking. The LPR format is backwards compatible with the existing LRAT format, but extends LRAT with support for the addition of PR clauses. Moreover, cake_lpr is verified using CakeML ’s binary code extraction toolchain, which yields correctness guarantees for its machine code (binary) implementation. This further distinguishes our clausal proof checker from existing checkers because unverified extraction and compilation tools are removed from its trusted computing base. We experimentally show that: LPR provides efficiency gains over existing proof formats; cake_lpr ’s strong correctness guarantees are obtained without significant sacrifice in its performance; and the compositional proof format enables scalable parallel proof checking for large proofs
    corecore