4,319 research outputs found

    Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda

    Valiant (1975) developed an algorithm for recognition of context-free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiant's algorithm. The generalisation can be used for recognition, parsing or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type theory. As such, this paper can be read as a tutorial for the Agda system.
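
The idea of recognition as a closure over an upper triangular matrix can be illustrated with a small sketch. This is the plain cubic (CYK-style) closure, not Valiant's sub-cubic divide-and-conquer scheme and not the paper's Agda development. The function name `recognizes`, the rule encoding, and the example grammar are all assumptions made for illustration.

```python
def recognizes(word, unary, binary, start="S"):
    """Recognise `word` with a grammar in Chomsky Normal Form (illustrative sketch).

    unary  : set of (A, t) pairs for rules A -> t (t a terminal)
    binary : set of (A, B, C) triples for rules A -> B C
    """
    n = len(word)
    # table[i][j] holds the nonterminals deriving word[i:j]; only cells with
    # i < j are used, so the table is strictly upper triangular.
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    # First superdiagonal: single-character substrings.
    for i, ch in enumerate(word):
        table[i][i + 1] = {a for (a, t) in unary if t == ch}

    # Close the table under the binary rules, shortest spans first.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c) in binary:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(a)

    return n > 0 and start in table[0][n]


# Hypothetical grammar for { a^n b^n | n >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNARY = {("A", "a"), ("B", "b")}
BINARY = {("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")}

if __name__ == "__main__":
    print(recognizes("aabb", UNARY, BINARY))
    print(recognizes("aab", UNARY, BINARY))
```

The top-right cell `table[0][n]` plays the role of the closure entry for the whole input. Valiant's contribution is computing the same closure faster by reducing it to matrix multiplication.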

    TRX: A Formally Verified Parser Interpreter

    Parsing is an important problem in computer science and yet surprisingly little attention has been devoted to its formal verification. In this paper, we present TRX: a parser interpreter formally developed in the proof assistant Coq, capable of producing formally correct parsers. We are using parsing expression grammars (PEGs), a formalism essentially representing recursive descent parsing, which we consider an attractive alternative to context-free grammars (CFGs). From this formalization we can extract a parser for an arbitrary PEG grammar with the warranty of total correctness, i.e., the resulting parser is terminating and correct with respect to its grammar and the semantics of PEGs; both properties formally proven in Coq. Comment: 26 pages, LMC

    Left Recursion in Parsing Expression Grammars

    Parsing Expression Grammars (PEGs) are a formalism that can describe all deterministic context-free languages through a set of rules that specify a top-down parser for some language. PEGs are easy to use, and there are efficient implementations of PEG libraries in several programming languages. A frequently missed feature of PEGs is left recursion, which is commonly used in Context-Free Grammars (CFGs) to encode left-associative operations. We present a simple conservative extension to the semantics of PEGs that gives useful meaning to direct and indirect left-recursive rules, and show that our extensions make it easy to express left-recursive idioms from CFGs in PEGs, with similar results. We prove the conservativeness of these extensions, and also prove that they work with any left-recursive PEG. PEGs can also be compiled to programs in a low-level parsing machine. We present an extension to the semantics of the operations of this parsing machine that lets it interpret left-recursive PEGs, and prove that this extension is correct with regard to our semantics for left-recursive PEGs. Comment: Extended version of the paper "Left Recursion in Parsing Expression Grammars", that was published on 2012 Brazilian Symposium on Programming Language

    A language-theoretic view on network protocols

    Input validation is the first line of defense against malformed or malicious inputs. It is therefore critical that the validator (which is often part of the parser) is free of bugs. To build dependable input validators, we propose using parser generators for context-free languages. In the context of network protocols, various works have argued that context-free languages fall short of specifying, precisely or concisely, common idioms found in protocols. We review those assessments and perform a rigorous, language-theoretic analysis of several common protocol idioms. We then demonstrate the practical value of our findings by developing a modular, robust, and efficient input validator for HTTP relying on context-free grammars and regular expressions.

    Autonomy Operating System for UAVs: Pilot-in-a-Box

    The Autonomy Operating System (AOS) is an open flight software platform with Artificial Intelligence for smart UAVs. It is built to be extendable with new apps, similar to smartphones, to enable an expanding set of missions and capabilities. AOS has as its foundations NASA's core flight executive and core flight software (cFE/cFS). Pilot-in-a-Box (PIB) is an expanding collection of interacting AOS apps that provide the knowledge and intelligence onboard a UAV to safely and autonomously fly in the National Air Space, eventually without a remote human ground crew. Longer-term, the goal of PIB is to provide the capability for pilotless air vehicles such as air taxis that will be key for new transportation concepts such as mobility-on-demand. PIB provides the procedural knowledge, situational awareness, and anticipatory planning (thinking ahead of the plane) that comprise pilot competencies. These competencies together with a natural language interface will enable Pilot-in-a-Box to dialogue directly with Air Traffic Management from takeoff through landing. This paper describes the overall AOS architecture, Artificial Intelligence reasoning engines, Pilot-in-a-Box competencies, and selected experimental flight tests to date.

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

    Syntactic development in early foreign language learning: Effects of L1 transfer, input and individual factors

    This study explores parallels and differences in the comprehension of wh-questions and relative clauses between early foreign-language (FL) learners and monolingual children. We test for (a) effects of syntactic first-language (L1) transfer, (b) the impact of input on syntactic development, and (c) the impact of individual differences on early FL syntactic development. We compare the results to findings in child second language (L2) naturalistic acquisition and adult FL acquisition. Following work on adult FL acquisition, we carried out a picture-based interpretation task with 243 child FL learners in fourth grade at different regular, partial, and high-immersion schools in Germany plus 68 monolingual English children aged 5 to 8 years as controls. The child FL learners display a strong subject-first preference but do not appear to use the L1 syntax in comprehension. Input differences across different schools affect overall accuracy, with students at high-immersion FL schools catching up to monolingual performance within 4 years of learning. Finally, phonological awareness is implicated in both early FL learning and naturalistic child L2 development. These findings suggest that early FL development resembles child L2 acquisition in speed and effects of individual factors, yet is different from adult FL acquisition due to the absence of L1 transfer effects. Peer reviewed. Final Accepted Version.

    On the formalization of some results of context-free language theory

    This work describes a formalization effort, using the Coq proof assistant, of fundamental results related to the classical theory of context-free grammars and languages. These include closure properties (union, concatenation and Kleene star), grammar simplification (elimination of useless symbols, inaccessible symbols, empty rules and unit rules), the existence of a Chomsky Normal Form for context-free grammars and the Pumping Lemma for context-free languages. The result is an important set of libraries covering the main results of context-free language theory, with more than 500 lemmas and theorems fully proved and checked. This is probably the most comprehensive formalization of the classical context-free language theory in the Coq proof assistant done to the present date, and includes the important result that is the formalization of the Pumping Lemma for context-free languages.

    Certified derivative-based parsing of regular expressions.

    Programa de Pós-Graduação em Ciência da Computação. Departamento de Ciência da Computação, Instituto de Ciências Exatas e Biológicas, Universidade Federal de Ouro Preto. Parsing is pervasive in computing and fundamental in several software artifacts. This dissertation reports the first step in our ultimate goal: a formally verified toolset for parsing regular and context-free languages based on derivatives. Specifically, we describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing, in the dependently typed language Agda. The formalization produces a proof that either an input string matches a given regular expression or that no matching exists. A tool for regular expression based search in the style of the well-known GNU Grep has been developed using the certified algorithms. Practical experiments conducted using this tool are reported.
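
As a rough illustration of the Brzozowski-derivative approach named in the abstract above, here is a minimal, unverified matcher. It returns only a yes/no answer and carries none of the proofs the dissertation formalises in Agda. All class and function names (`Empty`, `Eps`, `Chr`, `Alt`, `Cat`, `Star`, `nullable`, `deriv`, `matches`) are assumptions chosen for this sketch.

```python
from dataclasses import dataclass


class Regex:
    """Base class for the regular-expression syntax tree (illustrative)."""


@dataclass(frozen=True)
class Empty(Regex):      # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Regex):        # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Regex):        # matches the single character c
    c: str


@dataclass(frozen=True)
class Alt(Regex):        # l | r
    l: Regex
    r: Regex


@dataclass(frozen=True)
class Cat(Regex):        # l followed by r
    l: Regex
    r: Regex


@dataclass(frozen=True)
class Star(Regex):       # zero or more repetitions of r
    r: Regex


def nullable(r: Regex) -> bool:
    """True iff r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    if isinstance(r, Cat):
        return nullable(r.l) and nullable(r.r)
    raise TypeError(f"unknown regex node: {r!r}")


def deriv(r: Regex, c: str) -> Regex:
    """Brzozowski derivative: the regex matching every s such that r matches c + s."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.l, c), deriv(r.r, c))
    if isinstance(r, Cat):
        left = Cat(deriv(r.l, c), r.r)
        # If the left part can be empty, c may also be consumed by the right part.
        return Alt(left, deriv(r.r, c)) if nullable(r.l) else left
    if isinstance(r, Star):
        return Cat(deriv(r.r, c), r)
    raise TypeError(f"unknown regex node: {r!r}")


def matches(r: Regex, s: str) -> bool:
    """Take one derivative per input character, then test nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)


# Hypothetical example: the regular expression (ab)*
AB_STAR = Star(Cat(Chr("a"), Chr("b")))

if __name__ == "__main__":
    print(matches(AB_STAR, "abab"))
    print(matches(AB_STAR, "aba"))
```

The sketch performs no simplification of intermediate expressions, so derivatives grow with input length. Practical implementations usually normalise terms after each step.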