2,253 research outputs found

    Message-Passing Protocols for Real-World Parsing -- An Object-Oriented Model and its Preliminary Evaluation

    Full text link
    We argue for a performance-based design of natural language grammars and their associated parsers in order to meet the constraints imposed by real-world NLP. Our approach incorporates declarative and procedural knowledge about language and language use within an object-oriented specification framework. We discuss several message-passing protocols for parsing and provide reasons for sacrificing completeness of the parse in favor of efficiency based on a preliminary empirical evaluation.Comment: 12 pages, uses epsfig.st

    Concurrent Lexicalized Dependency Parsing: The ParseTalk Model

    Full text link
    A grammar model for concurrent, object-oriented natural language parsing is introduced. Complete lexical distribution of grammatical knowledge is achieved building upon the head-oriented notions of valency and dependency, while inheritance mechanisms are used to capture lexical generalizations. The underlying concurrent computation model relies upon the actor paradigm. We consider message passing protocols for establishing dependency relations and ambiguity handling.Comment: 90kB, 7pages Postscrip

    Modeling and Testing Implementations of Protocols with Complex Messages

    Get PDF
    This paper presents a new language called APSL for formally describing protocols to facilitate automated testing. Many real world communication protocols exchange messages whose structures are not trivial, e.g. they may consist of multiple and nested fields, some could be optional, and some may have values that depend on other fields. To properly test implementations of such a protocol, it is not sufficient to only explore different orders of sending and receiving messages. We also need to investigate if the implementation indeed produces correctly formatted messages, and if it responds correctly when it receives different variations of every message type. APSL's main contribution is its sublanguage that is expressive enough to describe complex message formats, both text-based and binary. As an example, this paper also presents a case study where APSL is used to model and test a subset of Courier IMAP email server

    Pattern matching in compilers

    Get PDF
    In this thesis we develop tools for effective and flexible pattern matching. We introduce a new pattern matching system called amethyst. Amethyst is not only a generator of parsers of programming languages, but can also serve as an alternative to tools for matching regular expressions. Our framework also produces dynamic parsers. Its intended use is in the context of IDE (accurate syntax highlighting and error detection on the fly). Amethyst offers pattern matching of general data structures. This makes it a useful tool for implementing compiler optimizations such as constant folding, instruction scheduling, and dataflow analysis in general. The parsers produced are essentially top-down parsers. Linear time complexity is obtained by introducing the novel notion of structured grammars and regularized regular expressions. Amethyst uses techniques known from compiler optimizations to produce effective parsers.Comment: master thesi

    Protecting Systems From Exploits Using Language-Theoretic Security

    Get PDF
    Any computer program processing input from the user or network must validate the input. Input-handling vulnerabilities occur in programs when the software component responsible for filtering malicious input---the parser---does not perform validation adequately. Consequently, parsers are among the most targeted components since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers represent the data format, programmers often rely on parser generators or parser combinators tools to build the parsers. This thesis propels several sub-fields in LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, this thesis comprises five parts that tackle various tenets of LangSec. First, I categorize various input-handling vulnerabilities and exploits using two frameworks. First, I use the mismorphisms framework to reason about vulnerabilities. This framework helps us reason about the root causes leading to various vulnerabilities. Next, we built a categorization framework using various LangSec anti-patterns, such as parser differentials and insufficient input validation. Finally, we built a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks. Second, I built parsers for various Internet of Things and power grid network protocols and the iccMAX file format using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool I built that generates Rust code that type checks Portable Data Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and various open-source PDF tools. In addition to our checker, we also built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool to build verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for their correctness. I built a verified parsing tool in Dafny that builds on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle various constructs commonly seen in network formats. I prove that our parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the earliest systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. Next, I conducted an expert elicitation qualitative study to derive various metrics that I use to compare the DDLs. I also systematically compare these DDLs based on sample data descriptions available with the DDLs---checking for correctness and resilience

    IEEE Standard 1500 Compliance Verification for Embedded Cores

    Get PDF
    Core-based design and reuse are the two key elements for an efficient system-on-chip (SoC) development. Unfortunately, they also introduce new challenges in SoC testing, such as core test reuse and the need of a common test infrastructure working with cores originating from different vendors. The IEEE 1500 Standard for Embedded Core Testing addresses these issues by proposing a flexible hardware test wrapper architecture for embedded cores, together with a core test language (CTL) used to describe the implemented wrapper functionalities. Several intellectual property providers have already announced IEEE Standard 1500 compliance in both existing and future design blocks. In this paper, we address the problem of guaranteeing the compliance of a wrapper architecture and its CTL description to the IEEE Standard 1500. This step is mandatory to fully trust the wrapper functionalities in applying the test sequences to the core. We present a systematic methodology to build a verification framework for IEEE Standard 1500 compliant cores, allowing core providers and/or integrators to verify the compliance of their products (sold or purchased) to the standar

    Implementing PRISMA/DB in an OOPL

    Get PDF
    PRISMA/DB is implemented in a parallel object-oriented language to gain insight in the usage of parallelism. This environment allows us to experiment with parallelism by simply changing the allocation of objects to the processors of the PRISMA machine. These objects are obtained by a strictly modular design of PRISMA/DB. Communication between the objects is required to cooperatively handle the various tasks, but it limits the potential for parallelism. From this approach, we hope to gain a better understanding of parallelism, which can be used to enhance the performance of PRISMA/DB.\ud The work reported in this document was conducted as part of the PRISMA project, a joint effort with Philips Research Eindhoven, partially supported by the Dutch "Stimuleringsprojectteam Informaticaonderzoek (SPIN)

    Foundational Verification of Stateful P4 Packet Processing

    Get PDF
    P4 is a standardized programming language for the network data plane. But P4 is not just for routing anymore. As programmable switches support stateful objects, P4 programs move beyond just stateless forwarders into new stateful applications: network telemetry (heavy hitters, DDoS detection, performance monitoring), middleboxes (firewalls, NAT, load balancers, intrusion detection), and distributed services (in-network caching, lock management, conflict detection). The complexity of stateful programs and their richer specifications are beyond what existing P4 program verifiers can handle. Verifiable P4 is a new interactive verification framework for P4 that (1) allows reasoning about multi-packet properties by specifying the per-packet relation between initial and final states; (2) performs modular verification, especially providing a modular description for stateful objects; (3) is foundational, i.e., with a machine-checked soundness proof with respect to a formal operational semantics of P4_{16} (the current specification of P4) in Coq. In addition, our framework includes a proved-correct reference interpreter. We demonstrate the framework with the specification and verification of a stateful firewall that uses a sliding-window Bloom filter on a Tofino switch to block (most) unsolicited traffic

    Concrete Syntax with Black Box Parsers

    Get PDF
    Context: Meta programming consists for a large part of matching, analyzing, and transforming syntax trees. Many meta programming systems process abstract syntax trees, but this requires intimate knowledge of the structure of the data type describing the abstract syntax. As a result, meta programming is error-prone, and meta programs are not resilient to evolution of the structure of such ASTs, requiring invasive, fault-prone change to these programs. Inquiry: Concrete syntax patterns alleviate this problem by allowing the meta programmer to match and create syntax trees using the actual syntax of the object language. Systems supporting concrete syntax patterns, however, require a concrete grammar of the object language in their own formalism. Creating such grammars is a costly and error-prone process, especially for realistic languages such as Java and C++. Approach: In this paper we present Concretely, a technique to extend meta programming systems with pluggable concrete syntax patterns, based on external, black box parsers. We illustrate Concretely in the context of Rascal, an open-source meta programming system and language workbench, and show how to reuse existing parsers for Java, JavaScript, and C++. Furthermore, we propose Tympanic, a DSL to declaratively map external AST structures to Rascal's internal data structures. Tympanic allows implementors of Concretely to solve the impedance mismatch between object-oriented class hierarchies in Java and Rascal's algebraic data types. Both the algebraic data type and AST marshalling code is automatically generated. Knowledge: The conceptual architecture of Concretely and Tympanic supports the reuse of pre-existing, external parsers, and their AST representation in meta programming systems that feature concrete syntax patterns for matching and constructing syntax trees. As such this opens up concrete syntax pattern matching for a host of realistic languages for which writing a grammar from scratch is time consuming and error-prone, but for which industry-strength parsers exist in the wild. Grounding: We evaluate Concretely in terms of source lines of code (SLOC), relative to the size of the AST data type and marshalling code. We show that for real programming languages such as C++ and Java, adding support for concrete syntax patterns takes an effort only in the order of dozens of SLOC. Similarly, we evaluate Tympanic in terms of SLOC, showing an order of magnitude of reduction in SLOC compared to manual implementation of the AST data types and marshalling code. Importance: Meta programming has applications in reverse engineering, reengineering, source code analysis, static analysis, software renovation, domain-specific language engineering, and many others. Processing of syntax trees is central to all of these tasks. Concrete syntax patterns improve the practice of constructing meta programs. The combination of Concretely and Tympanic has the potential to make concrete syntax patterns available with very little effort, thereby improving and promoting the application of meta programming in the general software engineering context
    • 

    corecore