
    Pattern matching in compilers

    In this thesis we develop tools for effective and flexible pattern matching. We introduce a new pattern matching system called Amethyst. Amethyst is not only a parser generator for programming languages, but can also serve as an alternative to tools for matching regular expressions. Our framework also produces dynamic parsers. Its intended use is in the context of IDEs (accurate syntax highlighting and error detection on the fly). Amethyst offers pattern matching on general data structures. This makes it a useful tool for implementing compiler optimizations such as constant folding, instruction scheduling, and dataflow analysis in general. The parsers produced are essentially top-down parsers. Linear time complexity is obtained by introducing the novel notions of structured grammars and regularized regular expressions. Amethyst uses techniques known from compiler optimization to produce effective parsers. Comment: master thesis
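
    As a rough illustration of the kind of rewrite that pattern matching over general data structures enables, the Python sketch below performs constant folding on a tiny expression tree. The node types and the fold function are hypothetical and do not reflect Amethyst's own pattern syntax.

        from dataclasses import dataclass

        @dataclass
        class Lit:            # integer literal
            value: int

        @dataclass
        class Add:            # addition node
            left: object
            right: object

        def fold(expr):
            """Replace constant subtrees by their value (Python 3.10+ match)."""
            match expr:
                case Add(left=l, right=r):
                    l, r = fold(l), fold(r)
                    match (l, r):
                        case (Lit(a), Lit(b)):
                            return Lit(a + b)
                    return Add(l, r)
                case _:
                    return expr

        print(fold(Add(Lit(1), Add(Lit(2), Lit(3)))))   # -> Lit(value=6)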

    Generalizing input-driven languages: theoretical and practical benefits

    Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so they support automatic verification algorithms, and they can be recognized in real time. Context-free languages (CFL) are another major family, well suited to formalizing programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems, and they need more complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization, and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of these language families, we return to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization more effectively than traditional sequential parsers, and they exhibit the same algebraic and logic properties so far obtained only for less expressive language families.
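
    To hint at why operator precedence parsing lends itself to parallelization, the Python sketch below (an invented toy table and helper, not taken from the paper) shows that precedence relations between adjacent terminals alone are enough to spot a handle inside any substring, with no left context, so separate chunks of the input can be reduced independently.

        # Floyd-style precedence relations for a toy expression grammar over n, +, *, (, ).
        PREC = {
            ('+', '+'): '>', ('+', '*'): '<', ('+', 'n'): '<', ('+', '('): '<', ('+', ')'): '>',
            ('*', '+'): '>', ('*', '*'): '>', ('*', 'n'): '<', ('*', '('): '<', ('*', ')'): '>',
            ('n', '+'): '>', ('n', '*'): '>', ('n', ')'): '>',
            ('(', 'n'): '<', ('(', '+'): '<', ('(', '*'): '<', ('(', '('): '<', ('(', ')'): '=',
            (')', '+'): '>', (')', '*'): '>', (')', ')'): '>',
        }

        def find_handle(tokens):
            """Return (i, j) of a handle, i.e. a span opened by '<' and closed by '>',
            located using only the precedence table; None if none is visible."""
            opened = []
            for k in range(len(tokens) - 1):
                rel = PREC.get((tokens[k], tokens[k + 1]))
                if rel == '<':
                    opened.append(k + 1)
                elif rel == '>' and opened:
                    return opened[-1], k      # tokens[i..j] can be reduced right away
            return None

        print(find_handle(list('n+n*n')))     # (2, 2): the middle operand is a handle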

    An Alternative Conception of Tree-Adjoining Derivation

    The precise formulation of derivation for tree-adjoining grammars has important ramifications for a wide variety of uses of the formalism, from syntactic analysis to semantic interpretation and statistical language modeling. We argue that the definition of tree-adjoining derivation must be reformulated in order to manifest the proper linguistic dependencies in derivations. The particular proposal is both precisely characterizable, through a definition of TAG derivations as equivalence classes of ordered derivation trees, and computationally operational, by virtue of a compilation to linear indexed grammars together with an efficient algorithm for recognition and parsing according to the compiled grammar. Comment: 33 pages
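
    As a loose illustration (with invented elementary-tree names and a simplified encoding, not the paper's formalism) of what "equivalence classes of ordered derivation trees" means, the Python sketch below identifies two derivation trees that differ only in the order of sibling operations by comparing canonical forms.

        from dataclasses import dataclass, field

        @dataclass
        class Derivation:
            tree: str                          # name of the elementary tree used
            address: str = ''                  # address at which it attaches in its parent
            children: list = field(default_factory=list)

        def canonical(d):
            """Nested-tuple form in which the order of siblings is normalized away."""
            kids = sorted(canonical(c) for c in d.children)
            return (d.tree, d.address, tuple(kids))

        d1 = Derivation('alpha_likes', children=[Derivation('alpha_John', '1'),
                                                 Derivation('beta_really', '2')])
        d2 = Derivation('alpha_likes', children=[Derivation('beta_really', '2'),
                                                 Derivation('alpha_John', '1')])
        print(canonical(d1) == canonical(d2))  # True: same derivation, different ordering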

    Engineering Object-Oriented Semantics Using Graph Transformations

    In this paper we describe the application of the theory of graph transformations to the practice of language design. We have defined the semantics of a small but realistic object-oriented language (called TAAL) by mapping the language constructs to graphs and their operational semantics to graph transformation rules. In the process we establish a mapping between UML models and graphs. TAAL was developed for the purpose of this paper, as an extensive case study in engineering object-oriented language semantics using graph transformation. It incorporates the basic aspects of many commonly used object-oriented programming languages: apart from essential imperative programming constructs, it includes inheritance, object creation, and method overriding. The language specification is based on a number of meta-models written in UML. Both the static and dynamic semantics are defined using graph rewriting rules. In the course of the case study, we have built an Eclipse plug-in that automatically transforms arbitrary TAAL programs into graphs, in a graph format readable by another tool. This second tool, called Groove, is able to execute graph transformations. By combining both tools we are able to visually simulate the execution of any TAAL program.
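
    To make the approach concrete, here is a toy Python sketch (a hypothetical state encoding, not TAAL's meta-models or Groove's rule format) of operational semantics as graph rewriting: the program state is a graph of labelled edges, and executing the assignment x := y is a rule that deletes one edge and adds another.

        # State graph as a set of (source, label, target) edges.
        state = {('x', 'val', 'obj1'), ('y', 'val', 'obj2')}

        def assign(graph, lhs, rhs):
            """Rule for `lhs := rhs`: redirect the 'val' edge of lhs to rhs's value."""
            target = next(t for (s, l, t) in graph if s == rhs and l == 'val')
            rewritten = {e for e in graph if not (e[0] == lhs and e[1] == 'val')}
            rewritten.add((lhs, 'val', target))
            return rewritten

        print(sorted(assign(state, 'x', 'y')))
        # [('x', 'val', 'obj2'), ('y', 'val', 'obj2')]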

    The Evaluation of Content-Based Web Queries

    We introduce the notions of syntactically and semantically structured data to refine the notion of semi-structured data. As we will see, most data found on the Web is syntactically structured. In order to evaluate content-based Web queries, semantically structured data is needed; the problem is thus to transform syntactically structured data into semantically structured data. Syntactically and semantically structured data can both be represented by trees. Our main contribution is a powerful restructuring mechanism that can express the transformation of trees representing syntactically structured data into trees representing semantically structured data. We embed our restructuring mechanism into RAW (Relational Algebra for the Web) and demonstrate its expressiveness with several example queries.
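
    As a rough, hypothetical example of the distinction (the data and the function below are illustrative only, not the RAW restructuring operators), a tree whose labels merely reflect layout can be rewritten in Python into a tree whose labels carry meaning:

        # Syntactically structured: markup labels describe layout, not content.
        syntactic = ('table', [
            ('tr', [('td', 'Alice'), ('td', '42')]),
            ('tr', [('td', 'Bob'),   ('td', '17')]),
        ])

        def restructure(tree):
            """Rewrite the layout tree into a semantically labelled tree."""
            _, rows = tree
            people = []
            for _, cells in rows:
                (_, name), (_, age) = cells
                people.append(('person', [('name', name), ('age', int(age))]))
            return ('people', people)

        print(restructure(syntactic))
        # ('people', [('person', [('name', 'Alice'), ('age', 42)]), ...])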

    Restricting the Weak-Generative Capacity of Synchronous Tree-Adjoining Grammars

    The formalism of synchronous tree-adjoining grammars, a variant of standard tree-adjoining grammars (TAG), was intended to allow the use of TAGs for language transduction in addition to language specification. In previous work, the definition of the transduction relation defined by a synchronous TAG was given by appeal to an iterative rewriting process. The rewriting definition of derivation is problematic in that it greatly extends the expressivity of the formalism and makes the design of parsing algorithms difficult if not impossible. We introduce a simple, natural definition of synchronous tree-adjoining derivation, based on isomorphisms between standard tree-adjoining derivations, that avoids the expressivity and implementability problems of the original rewriting definition. The decrease in expressivity, which would otherwise make the method unusable, is offset by the incorporation of an alternative definition of standard tree-adjoining derivation, previously proposed for completely separate reasons, thereby making it practical to entertain using the natural definition of synchronous derivation. Nonetheless, some remaining problematic cases call for yet more flexibility in the definition; the isomorphism requirement may have to be relaxed. It remains for future research to tune the exact requirements on the allowable mappings. Comment: 21 pages, uses lingmacros.sty, psfig.sty, fullname.sty; minor typographical changes only

    Semi Automated Partial Credit Grading of Programming Assignments

    The grading of student programs is a time-consuming process. As class sizes continue to grow, especially in entry-level courses, manually grading student programs has become an even more daunting challenge. Grading is made more difficult still by the needs of graphical and interactive programs, such as those used as part of the UNH Computer Science curriculum (and in various textbooks). There are existing tools that support the grading of introductory programming assignments (TAME and Web-CAT), and there are frameworks that can be used to test student code (JUnit, Tester, and TestNG). While these programs and frameworks are helpful, they have little or no support for programs that use real data structures or that have interactive or graphical features. In addition, the automated tests in all these tools provide only “all or nothing” evaluation, which is a significant limitation in many circumstances. Moreover, there is little or no support for dynamic alteration of grading criteria, which means that refactoring of test classes after deployment is not easily done. Our goal is to create a framework that addresses these weaknesses. This framework needs to: 1. Support assignments that have interactive and graphical components. 2. Handle data structures in student programs such as lists, stacks, trees, and hash tables. 3. Assign partial credit automatically when the instructor can predict errors in advance. 4. Provide additional answer-clustering information to help graders identify and assign consistent partial credit for incorrect output that was not predefined. Most importantly, these tools, collectively called RPM (short for Rapid Program Management), should interface effectively with our current grading support framework without requiring large amounts of rewriting or refactoring of test code.
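
    The Python sketch below is a minimal illustration of requirements 3 and 4 only; the rubric, outputs, and function are invented and are not the RPM interface. Predicted wrong answers earn predefined partial credit, and unexpected outputs are clustered so a grader can score each distinct answer once.

        from collections import defaultdict

        RUBRIC = {               # expected output -> fraction of credit
            '120': 1.0,          # correct answer: 5! = 120
            '24':  0.5,          # predicted error: off by one, computed 4!
            '0':   0.25,         # predicted error: accumulator never seeded
        }

        def grade(outputs, max_points=10):
            scores, clusters = {}, defaultdict(list)
            for student, out in outputs.items():
                if out in RUBRIC:
                    scores[student] = max_points * RUBRIC[out]
                else:
                    clusters[out].append(student)   # grouped for one manual decision
            return scores, dict(clusters)

        scores, clusters = grade({'s1': '120', 's2': '24', 's3': '-1', 's4': '-1'})
        print(scores)    # {'s1': 10.0, 's2': 5.0}
        print(clusters)  # {'-1': ['s3', 's4']}: one consistent score covers both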

    Stream Processing using Grammars and Regular Expressions

    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass, optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as are required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string-processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and on a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG), which generalize the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot.
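
    To illustrate only the disambiguation policy, not the linear-time or streaming constructions developed in the dissertation, the Python sketch below parses the fixed regular expression (ab|a)* with Perl-style greedy semantics: the leftmost alternative is preferred and the star consumes as much input as it can.

        def parse_star(s, i=0):
            """Greedily parse (ab|a)* from position i; return (choices, end position).
            A real parser must also backtrack when a greedy choice blocks the rest
            of the pattern; that case is omitted here."""
            choices = []
            while True:
                if s.startswith('ab', i):      # leftmost alternative wins
                    choices.append('ab')
                    i += 2
                elif s.startswith('a', i):
                    choices.append('a')
                    i += 1
                else:
                    return choices, i

        print(parse_star('abaab'))   # (['ab', 'a', 'ab'], 5)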