    LL(1) Parsing with Derivatives and Zippers

    In this paper, we present an efficient, functional, and formally verified parsing algorithm for LL(1) context-free expressions based on the concept of derivatives of formal languages. Parsing with derivatives is an elegant parsing technique, which, in the general case, suffers from cubic worst-case time complexity and slow performance in practice. We specialise the parsing with derivatives algorithm to LL(1) context-free expressions, where alternatives can be chosen given a single token of lookahead. We formalise the notion of LL(1) expressions and show how to efficiently check the LL(1) property. Next, we present a novel linear-time parsing with derivatives algorithm for LL(1) expressions operating on a zipper-inspired data structure. We prove the algorithm correct in Coq and present an implementation as a parser combinators framework in Scala, with enumeration and pretty printing capabilities. Comment: Appeared at PLDI'20 under the title "Zippy LL(1) Parsing with Derivatives".
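
    As a rough illustration of the derivative idea the paper builds on, the Haskell sketch below implements Brzozowski derivatives for plain regular expressions; the paper itself generalises this to LL(1) context-free expressions and obtains linear time with a zipper. The type and function names here are illustrative, not the paper's API.

        -- Sketch only: Brzozowski derivatives for regular expressions.  The
        -- paper generalises this idea to LL(1) context-free expressions and
        -- speeds it up with a zipper; the names below are illustrative.
        data Regex
          = Empty                 -- matches nothing
          | Epsilon               -- matches the empty string
          | Chr Char              -- matches a single character
          | Seq Regex Regex       -- concatenation
          | Alt Regex Regex       -- alternative
          | Star Regex            -- repetition
          deriving Show

        -- Does the expression accept the empty string?
        nullable :: Regex -> Bool
        nullable Empty     = False
        nullable Epsilon   = True
        nullable (Chr _)   = False
        nullable (Seq l r) = nullable l && nullable r
        nullable (Alt l r) = nullable l || nullable r
        nullable (Star _)  = True

        -- Derivative: the language of suffixes after consuming character c.
        derive :: Char -> Regex -> Regex
        derive _ Empty     = Empty
        derive _ Epsilon   = Empty
        derive c (Chr d)   = if c == d then Epsilon else Empty
        derive c (Seq l r)
          | nullable l     = Alt (Seq (derive c l) r) (derive c r)
          | otherwise      = Seq (derive c l) r
        derive c (Alt l r) = Alt (derive c l) (derive c r)
        derive c (Star r)  = Seq (derive c r) (Star r)

        -- Recognition: derive with respect to each input character in turn.
        matches :: Regex -> String -> Bool
        matches r = nullable . foldl (flip derive) r

        -- > matches (Star (Alt (Chr 'a') (Chr 'b'))) "abba"   ==> True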

    Elementary data structures in ALGOL-like languages

    J.C. Reynolds has pointed out that ALGOL 60 has a set of properties not shared by most of the languages usually regarded as being its successors. We propose to use this ALGOL-like framework to design a language that could adequately support both applicative and imperative programming while also retaining the advantages of each of the “pure” frameworks. This paper discusses elementary data-structuring facilities (products, arrays, sums) for such a language, taking advantage of recent developments, such as this author's “quantification” notation, and the notion of “conjunctive type” proposed by Coppo and Dezani, and adapted to explicitly-typed languages by Reynolds.
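
    The product and sum facilities under discussion correspond to familiar algebraic data types. The small Haskell sketch below illustrates the distinction in modern notation rather than in the paper's ALGOL-like framework; the example types are invented for illustration.

        -- Illustration only: the product and sum structuring facilities the
        -- paper discusses, expressed as ordinary algebraic data types rather
        -- than in the paper's ALGOL-like notation.

        -- A product bundles several components that are all present at once.
        data Point = Point { x :: Double, y :: Double }

        -- A sum offers a choice of alternatives, exactly one of which is present.
        data Shape
          = Circle Double          -- radius
          | Rectangle Point Point  -- opposite corners

        area :: Shape -> Double
        area (Circle r)      = pi * r * r
        area (Rectangle p q) = abs (x q - x p) * abs (y q - y p)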

    Maude: specification and programming in rewriting logic

    Maude is a high-level language and a high-performance system supporting executable specification and declarative programming in rewriting logic. Since rewriting logic contains equational logic, Maude also supports equational specification and programming in its sublanguage of functional modules and theories. The underlying equational logic chosen for Maude is membership equational logic, which has sorts, subsorts, operator overloading, and partiality definable by membership and equality conditions. Rewriting logic is reflective, in the sense of being able to express its own metalevel at the object level. Reflection is systematically exploited in Maude, endowing the language with powerful metaprogramming capabilities, including both user-definable module operations and declarative strategies to guide the deduction process. This paper explains and illustrates with examples the main concepts of Maude's language design, including its underlying logic, functional, system, and object-oriented modules, as well as parameterized modules, theories, and views. We also explain how Maude supports reflection, metaprogramming, and internal strategies. The paper outlines the principles underlying the Maude system implementation, including its semicompilation techniques. We conclude with some remarks about applications, work on a formal environment for Maude, and a mobile language extension of Maude.
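
    The following Haskell sketch is only meant to convey the flavour of the equational rewriting that Maude's functional modules rest on (defining equations applied as left-to-right rewrite rules); it is not Maude syntax and does not show membership equational logic or reflection.

        -- Not Maude syntax: a rough sketch of the equational rewriting that
        -- functional modules are built on, using Peano naturals as the sort.
        data Nat = Zero | Succ Nat deriving Show

        -- The defining equations of addition, applied as left-to-right rules.
        plus :: Nat -> Nat -> Nat
        plus Zero     n = n                 -- plus(0, N)    = N
        plus (Succ m) n = Succ (plus m n)   -- plus(s(M), N) = s(plus(M, N))

        -- Reducing a term to canonical form corresponds to evaluating it:
        -- > plus (Succ Zero) (Succ (Succ Zero))  ==> Succ (Succ (Succ Zero))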

    Representation and Stochastic Resolution of Ambiguity in Constraint-Based Parsing

    This thesis investigates two complementary approaches to coping with ambiguity in natural language processing. It first presents methods that allow many competing interpretations to be stored compactly in one shared data structure. It then proposes approaches to score the different interpretations using stochastic models. This leads to the problem of estimating the probabilities of rare events that were not observed in the training data, for which novel methods are proposed.
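
    A minimal Haskell sketch of the kind of shared data structure meant here, a packed forest in which competing interpretations of the same span are stored once and packed under alternative nodes; the constructor names are illustrative, not the thesis's formalism.

        -- Illustrative only: a packed parse forest, a shared structure in
        -- which many competing analyses are stored compactly.  An Or node
        -- packs alternative analyses of the same span; shared subtrees are
        -- built once and reused.
        data Forest
          = Leaf String            -- a word
          | Node String [Forest]   -- a labelled constituent over its daughters
          | Or [Forest]            -- competing analyses packed together

        -- The number of distinct readings represented by the forest.
        readings :: Forest -> Integer
        readings (Leaf _)    = 1
        readings (Node _ ds) = product (map readings ds)
        readings (Or alts)   = sum (map readings alts)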

    Identifying nocuous ambiguity in natural language requirements

    This dissertation is an investigation into how ambiguity should be classified for authors and readers of text, and how this process can be automated. Usually, authors and readers disambiguate ambiguity, either consciously or unconsciously. However, disambiguation is not always appropriate. For instance, a linguistic construction may be read differently by different people, with no consensus about which reading is the intended one. This is particularly dangerous if they do not realise that other readings are possible. Misunderstandings may then occur. This is particularly serious in the field of requirements engineering. If requirements are misunderstood, systems may be built incorrectly, and this can prove very costly. Our research uses natural language processing techniques to address ambiguity in requirements. We develop a model of ambiguity, and a method of applying it, which represent a novel approach to the problem described here. Our model is based on the notion that human perception is the only valid criterion for judging ambiguity. If people perceive very differently how an ambiguity should be read, it will cause misunderstandings. Assigning a preferred reading to it is therefore unwise. In text, such ambiguities should be located and rewritten in a less ambiguous form; others need not be reformulated. We classify the former as nocuous and the latter as innocuous. We allow the dividing line between these two classifications to be adjustable. We term this the ambiguity threshold, and it represents a level of intolerance to ambiguity. A nocuous ambiguity can be an unacknowledged or an acknowledged ambiguity for a given set of readers. In the former case, they assign disparate readings to the ambiguity, but each is unaware that the others read it differently. In the latter case, they recognise that the ambiguity has more than one reading, but this fact may be unacknowledged by new readers. We present an automated approach to determine whether ambiguities in text are nocuous or innocuous. We use heuristics to distinguish ambiguities for which there is a strong consensus about how they should be read. These are innocuous ambiguities. The remaining nocuous ambiguities can then be rewritten at a later stage. We find consensus opinions about ambiguities by surveying human perceptions of them. Our heuristics try to predict these perceptions automatically. They utilise various types of linguistic information: generic corpus data, morphology and lexical subcategorisations are the most successful. We use coordination ambiguity as the test case for this research. This occurs where the scope of words such as "and" and "or" is unclear. Our research contributes to both the requirements engineering and the natural language processing literatures. Ambiguity is known to be a serious problem in requirements engineering, but has rarely been dealt with effectively and thoroughly. Our approach is an appropriate solution, and our flexible ambiguity threshold is a particularly useful concept. For instance, high ambiguity intolerance can be implemented when writing requirements for safety-critical systems. Coordination ambiguities are widespread and known to cause misunderstandings, but have received comparatively little attention. Our heuristics show that linguistic data can be used successfully to predict preferred readings of very diverse coordinations. Used in combination, these heuristics demonstrate that nocuous ambiguity can be distinguished from innocuous ambiguity under certain conditions. Employing appropriate ambiguity thresholds, an accuracy representing a 28% improvement over the baselines can be achieved.
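
    The following Haskell sketch illustrates the adjustable ambiguity threshold described above: given readings collected from surveyed readers, an ambiguity counts as innocuous when consensus on one reading meets the threshold, and nocuous otherwise. The consensus measure used here is an assumption made for illustration, not the dissertation's heuristics.

        import Data.List (group, sort)

        -- Illustrative sketch of the adjustable ambiguity threshold: an
        -- ambiguity is innocuous when surveyed readers show strong enough
        -- consensus on one reading, and nocuous otherwise.  The consensus
        -- measure is an assumption, not the dissertation's exact heuristic.
        data Verdict = Innocuous | Nocuous deriving (Show, Eq)

        -- Fraction of readers agreeing on the most popular reading.
        consensus :: [String] -> Double
        consensus rs = fromIntegral best / fromIntegral (length rs)
          where best = maximum (map length (group (sort rs)))

        -- A higher threshold models lower tolerance to ambiguity, e.g. for
        -- safety-critical requirements.
        classify :: Double -> [String] -> Verdict
        classify threshold rs
          | consensus rs >= threshold = Innocuous
          | otherwise                 = Nocuous

        -- > classify 0.75 ["wide scope", "wide scope", "narrow scope", "wide scope"]
        -- ==> Innocuous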

    The Morphosyntactic Parser: Developing and testing a sentence processor that uses underspecified morphosyntactic features

    This dissertation presents a fundamentally new approach to describing not only the architecture of the language system but also the processes behind its capability to predict, analyze, and integrate linguistic input into its representation in a parsimonious way. Using morphosyntax as an example, underspecified case, that is, the use of decomposed, binary case, number, and gender features to account for syncretism, offers insights into both aspects: carrying over this idea to language processing raises the question of whether the language system, limited in its storage capacity, makes use of similar means of representational parsimony during the processing of linguistic input. This thesis proposes a processing system that is tightly related to the aforementioned architectural assumption of morphosyntactically underspecified lexical entries as a parsimonious form of representation. In that sense, prediction is viewed as the language system's drive to avoid feature deviance from one incrementally available linguistic element to the next incoming one. The parser's goal is thus to maintain minimal feature deviance, or at best feature identity, to keep processing load as low as possible. This approach allows for position-dependent hypotheses about the expected processing load. To test the processor's claims, electrophysiological data from a series of event-related brain potential (ERP) experiments are presented. The results suggest that as the input's feature deviance increases, the amplitude of an ERP component sensitive to prediction error increases as well. In comparison, elements that maintain feature identity, and that neither lack features nor introduce additional features into the analysis, do not increase processing difficulty. These results indicate that the language processing system uses the available features of morphosyntactically underspecified mental entries to build up larger constituents. The experiments showed that this build-up process is determined by the language system's drive to avoid feature deviance.
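
    As an illustration of the central notion, the Haskell sketch below formalises feature deviance between two underspecified feature bundles as the number of clashes between specified, opposite-valued features; the feature inventory and the counting scheme are assumptions made for this sketch, not the dissertation's model.

        -- Illustrative formalisation only: morphosyntactic entries as bundles
        -- of binary features that may be left underspecified (Nothing), and
        -- "feature deviance" as the number of clashes between two bundles.
        -- The feature inventory and counting scheme are assumptions.
        data Bundle = Bundle
          { oblique  :: Maybe Bool   -- e.g. a binary case feature
          , plural   :: Maybe Bool
          , feminine :: Maybe Bool
          }

        -- A clash occurs only when both bundles specify the feature with
        -- opposite values; an underspecified feature is compatible with anything.
        clash :: Maybe Bool -> Maybe Bool -> Int
        clash (Just a) (Just b) | a /= b = 1
        clash _        _                 = 0

        -- The parser's preference for minimal feature deviance: lower is easier.
        deviance :: Bundle -> Bundle -> Int
        deviance p q =
          clash (oblique p)  (oblique q)  +
          clash (plural p)   (plural q)   +
          clash (feminine p) (feminine q)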

    Constructive Logics Part I: A Tutorial on Proof Systems and Typed Lambda-Calculi

    The purpose of this paper is to give an exposition of material dealing with constructive logic, typed λ-calculi, and linear logic. The emergence in the past ten years of a coherent field of research often named logic and computation has had two major (and related) effects: firstly, it has rocked vigorously the world of mathematical logic; secondly, it has created a new computer science discipline, which spans from what is traditionally called theory of computation, to programming language design. Remarkably, this new body of work relies heavily on some old concepts found in mathematical logic, like natural deduction, sequent calculus, and λ-calculus (but often viewed in a different light), and also on some newer concepts. Thus, it may be quite a challenge to become initiated to this new body of work (but the situation is improving, there are now some excellent texts on this subject matter). This paper attempts to provide a coherent and hopefully gentle initiation to this new body of work. We have attempted to cover the basic material on natural deduction, sequent calculus, and typed λ-calculus, but also to provide an introduction to Girard's linear logic, one of the most exciting developments in logic these past five years. The first part of these notes gives an exposition of background material (with the exception of the Girard-translation of classical logic into intuitionistic logic, which is new). The second part is devoted to linear logic and proof nets.
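
    The propositions-as-types connection between natural deduction and the typed λ-calculus that the tutorial covers can be made concrete with a small type checker: a well-typed term is a natural deduction proof of its type. The Haskell sketch below handles implication only and uses invented names; it is not taken from the paper.

        -- Sketch of the propositions-as-types correspondence: a type checker
        -- for the simply typed λ-calculus with implication only.  A well-typed
        -- term is a natural deduction proof of the proposition given by its type.
        data Ty = Base String | Arr Ty Ty deriving (Eq, Show)

        data Tm
          = Var String
          | Lam String Ty Tm   -- λx:A. t   (implication introduction)
          | App Tm Tm          -- t u       (implication elimination, modus ponens)

        type Ctx = [(String, Ty)]

        -- Returns the type (the proved proposition) if the term is a valid proof.
        typeOf :: Ctx -> Tm -> Maybe Ty
        typeOf ctx (Var x)     = lookup x ctx
        typeOf ctx (Lam x a t) = Arr a <$> typeOf ((x, a) : ctx) t
        typeOf ctx (App t u)   =
          case (typeOf ctx t, typeOf ctx u) of
            (Just (Arr a b), Just a') | a == a' -> Just b
            _                                   -> Nothing

        -- The identity term λx:A. x proves A ⇒ A:
        -- > typeOf [] (Lam "x" (Base "A") (Var "x"))
        -- ==> Just (Arr (Base "A") (Base "A"))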

    A Formal Methodology for Deriving Purely Functional Programs From Z Specifications via the Intermediate Specification Language FunZ.

    In recent years, both formal methods and software reuse have been increasingly advocated as a means of alleviating the ills of the software crisis. During this same time period, purely functional programming languages, which have a long history in the realm of rapid prototyping, have emerged as a viable medium for real-world applications. Since these trends are likely to continue, this work describes a methodology that facilitates the derivation of purely functional programs from existing Z specifications. A unique aspect of the methodology is its incorporation of an intermediate specification language (FunZ) during the design phase of software development. Most of the previous techniques for translating Z specifications to functional programs were designed primarily to expedite rapid prototyping. In contrast, the FunZ methodology, which is an adapted form of the IBM Hursley method, is a comprehensive approach, spanning the software life cycle from specification through design to final implementation. Due to its greater scope, the FunZ methodology offers several advantages over existing approaches. First, the specification language integrates features from Z with those of the functional programming paradigm to provide a bridge between Z specifications and functional implementations. Since FunZ is expressly designed to target functional languages, the implementor's job is simplified. In fact, a FunZ document looks like extended Haskell code, so an obvious effect in applying FunZ is that the distance from design to code is reduced. Second, the methodology provides a framework for recording design decisions, which is useful for future maintenance. Within this framework, users may select a development path ranging from an intuitive style to a fully formal approach that includes the proofs of functional refinement. Furthermore, FunZ allows software developers to prove properties about a system design within the realm of Z or Haskell. This means that proofs can be performed throughout software development and the designer is free to select the most appropriate notation. In summary, the intermediate specification language FunZ and its related methodology provide software developers with a complete, formal approach for translating Z specifications to Haskell implementations. Previously, such extensive methods were only available for traditional, imperative languages.
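
    To convey the flavour of the approach, the sketch below refines a toy Z-style specification, the familiar birthday-book state schema with its AddBirthday operation, into direct Haskell; the notation is ordinary Haskell, not FunZ, and the example is hypothetical.

        import qualified Data.Map as Map

        -- Hypothetical toy illustration (not FunZ notation): the classic Z
        -- "birthday book" state schema and its AddBirthday operation, refined
        -- to a direct Haskell implementation.  The schema invariant
        -- "known = dom birthday" holds by construction, since `known` is
        -- derived from the map.
        type Name = String
        type Date = String

        newtype BirthdayBook = BirthdayBook { birthday :: Map.Map Name Date }

        known :: BirthdayBook -> [Name]
        known = Map.keys . birthday

        -- AddBirthday precondition: the name is not already known.  The
        -- operation is partial, so the refinement returns Nothing when the
        -- precondition fails.
        addBirthday :: Name -> Date -> BirthdayBook -> Maybe BirthdayBook
        addBirthday name date book
          | Map.member name (birthday book) = Nothing
          | otherwise = Just (BirthdayBook (Map.insert name date (birthday book)))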