12 research outputs found

    Natural language software registry (second edition)

    An Abstract Machine for Unification Grammars

    This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars that are based on typed feature structures. The use of Amalia results in much improved performance over existing systems. In order to test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language. Comment: Doctoral thesis, 96 pages, many PostScript figures.

    Complementation, Quantification and Potential Energy

    JTEC panel report on machine translation in Japan

    The goal of this report is to provide an overview of the state of the art of machine translation (MT) in Japan and to compare Japanese and Western technology in this area. The term 'machine translation' as used here includes both the science and the technology required for automating the translation of text from one human language to another. Machine translation is viewed in Japan as an important strategic technology that is expected to play a key role in Japan's increasing participation in the world economy. MT is seen in Japan as important both for assimilating information into Japanese and for disseminating Japanese information throughout the world. Most of the MT systems now available in Japan are transfer-based systems. The majority of them exploit a case-frame representation of the source text as the basis of the transfer process. There is a gradual movement toward the use of deeper semantic representations, and some groups are beginning to look at interlingua-based systems.

    Darstellung und stochastische Auflösung von Ambiguität in constraint-basiertem Parsing (Representation and Stochastic Resolution of Ambiguity in Constraint-Based Parsing)

    This thesis investigates two complementary approaches to handling ambiguity in natural language processing. It first presents methods that allow many competing interpretations to be represented compactly in a single shared data structure. It then proposes approaches for scoring the different interpretations using stochastic models. This raises the problem of estimating the probabilities of rare events that did not occur in the training data, for which novel methods are proposed.

    Generalisierte Phrasenstruktur-Grammatiken und ihre Verwendung zur maschinellen Sprachverarbeitung (Generalized Phrase Structure Grammars and Their Use in Natural Language Processing)

    This article examines the syntactic theory of Generalized Phrase Structure Grammar (GPSG), gives a new formal definition of the current formalism, and points out the problems associated with it. It also explains why the formalism cannot be implemented efficiently. A constructive version of GPSG is proposed that is suitable for natural language processing (parsing and generation). The article can also serve as a basis for courses on GPSG.

    The Computational Analysis of the Syntax and Interpretation of Free Word Order in Turkish

    In this dissertation, I examine a language with “free” word order, specifically Turkish, in order to develop a formalism that can capture the syntax and the context-dependent interpretation of “free” word order within a computational framework. In “free” word order languages, word order is used to convey distinctions in meaning that are not captured by traditional truth-conditional semantics. The word order indicates the “information structure”, e.g. what the “topic” and the “focus” of the sentence are. The context-appropriate use of “free” word order is of considerable importance in developing practical applications in natural language interpretation, generation, and machine translation. I develop a formalism called Multiset-CCG, an extension of Combinatory Categorial Grammars (CCGs) (Ades/Steedman 1982, Steedman 1985), and demonstrate its advantages in an implementation of a database query system that interprets Turkish questions and generates answers with contextually appropriate word orders. Multiset-CCG is a context-sensitive and polynomially parsable grammar that captures the formal and descriptive properties of “free” word order and restrictions on word order in simple and complex sentences (with discontinuous constituents and long-distance dependencies). Multiset-CCG captures the context-dependent meaning of word order in Turkish by compositionally deriving the predicate-argument structure and the information structure of a sentence in parallel. The advantages of using such a formalism are that it is computationally attractive and that it provides a compositional and flexible surface structure that allows syntactic constituents to correspond to information structure constituents. A formalism that integrates information structure and syntax, such as Multiset-CCG, is essential to the computational tasks of interpreting and generating sentences with contextually appropriate word orders in “free” word order languages.