2,111 research outputs found

    Using Contextual Representations to Efficiently Learn Context-Free Languages

    We present a polynomial update time algorithm for the inductive inference of a large class of context-free languages using the paradigm of positive data and a membership oracle. We achieve this result by moving to a novel representation, called Contextual Binary Feature Grammars (CBFGs), which are capable of representing richly structured context-free languages as well as some context-sensitive languages. These representations explicitly model the lattice structure of the distribution of a set of substrings and can be inferred using a generalisation of distributional learning. This formalism is an attempt to bridge the gap between simple learnable classes and the sorts of highly expressive representations necessary for linguistic representation: it allows the learnability of a large class of context-free languages that includes all regular languages and those context-free languages that satisfy two simple constraints. The formalism and the algorithm seem well suited to natural language and in particular to the modeling of first language acquisition. Preliminary experimental results confirm the effectiveness of this approach.

    Semi-bracketed contextual grammars

    Bracketed and fully bracketed contextual grammars were introduced to bring the concept of a tree structure to strings by associating a pair of parentheses with the adjoined contexts in the derivation. In this paper, we show that these grammars fail to generate all the basic non-context-free languages and thus cannot be a syntactical model for natural languages. To overcome this failure, we introduce a new class of fully bracketed contextual grammars, called the semi-bracketed contextual grammars, where the selectors can also be non-minimally Dyck covered languages. We see that the tree structure of the derived strings is still preserved in this variant. When this new grammar is combined with the maximality feature, the generative power of these grammars is increased to the extent of covering the family of context-free languages and some basic non-context-free languages, thus possessing many properties of the so-called 'MCS formalism'.

    Negation detection and word sense disambiguation in digital archaeology reports for the purposes of semantic annotation

    The paper presents the role and contribution of Natural Language Processing techniques, in particular Negation Detection and Word Sense Disambiguation, in the process of Semantic Annotation of Archaeological Grey Literature. Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence but at the same time can be a hindrance to the accurate indexing of documents with respect to positive assertion.

    An Abstract Machine for Unification Grammars

    This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars that are based on typed feature structures. The use of Amalia results in much improved performance over existing systems. In order to test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language. Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks, pst-node, psfig, fullname and a macros fil