
    MBT: A Memory-Based Part of Speech Tagger-Generator

    We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text in the same way, considerably reducing the development time for the construction of a tagger. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. An additional advantage of using IGTree is that the optimal context size for disambiguation is computed dynamically. Comment: 14 pages, 2 Postscript figures
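    To make the similarity-based extrapolation described above concrete, here is a minimal sketch of a memory-based tagger in Python: "training" merely stores (context, tag) cases from a tagged corpus, and a new word receives the majority tag of its k most similar stored cases under a flat overlap metric. The feature choice, the greedy left-to-right tagging, and all names are illustrative assumptions; this is not the MBT system and does not include the IGTree index or information-gain feature weighting.

```python
from collections import Counter

def extract_case(words, tags, i):
    """Case for position i: previous tag, previous word, focus word, next word
    (an illustrative feature set, not MBT's actual case representation)."""
    prev_tag = tags[i - 1] if i > 0 else "<s>"
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return (prev_tag, prev_word, words[i], next_word)

def train(tagged_sentences):
    """Memory-based 'training': simply store every case together with its tag."""
    memory = []
    for words, tags in tagged_sentences:
        for i in range(len(words)):
            memory.append((extract_case(words, tags, i), tags[i]))
    return memory

def overlap(case_a, case_b):
    """Similarity = number of feature values the two cases share."""
    return sum(a == b for a, b in zip(case_a, case_b))

def tag_sentence(memory, words, k=3):
    """Tag left to right; each word gets the majority tag of its k most similar cases."""
    tags = []
    for i in range(len(words)):
        case = extract_case(words, tags, i)   # tags predicted so far supply the left context
        nearest = sorted(memory, key=lambda m: -overlap(case, m[0]))[:k]
        tags.append(Counter(tag for _, tag in nearest).most_common(1)[0][0])
    return tags

corpus = [(["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
          (["a", "dog", "barks"], ["DET", "NOUN", "VERB"])]
memory = train(corpus)
print(tag_sentence(memory, ["the", "dog", "sleeps"]))  # ['DET', 'NOUN', 'VERB']
```

    A real memory-based tagger would replace the brute-force linear scan with an indexed structure such as IGTree and weight features by their informativeness; the sketch is only meant to show where the tagging decision comes from.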

    Semantic Ambiguity and Perceived Ambiguity

    I explore some of the issues that arise when trying to establish a connection between the underspecification hypothesis pursued in the NLP literature and work on ambiguity in semantics and in the psychological literature. A theory of underspecification is developed `from first principles', i.e., starting from a definition of what it means for a sentence to be semantically ambiguous and from what we know about the way humans deal with ambiguity. An underspecified language is specified as the translation language of a grammar covering sentences that display three classes of semantic ambiguity: lexical ambiguity, scopal ambiguity, and referential ambiguity. The expressions of this language denote sets of senses. A formalization of defeasible reasoning with underspecified representations is presented, based on Default Logic. Some issues to be confronted by such a formalization are discussed. Comment: LaTeX, 47 pages. Uses tree-dvips.sty, lingmacros.sty, fullname.sty
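    As a rough illustration of the two ingredients mentioned above (expressions denoting sets of senses, and defeasible reasoning in the style of Default Logic), the sketch below models an underspecified meaning as the set of its candidate senses and lets a default rule narrow that set only while its justification is consistent with the known facts. All names and the encoding of facts are hypothetical simplifications, not the paper's formalization.

```python
def apply_default(senses, facts, justification, preferred):
    """Normal default, roughly 'if `justification` is consistent with the facts,
    conclude the `preferred` senses': narrow the sense set unless the
    justification is explicitly contradicted by a fact ('not', justification)."""
    blocked = ("not", justification) in facts
    narrowed = senses & preferred
    return narrowed if not blocked and narrowed else senses

# Hypothetical lexical ambiguity: "bank" as a financial institution or a river side.
senses = {"bank_finance", "bank_river"}

# Nothing contradicts the financial context, so the default fires and prunes senses.
print(apply_default(senses, set(), "financial_context", {"bank_finance"}))
# -> {'bank_finance'}

# The justification is contradicted, so the default is blocked and ambiguity remains.
print(apply_default(senses, {("not", "financial_context")},
                    "financial_context", {"bank_finance"}))
# -> {'bank_finance', 'bank_river'}
```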

    Preferential Multi-Context Systems

    Multi-context systems (MCS) presented by Brewka and Eiter can be considered a promising way to interlink decentralized and heterogeneous knowledge contexts. In this paper, we propose preferential multi-context systems (PMCS), which provide a framework for incorporating a total preorder relation over contexts in a multi-context system. In a given PMCS, the contexts are divided into several parts according to the total preorder relation over them; moreover, only information flows from a context to contexts of the same part or of less preferred parts are allowed to occur. As such, the first l preferred parts of a PMCS always fully capture the information exchange between contexts of these parts, and thus compose another meaningful PMCS, termed the l-section of that PMCS. We generalize the equilibrium semantics for an MCS to the (maximal) l≤-equilibrium, which represents belief states that are at least acceptable for the l-section of a PMCS. We also investigate inconsistency analysis in PMCS and related computational complexity issues.
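    The structural conditions described above (a total preorder partitioning the contexts into parts, information flowing only into the same or less preferred parts, and the l-section keeping the first l parts) can be sketched in a few lines. The sketch below is an illustrative reading of those conditions with hypothetical context names; it is not the authors' formal machinery and does not touch the equilibrium semantics.

```python
def flow_allowed(part, source, target):
    """Information may flow from `source` to `target` only if the target sits in
    the same part or a less preferred one (parts are numbered from 1 = most preferred)."""
    return part[target] >= part[source]

def l_section(part, flows, l):
    """Contexts of the first l preferred parts, plus the flows among them."""
    contexts = {c for c, p in part.items() if p <= l}
    return contexts, [(s, t) for s, t in flows if s in contexts and t in contexts]

# Hypothetical PMCS: four contexts split into three parts by the total preorder.
part = {"C1": 1, "C2": 1, "C3": 2, "C4": 3}
flows = [("C1", "C2"), ("C1", "C3"), ("C3", "C4")]

assert all(flow_allowed(part, s, t) for s, t in flows)  # all listed flows respect the preorder
assert not flow_allowed(part, "C4", "C1")               # flowing back to a more preferred part is forbidden
print(l_section(part, flows, 2))  # the 2-section keeps C1, C2, C3 and the flows among them
```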

    Principle Based Semantics for HPSG

    The paper presents a constraint-based semantic formalism for HPSG. The advantages of the formalism are shown with respect to a grammar for a fragment of German that deals with (i) quantifier scope ambiguities triggered by scrambling and/or movement and (ii) ambiguities that arise from the collective/distributive distinction of plural NPs. The syntax-semantics interface directly implements syntactic conditions on quantifier scoping and distributivity. The construction of semantic representations is guided by general principles governing the interaction between syntax and semantics. Each of these principles acts as a constraint to narrow down the set of possible interpretations of a sentence. Meanings of ambiguous sentences are represented by single partial representations (so-called U(nderspecified) D(iscourse) R(epresentation) S(tructure)s) to which further constraints can be added monotonically to gain more information about the content of a sentence. There is no need to build up a large number of alternative representations of the sentence which are then filtered by subsequent discourse and world knowledge. The advantage of UDRSs is not only that they allow for monotonic incremental interpretation but also that they are equipped with truth conditions and a proof theory that allows for inferences to be drawn directly on structures where quantifier scope is not resolved.
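    To illustrate the monotonic behaviour claimed for UDRSs, the sketch below models scope underspecification in the simplest possible way: the readings of a sentence are the total scope orders compatible with a growing set of partial constraints, so adding a constraint can only remove readings, never introduce new ones. This is an illustrative toy with hypothetical quantifier names, not the UDRS construction or its proof theory.

```python
from itertools import permutations

def readings(quantifiers, constraints):
    """All scope orders (outermost first) compatible with the constraints so far,
    where a constraint (a, b) requires a to outscope b."""
    result = []
    for order in permutations(quantifiers):
        position = {q: i for i, q in enumerate(order)}
        if all(position[a] < position[b] for a, b in constraints):
            result.append(order)
    return result

quants = ["every_student", "a_paper", "most_reviewers"]
print(len(readings(quants, [])))                              # 6: fully underspecified
print(len(readings(quants, [("every_student", "a_paper")])))  # 3: the added constraint only removes readings
```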