MBT: A Memory-Based Part of Speech Tagger-Generator
We introduce a memory-based approach to part of speech tagging. Memory-based
learning is a form of supervised learning based on similarity-based reasoning.
The part of speech tag of a word in a particular context is extrapolated from
the most similar cases held in memory. Supervised learning approaches are
useful when a tagged corpus is available as an example of the desired output of
the tagger. Based on such a corpus, the tagger-generator automatically builds a
tagger which is able to tag new text the same way, diminishing development time
for the construction of a tagger considerably. Memory-based tagging shares this
advantage with other statistical or machine learning approaches. Additional
advantages specific to a memory-based approach include (i) the relatively small
tagged corpus size sufficient for training, (ii) incremental learning, (iii)
explanation capabilities, (iv) flexible integration of information in case
representations, (v) its non-parametric nature, (vi) reasonably good results on
unknown words without morphological analysis, and (vii) fast learning and
tagging. In this paper we show that a large-scale application of the
memory-based approach is feasible: we obtain a tagging accuracy that is on a
par with that of known statistical approaches, and with attractive space and
time complexity properties when using {\em IGTree}, a tree-based formalism for
indexing and searching huge case bases. An additional advantage of IGTree is
that the optimal context size for disambiguation is computed dynamically.
Comment: 14 pages, 2 Postscript figures
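The memory-based extrapolation step described above can be sketched as a k-nearest-neighbour lookup over stored cases. The case representation (focus word, previous tag, next word), the feature weights, and the toy corpus below are illustrative assumptions, not the actual MBT/IGTree implementation:

```python
from collections import Counter

def make_case(words, tags, i):
    # A case pairs the focus word with a small disambiguation context:
    # the previous word's tag and the next word itself (assumed
    # features; MBT's real case representation is configurable).
    prev_tag = tags[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return (words[i], prev_tag, next_word)

def train(tagged_sentences):
    # Memory-based learning stores every training case verbatim,
    # with no abstraction over the corpus.
    memory = []
    for words, tags in tagged_sentences:
        for i in range(len(words)):
            memory.append((make_case(words, tags, i), tags[i]))
    return memory

def tag(memory, words, k=3):
    # Tag left to right; each tag is extrapolated from the k most
    # similar cases in memory. Similarity is weighted feature overlap,
    # with the focus word weighted highest (a stand-in for MBT's
    # information-gain feature weighting).
    weights = (4, 2, 1)
    tags = []
    for i in range(len(words)):
        case = (words[i],
                tags[i - 1] if i > 0 else "<s>",
                words[i + 1] if i + 1 < len(words) else "</s>")
        def score(stored):
            return sum(w for f, g, w in zip(case, stored[0], weights) if f == g)
        nearest = sorted(memory, key=score, reverse=True)[:k]
        tags.append(Counter(t for _, t in nearest).most_common(1)[0][0])
    return tags

corpus = [(["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
          (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"])]
memory = train(corpus)
print(tag(memory, ["the", "cat", "barks"]))  # ['DET', 'NOUN', 'VERB']
```

Because the full training set is kept in memory, learning is incremental (appending new cases) and each decision can be explained by exhibiting its nearest neighbours, matching advantages (ii) and (iii) above.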
Semantic Ambiguity and Perceived Ambiguity
I explore some of the issues that arise when trying to establish a connection
between the underspecification hypothesis pursued in the NLP literature and
work on ambiguity in semantics and in the psychological literature. A theory of
underspecification is developed `from first principles', i.e., starting
from a definition of what it means for a sentence to be semantically ambiguous
and from what we know about the way humans deal with ambiguity. An
underspecified language is specified as the translation language of a grammar
covering sentences that display three classes of semantic ambiguity: lexical
ambiguity, scopal ambiguity, and referential ambiguity. The expressions of this
language denote sets of senses. A formalization of defeasible reasoning with
underspecified representations is presented, based on Default Logic. Some
issues to be confronted by such a formalization are discussed.
Comment: LaTeX, 47 pages. Uses tree-dvips.sty, lingmacros.sty, fullname.st
Preferential Multi-Context Systems
Multi-context systems (MCS) presented by Brewka and Eiter can be considered
as a promising way to interlink decentralized and heterogeneous knowledge
contexts. In this paper, we propose preferential multi-context systems (PMCS),
which provide a framework for incorporating a total preorder relation over
contexts in a multi-context system. In a given PMCS, its contexts are divided
into several parts according to the total preorder relation over them,
moreover, only information flows from a context to ones of the same part or
less preferred parts are allowed to occur. As such, the first preferred
parts of an PMCS always fully capture the information exchange between contexts
of these parts, and then compose another meaningful PMCS, termed the
-section of that PMCS. We generalize the equilibrium semantics for an MCS to
the (maximal) -equilibrium which represents belief states at least
acceptable for the -section of an PMCS. We also investigate inconsistency
analysis in PMCS and related computational complexity issues
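The part structure induced by the total preorder can be sketched directly. The context names, the rank encoding of the preorder (equal rank = equally preferred, lower rank = more preferred), and the function names below are assumptions for illustration, not the formal PMCS definitions:

```python
# A total preorder over contexts, encoded as a rank per context:
# lower rank means more preferred; equal ranks form one part.
rank = {"c1": 0, "c2": 0, "c3": 1, "c4": 2}

def parts(rank):
    # Divide the contexts into parts of equally preferred contexts,
    # ordered from most to least preferred.
    by_rank = {}
    for c, r in rank.items():
        by_rank.setdefault(r, set()).add(c)
    return [by_rank[r] for r in sorted(by_rank)]

def flow_allowed(src, dst, rank):
    # Information may flow from a context only to contexts in the
    # same part or in less preferred (higher-rank) parts.
    return rank[dst] >= rank[src]

def section(rank, k):
    # The k-section: the k most preferred parts. By the flow
    # restriction, these contexts exchange information only among
    # themselves, so they compose a meaningful PMCS of their own.
    return set().union(*parts(rank)[:k])
```

For the example preorder, `parts(rank)` yields three parts, `flow_allowed("c4", "c1", rank)` is rejected because `c1` is strictly more preferred than `c4`, and `section(rank, 2)` is closed under the allowed flows.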
Principle Based Semantics for HPSG
The paper presents a constraint based semantic formalism for HPSG. The
advantages of the formalism are shown with respect to a grammar for a fragment
of German that deals with (i) quantifier scope ambiguities triggered by
scrambling and/or movement and (ii) ambiguities that arise from the
collective/distributive distinction of plural NPs. The syntax-semantics
interface directly implements syntactic conditions on quantifier scoping and
distributivity. The construction of semantic representations is guided by
general principles governing the interaction between syntax and semantics. Each
of these principles acts as a constraint to narrow down the set of possible
interpretations of a sentence. Meanings of ambiguous sentences are represented
by single partial representations (so-called U(nderspecified) D(iscourse)
R(epresentation) S(tructure)s) to which further constraints can be added
monotonically to gain more information about the content of a sentence. There
is no need to build up a large number of alternative representations of the
sentence which are then filtered by subsequent discourse and world knowledge.
The advantage of UDRSs is not only that they allow for monotonic incremental
interpretation but also that they are equipped with truth conditions and a
proof theory that allows for inferences to be drawn directly on structures
where quantifier scope is not resolved.
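The monotonic narrowing of interpretations can be illustrated with a toy scope model: readings of an ambiguous sentence are the scope orders consistent with a set of constraints, and adding a constraint only removes readings. The quantifier labels and pairwise "outscopes" constraints below are assumed stand-ins for UDRS structure, not the formalism itself:

```python
from itertools import permutations

def readings(quants, constraints):
    # Enumerate scope orders (outermost quantifier first) consistent
    # with the constraints; a pair (a, b) means "a outscopes b".
    result = []
    for order in permutations(quants):
        pos = {q: i for i, q in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in constraints):
            result.append(order)
    return result

# "Every student read a book" with no scope constraints: two readings,
# represented by one underspecified constraint set rather than two
# alternative formulas.
ambiguous = readings(["every_student", "a_book"], [])

# Adding a syntactic or contextual constraint is monotonic: it narrows
# the set of interpretations without rebuilding any representation.
narrowed = readings(["every_student", "a_book"],
                    [("every_student", "a_book")])
```

This mirrors the UDRS strategy above: one partial representation stands for all readings, and discourse or world knowledge contributes constraints incrementally instead of filtering a list of fully built alternatives.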