2,868 research outputs found
Selective Sampling for Example-based Word Sense Disambiguation
This paper proposes an efficient example sampling method for example-based
word sense disambiguation systems. To construct a database of practical size, a
considerable overhead for manual sense disambiguation (overhead for
supervision) is required. In addition, the time complexity of searching a
large-sized database poses a considerable problem (overhead for search). To
counter these problems, our method selectively samples a smaller-sized
effective subset from a given example set for use in word sense disambiguation.
Our method is characterized by the reliance on the notion of training utility:
the degree to which each example is informative for future example sampling
when used for the training of the system. The system progressively collects
examples by selecting those with greatest utility. The paper reports the
effectiveness of our method through experiments on about one thousand
sentences. Compared to experiments with other example sampling methods, our
method reduced both the overhead for supervision and the overhead for search,
without the degeneration of the performance of the system.Comment: 25 pages, 14 Postscript figure
Efficient deep processing of japanese
We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages
Distinguishing Word Senses in Untagged Text
This paper describes an experimental comparison of three unsupervised
learning algorithms that distinguish the sense of an ambiguous word in untagged
text. The methods described in this paper, McQuitty's similarity analysis,
Ward's minimum-variance method, and the EM algorithm, assign each instance of
an ambiguous word to a known sense definition based solely on the values of
automatically identifiable features in text. These methods and feature sets are
found to be more successful in disambiguating nouns rather than adjectives or
verbs. Overall, the most accurate of these procedures is McQuitty's similarity
analysis in combination with a high dimensional feature set.Comment: 11 pages, latex, uses aclap.st
Thematic Annotation: extracting concepts out of documents
Contrarily to standard approaches to topic annotation, the technique used in
this work does not centrally rely on some sort of -- possibly statistical --
keyword extraction. In fact, the proposed annotation algorithm uses a large
scale semantic database -- the EDR Electronic Dictionary -- that provides a
concept hierarchy based on hyponym and hypernym relations. This concept
hierarchy is used to generate a synthetic representation of the document by
aggregating the words present in topically homogeneous document segments into a
set of concepts best preserving the document's content.
This new extraction technique uses an unexplored approach to topic selection.
Instead of using semantic similarity measures based on a semantic resource, the
later is processed to extract the part of the conceptual hierarchy relevant to
the document content. Then this conceptual hierarchy is searched to extract the
most relevant set of concepts to represent the topics discussed in the
document. Notice that this algorithm is able to extract generic concepts that
are not directly present in the document.Comment: Technical report EPFL/LIA. 81 pages, 16 figure
Using WordNet for Building WordNets
This paper summarises a set of methodologies and techniques for the fast
construction of multilingual WordNets. The English WordNet is used in this
approach as a backbone for Catalan and Spanish WordNets and as a lexical
knowledge resource for several subtasks.Comment: 8 pages, postscript file. In workshop on Usage of WordNet in NL
Semantic Types, Lexical Sorts and Classifiers
We propose a cognitively and linguistically motivated set of sorts for
lexical semantics in a compositional setting: the classifiers in languages that
do have such pronouns. These sorts are needed to include lexical considerations
in a semantical analyser such as Boxer or Grail. Indeed, all proposed lexical
extensions of usual Montague semantics to model restriction of selection,
felicitous and infelicitous copredication require a rich and refined type
system whose base types are the lexical sorts, the basis of the many-sorted
logic in which semantical representations of sentences are stated. However,
none of those approaches define precisely the actual base types or sorts to be
used in the lexicon. In this article, we shall discuss some of the options
commonly adopted by researchers in formal lexical semantics, and defend the
view that classifiers in the languages which have such pronouns are an
appealing solution, both linguistically and cognitively motivated
Parallel grammaticalizations in Tibeto-Burman : evidence of Sapir's 'Drift'
In chapters seven and eight of his book Language, Sapir talked about what he called ‘drift’, the changes that a language undergoes through time [...]. Dialects of a language are formed when that language is broken into different segments that no longer move along the same exact drift. Even so, the general drift of a language has its deep and its shallow currents; those features that distinguish closely related dialects will be of the rapid, shallow currents, while the deeper, slower currents may remain consistent between the dialects for millennia. It is this latter type that Sapir felt is ‘fundamental to the genius of the language’ (p. 172), and he said that ‘The momentum of the more fundamental, the pre-dialectal, drift is often such that languages long disconnected will pass through the same or strikingly similar phases’ (p. 172)
The processing of ambiguous sentences by first and second language learners of English
This study compares the way English-speaking children and adult second language learners of English resolve relative clause attachment ambiguities in sentences such as The dean liked the secretary of the professor who was reading a letter. Two groups of advanced L2 learners of English with Greek or German as their L1 participated in a set of off-line and on-line tasks. While the participants ' disambiguation preferences were influenced by lexical-semantic properties of the preposition linking the two potential antecedent NPs (of vs. with), there was no evidence that they were applying any structure-based ambiguity resolution strategies of the type that have been claimed to influence sentence processing in monolingual adults. These findings differ markedly from those obtained from 6 to 7 yearold monolingual English children in a parallel auditory study (Felser, Marinis, & Clahsen, submitted) in that the children's attachment preferences were not affected by the type of preposition at all. We argue that whereas children primarily rely on structure-based parsing principles during processing, adult L2 learners are guided mainly by non-structural informatio
- …