Annotating Words Using WordNet Semantic Glosses
An approach to word sense disambiguation (WSD) relying on the WordNet
synsets is proposed. The method uses semantically tagged glosses to perform a
process similar to spreading activation in a semantic network, creating a
ranking of the most probable meanings for word annotation. A preliminary
evaluation shows promising results. Comparison with state-of-the-art
WSD methods indicates that the use of WordNet relations and semantically
tagged glosses should enhance the accuracy of word disambiguation methods.
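The gloss-driven ranking described above is in the spirit of classic gloss-overlap (Lesk-style) disambiguation. A minimal sketch follows; the sense inventory and glosses are toy illustrations, not real WordNet data, and the real method additionally propagates activation over WordNet relations:

```python
# Toy gloss-overlap disambiguation in the spirit of the Lesk algorithm.
# The sense inventory below is illustrative, not real WordNet data.

TOY_SENSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "a financial institution that accepts deposits and lends money",
    },
}

def rank_senses(word, context):
    """Rank candidate senses of `word` by gloss overlap with the context tokens."""
    context_tokens = set(context.lower().split())
    scores = {}
    for sense, gloss in TOY_SENSES[word].items():
        gloss_tokens = set(gloss.lower().split())
        scores[sense] = len(gloss_tokens & context_tokens)
    # Highest-overlap sense first, mirroring the ranking step the abstract describes.
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank_senses("bank", "she deposited the money at the bank")[0][0])  # bank.n.02
```

With the financial context, the overlap on "money" selects the institution sense; a river context would instead select the landform sense.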
Distributional Measures of Semantic Distance: A Survey
The ability to mimic human notions of semantic distance has widespread
applications. Some measures rely only on raw text (distributional measures) and
some rely on knowledge sources such as WordNet. Although extensive studies have
been performed to compare WordNet-based measures with human judgment, the use
of distributional measures as proxies to estimate semantic distance has
received little attention. Even though they have traditionally performed poorly
when compared to WordNet-based measures, they lay claim to certain uniquely
attractive features, such as their applicability in resource-poor languages and
their ability to mimic both semantic similarity and semantic relatedness.
Therefore, this paper presents a detailed study of distributional measures.
Particular attention is paid to fleshing out the strengths and limitations of
both WordNet-based and distributional measures, and to how distributional
measures of distance can be brought more closely in line with human notions of
semantic distance. We conclude with a brief discussion of recent work on hybrid
measures.
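A common instance of the distributional measures surveyed above is cosine similarity between co-occurrence count vectors built from raw text alone. A self-contained sketch on a tiny illustrative corpus (the corpus and window size are assumptions, not from the survey):

```python
import math
from collections import Counter

def cooccurrence_vector(target, corpus, window=2):
    """Count words co-occurring with `target` within a +/- `window` token window."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus: words in similar contexts get similar vectors.
corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the car drove down the road",
]
doc = cooccurrence_vector("doctor", corpus)
nur = cooccurrence_vector("nurse", corpus)
car = cooccurrence_vector("car", corpus)
print(cosine(doc, nur) > cosine(doc, car))  # True
```

Because the measure needs only raw text, it transfers directly to resource-poor languages, which is one of the uniquely attractive features the survey highlights.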
A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
A growing interest in tasks involving language understanding by the NLP
community has led to the need for effective semantic parsing and inference.
Modern NLP systems use semantic representations that do not quite fulfill the
nuanced needs for language understanding: adequately modeling language
semantics, enabling general inferences, and being accurately recoverable. This
document describes underspecified logical forms (ULF) for Episodic Logic (EL),
which is an initial form for a semantic representation that balances these
needs. ULFs fully resolve the semantic type structure while leaving issues such
as quantifier scope, word sense, and anaphora unresolved; they provide a
starting point for further resolution into EL, and enable certain structural
inferences without further resolution. This document also presents preliminary
results of creating a hand-annotated corpus of ULFs for the purpose of training
a precise ULF parser, showing a three-person pairwise interannotator agreement
of 0.88 on confident annotations. We hypothesize that a divide-and-conquer
approach to semantic parsing starting with derivation of ULFs will lead to
semantic analyses that do justice to subtle aspects of linguistic meaning, and
will enable construction of more accurate semantic parsers.
Comment: Accepted for publication at the 13th International Conference on
Computational Semantics (IWCS 2019).
Huge automatically extracted training sets for multilingual Word Sense Disambiguation
We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages, for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low-resource languages and providing competitive results for English, where manually annotated training sets are available. The data is available at trainomatic.org
New frontiers in supervised word sense disambiguation: building multilingual resources and neural models on a large scale
Word Sense Disambiguation is a long-standing task in Natural Language Processing
(NLP), lying at the core of human language understanding. While it has already
been studied from many different angles over the years, ranging from knowledge-based
systems to semi-supervised and fully supervised models, the field seems to
be slowing down with respect to other NLP tasks, e.g., part-of-speech tagging and
dependency parsing. Despite the organization of several international competitions
aimed at evaluating Word Sense Disambiguation systems, the evaluation of automatic
systems has been problematic, mainly due to the lack of a reliable evaluation
framework enabling a direct quantitative comparison.
To this end we develop a unified evaluation framework and analyze the performance
of various Word Sense Disambiguation systems in a fair setup. The results
show that supervised systems clearly outperform knowledge-based models. Among
the supervised systems, a linear classifier trained on conventional local features
still proves to be a hard baseline to beat. Nonetheless, recent approaches exploiting
neural networks on unlabeled corpora achieve promising results, surpassing this
hard baseline in most test sets. Even though supervised systems tend to perform
best in terms of accuracy, they often lose ground to more flexible knowledge-based
solutions, which do not require training for every disambiguation target. To bridge
this gap we adopt a different perspective and rely on sequence learning to frame
the disambiguation problem: we propose and study in depth a series of end-to-end
neural architectures directly tailored to the task, from bidirectional Long Short-Term
Memory to encoder-decoder models. Our extensive evaluation over standard
benchmarks and in multiple languages shows that sequence learning enables more
versatile all-words models that consistently lead to state-of-the-art results, even
against models trained with engineered features.
However, supervised systems need annotated training corpora and the few available
to date are of limited size: this is mainly due to the expensive and time-consuming
process of annotating a wide variety of word senses at a reasonably high
scale, i.e., the so-called knowledge acquisition bottleneck. To address this issue, we
also present different strategies to automatically acquire high-quality sense-annotated
data in multiple languages, without any manual effort. We assess the quality of the
sense annotations both intrinsically and extrinsically, achieving competitive results
on multiple tasks.
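The sequence-learning framing described above treats all-words WSD as tagging: each input token is paired with a sense label, or a null label when it carries no annotation. A minimal sketch of that data framing; the sense keys below are illustrative placeholders, not real WordNet sense keys, and the neural tagger itself is omitted:

```python
# Framing all-words WSD as sequence labeling: each token is paired with a
# sense label, or "O" when the token carries no sense annotation.
# The sense keys are illustrative placeholders, not real WordNet keys.

def to_tagging_pair(annotated_sentence):
    """Split [(token, sense_or_None), ...] into parallel input/output sequences."""
    tokens = [tok for tok, _ in annotated_sentence]
    labels = [sense if sense is not None else "O" for _, sense in annotated_sentence]
    return tokens, labels

sentence = [
    ("the", None),
    ("bank", "bank.n.02"),
    ("approved", "approve.v.01"),
    ("the", None),
    ("loan", "loan.n.01"),
]
x, y = to_tagging_pair(sentence)
print(x)  # ['the', 'bank', 'approved', 'the', 'loan']
print(y)  # ['O', 'bank.n.02', 'approve.v.01', 'O', 'loan.n.01']
```

A bidirectional LSTM or encoder-decoder model then consumes the token sequence and predicts the label sequence jointly, which is what makes a single model cover all target words at once.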
Semantic Web Service Engineering: Annotation Based Approach
Web services are an emerging paradigm which aims at implementing software components on the Web. They are based on syntactic standards, notably WSDL. Semantic annotation of Web services provides better-quality and more scalable solutions in the areas of service interoperation, service discovery, service composition and process orchestration. Manual annotation is a time-consuming process which requires deep domain knowledge and consistency of interpretation within annotation teams. Therefore, we propose an approach for semi-automatically annotating WSDL Web service descriptions, enabled by Semantic Web Service Engineering. The annotation approach consists of two main processes: categorization and matching. The categorization process classifies a WSDL service description into its corresponding domain; the matching process maps WSDL entities to a pre-existing domain ontology. Both categorization and matching rely on ontology matching techniques. A tool has been developed and experiments have been carried out to evaluate the proposed approach
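One ingredient of the matching process described above can be approximated with lexical similarity between WSDL entity names and ontology concept names. A minimal sketch; the entity and concept names are hypothetical, and the actual approach relies on fuller ontology-matching techniques than string similarity alone:

```python
import difflib

# Hypothetical WSDL entity names and domain-ontology concepts, for illustration.
wsdl_entities = ["GetFlightPrice", "BookHotelRoom"]
ontology_concepts = ["FlightPrice", "HotelRoom", "CarRental"]

def match_entity(entity, concepts):
    """Map a WSDL entity name to the most lexically similar ontology concept."""
    def sim(concept):
        return difflib.SequenceMatcher(None, entity.lower(), concept.lower()).ratio()
    best = max(concepts, key=sim)
    return best, sim(best)

for entity in wsdl_entities:
    concept, score = match_entity(entity, ontology_concepts)
    print(entity, "->", concept, round(score, 2))
```

In practice such lexical scores would be combined with structural evidence from the ontology (and from the WSDL type hierarchy) before accepting a mapping.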