Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences
Selectional preferences have been used by word sense disambiguation (WSD) systems as one source of disambiguating information. We evaluate WSD using selectional preferences acquired for English adjective-noun, subject, and direct object grammatical relationships with respect to a standard test corpus. The selectional preferences are specific to verb or adjective classes, rather than individual word forms, so they can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. We also investigate use of the one-sense-per-discourse heuristic to propagate a sense tag for a word to other occurrences of the same word within the current document in order to increase coverage. Although the preferences perform well in comparison with other unsupervised WSD systems on the same corpus, the results show that for many applications, further knowledge sources would be required to achieve an adequate level of accuracy and coverage. In addition to quantifying performance, we analyze the results to investigate the situations in which the selectional preferences achieve the best precision and in which the one-sense-per-discourse heuristic increases performance.
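The one-sense-per-discourse heuristic mentioned in this abstract can be illustrated with a minimal sketch. The function name `propagate_senses`, the input layout, and the sense labels are hypothetical. The abstract does not say how conflicting tags within a document are resolved, so the majority vote with ties left untagged is an assumption, not the authors' method.

```python
from collections import Counter


def propagate_senses(lemmas, tagged):
    """Copy sense tags to untagged occurrences of the same lemma in one document.

    lemmas: list of lemmas for a single document, in order.
    tagged: dict mapping token index -> sense tag from some earlier WSD step.
    Returns a new dict; existing tags are never overwritten.
    """
    # Count, per lemma, how often each sense was assigned in this document.
    votes = {}
    for index, sense in tagged.items():
        votes.setdefault(lemmas[index], Counter())[sense] += 1

    result = dict(tagged)
    for index, lemma in enumerate(lemmas):
        if index in result or lemma not in votes:
            continue
        ranked = votes[lemma].most_common(2)
        # Assumption: propagate only when one sense is a strict majority.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            result[index] = ranked[0][0]
    return result


# Hypothetical document: only the first "bank" was tagged by the WSD step.
doc = ["bank", "river", "bank", "loan", "bank"]
print(propagate_senses(doc, {0: "bank/shore"}))
```

Propagation of this kind can only raise coverage; whether it also helps precision depends on how reliable the seed tags are, which is the question the abstract says the authors analyze.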
Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007
This is the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007, on May 24, 2007 in Tartu, Estonia.
D7.1. Criteria for evaluation of resources, technology and integration.
This deliverable defines how evaluation is carried out at each integration cycle in the PANACEA project. As PANACEA aims at producing large-scale resources, evaluation becomes a critical and challenging issue. It is critical because it is important to assess the quality of the results that should be delivered to users. It is challenging because we are exploring rather new areas, and doing so through a technical platform: some new methodologies will have to be explored and old ones adapted.
Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation
Existing approaches to automatic VerbNet-style verb classification are
heavily dependent on feature engineering and therefore limited to languages
with mature NLP pipelines. In this work, we propose a novel cross-lingual
transfer method for inducing VerbNets for multiple languages. To the best of
our knowledge, this is the first study which demonstrates how the architectures
for learning word embeddings can be applied to this challenging
syntactic-semantic task. Our method uses cross-lingual translation pairs to tie
each of the six target languages into a bilingual vector space with English,
jointly specialising the representations to encode the relational information
from English VerbNet. A standard clustering algorithm is then run on top of the
VerbNet-specialised representations, using vector dimensions as features for
learning verb classes. Our results show that the proposed cross-lingual
transfer approach sets new state-of-the-art verb classification performance
across all six target languages explored in this work.
Comment: EMNLP 2017 (long paper)
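The final step described in this abstract, clustering verbs using vector dimensions as features, can be sketched as follows. The abstract only says "a standard clustering algorithm", so the choice of k-means here is an assumption, as are the function name `kmeans`, the deterministic initialisation, and the toy two-dimensional vectors, which are invented for illustration and are not VerbNet-specialised embeddings.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns one cluster label per input vector."""
    # Assumption: use the first k vectors as initial centroids so the
    # sketch is deterministic. Real systems usually initialise randomly.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical verb vectors; each dimension is used directly as a feature.
verbs = ["run", "give", "walk", "hand"]
vectors = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.0], [0.1, 0.9]]
for verb, label in zip(verbs, kmeans(vectors, k=2)):
    print(verb, label)
```

In this toy setup the motion-like and transfer-like verbs fall into separate clusters; with real embeddings the quality of the induced classes would depend on the specialisation step the abstract describes.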
D6.1: Technologies and Tools for Lexical Acquisition
This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs).
WORD SENSE DISAMBIGUATION WITHIN A MULTILINGUAL FRAMEWORK
Word Sense Disambiguation (WSD) is the process of resolving the meaning of a
word unambiguously in a given natural language context. Within the scope of this
thesis, it is the process of marking text with explicit sense labels.
What constitutes a sense is a subject of great debate. An appealing perspective
aims to define senses in terms of their multilingual correspondences, an idea explored
by several researchers, Dyvik (1998), Ide (1999), Resnik & Yarowsky (1999), and
Chugur, Gonzalo & Verdejo (2002), but to date it has not been given any practical
demonstration. This thesis is an empirical validation of these ideas of characterizing
word meaning using cross-linguistic correspondences. The idea is that word meaning
or word sense is quantifiable to the extent that it is uniquely translated in some language or
set of languages.
Consequently, we address the problem of WSD from a multilingual perspective;
we expand the notion of context to encompass multilingual evidence. We devise a
new approach to resolve word sense ambiguity in natural language, using a source of
information that has never before been exploited on a large scale for WSD.
The core of the work presented builds on exploiting word correspondences across
languages for sense distinction. In essence, it is a practical and functional implementation
of a basic idea common to research interest in defining word meanings in
cross-linguistic terms.
We devise an algorithm, SALAAM (Sense Assignment Leveraging Alignment
And Multilinguality), that empirically investigates the feasibility and the validity of utilizing
translations for WSD. SALAAM is an unsupervised approach for word sense
tagging of large amounts of text given a parallel corpus (texts in translation) and
a sense inventory for one of the languages in the corpus. Using SALAAM, we obtain
large amounts of sense-annotated data in both languages of the parallel corpus simultaneously.
The quality of the tagging is rigorously evaluated for both languages of the
corpora.
The data tagged automatically and without supervision by SALAAM is further utilized
to bootstrap a supervised learning WSD system, in essence combining supervised and
unsupervised approaches in an intelligent way to alleviate the resource acquisition
bottleneck for supervised methods. Essentially, SALAAM is extended as an unsupervised
approach for WSD within a learning framework; in many of the cases of the
words disambiguated, SALAAM coupled with the machine learning system rivals the
performance of a canonical supervised WSD system that relies on human-tagged data
for training.
Realizing the fundamental role of similarity for SALAAM, we investigate different
dimensions of semantic similarity as it applies to verbs, since they are relatively
more complex than nouns, which were the focus of the preceding evaluations. We design
a human judgment experiment to obtain human ratings on verbs’ semantic similarity.
The obtained human ratings are cast as a reference point for comparing different
automated similarity measures that crucially rely on various sources of information.
Finally, a cognitively salient model integrating human judgments in SALAAM is proposed
as a means of improving its performance on sense disambiguation for verbs in
particular and other word types in general.