11 research outputs found
HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
We present our system for semantic frame induction that showed the best
performance in Subtask B.1 and finished as the runner-up in Subtask A of the
SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et
al., 2019). Our approach separates this task into two independent steps: verb
clustering using word embeddings and their context embeddings, and role
labeling by combining these embeddings with syntactic features. A simple
combination of these steps shows very competitive results and can be extended
to process other datasets and languages.
Comment: 5 pages, 3 tables, accepted at SemEval 201
A new semantically annotated corpus with syntactic-semantic and cross-lingual senses
In this article, we describe a new sense-tagged corpus for Word Sense Disambiguation. The corpus consists of instances of 20 French polysemous verbs. Each verb instance is annotated with three sense labels: (1) the actual translation of the verb in the English version of this instance in a parallel corpus, (2) an entry of the verb in a computational dictionary of French (the Lexicon-Grammar tables) and (3) a fine-grained sense label resulting from the concatenation of the translation and the Lexicon-Grammar entry.
Word Sense Disambiguation on English Translation of Holy Quran
This article proposes a word sense disambiguation system for the English translation of the Quranic text. The system combines three traditional semantic similarity measures: Wu-Palmer (WUP), Lin (LIN), and Jiang-Conrath (JCN). The experiment was performed to obtain the best overall similarity score. The empirical results demonstrate that the combination of the three measures obtains competitive results compared with using each measure individually.
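The three measures named in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the scores are combined, so the majority vote used here is an assumption, and the tiny taxonomy, the leaf counts, and all function names are hypothetical stand-ins for WordNet and corpus statistics.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet.
PARENT = {
    "entity": None,
    "organization": "entity",
    "landform": "entity",
    "bank_financial": "organization",
    "credit_union": "organization",
    "bank_river": "landform",
    "shore": "landform",
}
# Hypothetical corpus counts for the leaf concepts (illustrative only).
LEAF_COUNTS = {"bank_financial": 4, "credit_union": 2, "bank_river": 3, "shore": 1}


def path_to_root(node):
    """Return the chain of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lcs(a, b):
    """Lowest common subsumer: the deepest ancestor shared by a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("concepts do not share a root")


def ic(node):
    """Information content: -log of the probability mass under `node`."""
    total = sum(LEAF_COUNTS.values())
    subtree = sum(n for leaf, n in LEAF_COUNTS.items() if node in path_to_root(leaf))
    return -math.log(subtree / total)


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def lin(a, b):
    """Lin similarity: 2 * IC(lcs) / (IC(a) + IC(b))."""
    return 2 * ic(lcs(a, b)) / (ic(a) + ic(b))


def jcn(a, b):
    """Jiang-Conrath similarity: inverse of IC(a) + IC(b) - 2 * IC(lcs)."""
    distance = ic(a) + ic(b) - 2 * ic(lcs(a, b))
    return math.inf if distance == 0 else 1 / distance


def disambiguate(candidate_senses, context_concept):
    """Assumed combination: each measure votes for its top-scoring sense."""
    votes = Counter(
        max(candidate_senses, key=lambda sense: measure(sense, context_concept))
        for measure in (wup, lin, jcn)
    )
    return votes.most_common(1)[0][0]


chosen = disambiguate(["bank_financial", "bank_river"], "credit_union")
```

Voting is used instead of averaging because JCN is unbounded while WUP and LIN lie in [0, 1]; a real system would have to choose its own normalisation or weighting.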
Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation
Existing approaches to automatic VerbNet-style verb classification are
heavily dependent on feature engineering and therefore limited to languages
with mature NLP pipelines. In this work, we propose a novel cross-lingual
transfer method for inducing VerbNets for multiple languages. To the best of
our knowledge, this is the first study which demonstrates how the architectures
for learning word embeddings can be applied to this challenging
syntactic-semantic task. Our method uses cross-lingual translation pairs to tie
each of the six target languages into a bilingual vector space with English,
jointly specialising the representations to encode the relational information
from English VerbNet. A standard clustering algorithm is then run on top of the
VerbNet-specialised representations, using vector dimensions as features for
learning verb classes. Our results show that the proposed cross-lingual
transfer approach sets new state-of-the-art verb classification performance
across all six target languages explored in this work.
Comment: EMNLP 2017 (long paper)
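The final step described in this abstract, running a standard clustering algorithm with vector dimensions as features, can be sketched as follows. This is only an illustration: the abstract does not name the algorithm, so plain k-means is swapped in here, and the two-dimensional vectors and the verb list are hypothetical toy values, not specialised embeddings.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical toy verb vectors; real inputs would be learned embeddings.
verbs = ["run", "give", "walk", "jog", "hand", "donate"]
vectors = [
    (0.90, 0.10),
    (0.10, 0.90),
    (0.80, 0.20),
    (0.85, 0.15),
    (0.20, 0.80),
    (0.15, 0.90),
]
labels = kmeans(vectors, k=2)
classes = {}
for verb, label in zip(verbs, labels):
    classes.setdefault(label, []).append(verb)
```

With these toy inputs the motion verbs and the transfer verbs end up in separate classes; on real data the number of classes and the seeding strategy would need to be tuned.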
Acquiring verb classes through bottom-up semantic verb clustering
In this paper, we present the first analysis of bottom-up manual semantic clustering of verbs in three languages: English, Polish and Croatian. Verb classes including syntactic and semantic information have been shown to support many NLP tasks by allowing abstraction from individual words and thereby alleviating data sparseness. However, such classifications are still non-existent or limited in most languages. While a range of automatic verb classification approaches have been proposed, high-quality resources and gold standards are needed for evaluation and to improve the performance of NLP systems. We investigate whether semantic verb classes in three different languages can be reliably obtained from native speakers without linguistics training. The analysis of inter-annotator agreement shows an encouraging degree of overlap in the classifications produced for each language individually, as well as across all three languages. Comparative examination of the resultant classifications provides interesting insights into cross-linguistic semantic commonalities and patterns of ambiguity.
Single Classifier Approach for Verb Sense Disambiguation based on Generalized Features
We present a supervised method for verb sense disambiguation based on VerbNet. Most previous supervised approaches to verb sense disambiguation create a classifier for each verb that reaches a frequency threshold. These methods, however, have a significant practical limitation: they cannot be applied to rare or unseen verbs. To overcome this problem, we create a single classifier to be applied to rare or unseen verbs in a new text. This single classifier also exploits generalized semantic features of a verb and its modifiers in order to better deal with rare or unseen verbs. Our experimental results show that the proposed method achieves performance equivalent to that of per-verb classifiers, which cannot be applied to unseen verbs. Our classifier could be used to improve the classifications in lexical resources of verbs, such as VerbNet, in a semi-automatic manner and possibly to extend the coverage of these resources to new verbs.
Investigating the cross-lingual translatability of VerbNet-style classification.
VerbNet, the most extensive online verb lexicon currently available for English, has proved useful in supporting a variety of NLP tasks. However, its exploitation in multilingual NLP has been limited by the fact that such classifications are available for few languages only. Since manual development of VerbNet is a major undertaking, researchers have recently translated VerbNet classes from English to other languages. However, no systematic investigation has been conducted into the applicability and accuracy of such a translation approach across different, typologically diverse languages. Our study is aimed at filling this gap. We develop a systematic method for translation of VerbNet classes from English to other languages which we first apply to Polish and subsequently to Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on Polish demonstrate high translatability with all the classes (96% of English member verbs successfully translated into Polish) and strong inter-annotator agreement, revealing a promising degree of overlap in the resultant classifications. The results on other languages are equally promising. This demonstrates that VerbNet classes have strong cross-lingual potential and the proposed method could be applied to obtain gold standards for automatic verb classification in different languages. We make our annotation guidelines and the six language-specific verb classifications available with this paper.
An Empirical Comparison of VerbNet Syntactic Frames and the Semlink Corpus
This paper describes a method of automatically comparing syntactic frames from the verb lexicon VerbNet with syntactic frames from the Semlink corpus. A method of extracting syntactic frames and semantic argument structures is explained, followed by a method of comparing syntactic frames, both directly and by argument structure. The results of the comparison are described in terms of matching success for frame tokens and frame types, divided into categories based on frame type frequency within Semlink. Overall, 54.14% of the frame tokens within Semlink can be directly matched to VerbNet, with an additional 14.32% matching by argument structure. However, only 29.30% of the frame types within Semlink can be matched to VerbNet, suggesting that the comparison method cannot match a majority of the large variation of frame types in Semlink. A set of distinguishing frame types for VerbNet classes is also proposed and included in this work.
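The gap between the token-level and type-level figures reported above follows from how the two rates are computed: a few frequent frame types can cover most tokens while most distinct types stay unmatched. The sketch below illustrates this with hypothetical frame strings and counts; it uses simple set membership and is not the paper's comparison method or data.

```python
from collections import Counter


def match_rates(frame_counts, matchable_types):
    """Return (token match rate, type match rate) for a frame inventory."""
    total_tokens = sum(frame_counts.values())
    matched_tokens = sum(
        count for frame, count in frame_counts.items() if frame in matchable_types
    )
    matched_types = sum(1 for frame in frame_counts if frame in matchable_types)
    return matched_tokens / total_tokens, matched_types / len(frame_counts)


# Hypothetical corpus frame counts: two frequent types, three rare ones.
corpus_frames = Counter(
    {"NP V NP": 60, "NP V": 25, "NP V NP PP": 10, "NP V S": 3, "NP V ADJ": 2}
)
# Hypothetical lexicon that only lists the two frequent frame types.
lexicon_frames = {"NP V NP", "NP V"}

token_rate, type_rate = match_rates(corpus_frames, lexicon_frames)
```

In this toy inventory most tokens are matched although fewer than half of the types are, which mirrors the direction of the reported result without reproducing its numbers.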
Investigating the cross-lingual translatability of VerbNet-style classification
VerbNet—the most extensive online verb lexicon currently available for
English—has proved useful in supporting a variety of NLP tasks. However,
its exploitation in multilingual NLP has been limited by the fact that
such classifications are available for few languages only. Since manual
development of VerbNet is a major undertaking, researchers have recently
translated VerbNet classes from English to other languages. However, no
systematic investigation has been conducted into the applicability and
accuracy of such a translation approach across different, typologically
diverse languages. Our study is aimed at filling this gap. We develop a
systematic method for translation of VerbNet classes from English to
other languages which we first apply to Polish and subsequently to
Croatian, Mandarin, Japanese, Italian, and Finnish. Our results on
Polish demonstrate high translatability with all the classes (96% of
English member verbs successfully translated into Polish) and strong
inter-annotator agreement, revealing a promising degree of overlap in
the resultant classifications. The results on other languages are
equally promising. This demonstrates that VerbNet classes have strong
cross-lingual potential and the proposed method could be applied to
obtain gold standards for automatic verb classification in different
languages. We make our annotation guidelines and the six
language-specific verb classifications available with this paper. © 2017
The Author(s)
Entitate izendunen desanbiguazioa ezagutza-base erraldoien arabera (Named entity disambiguation using large knowledge bases)
130 p. Nowadays, search engines are almost indispensable for browsing the internet, and the best known of them is Google. Search engines owe a large part of their current success to the exploitation of knowledge bases. With semantic search, they are able to enrich simple queries with information from knowledge bases. For example, when searching for information about a music band, they offer additional links to its discography or its members. When searching for information about the president of a country, they offer links to former presidents or additional information about that territory. Nevertheless, there is a problem that calls into question the success of semantic search, which is currently in vogue: ambiguous terms condition the appropriateness of the information retrieved from knowledge bases. The biggest problems are caused by mentions of proper nouns, or named entities.
The main goal of this thesis is to study named entity disambiguation (EID, from the Basque "entitate izendunen desanbiguazioa") and to propose new techniques for carrying it out. EID systems disambiguate the name mentions in texts and link them to entities in knowledge bases. Because name mentions are ambiguous, a mention can refer to several entities. Moreover, the same entity can be referred to by several different names, so disambiguating these mentions properly is the key concern of the thesis.
To that end, two disambiguation models that underlie the state of the art are studied first: on the one hand, a global model that makes use of the structure of knowledge bases, and on the other, a local model that exploits the information in the words of the mention's context. The two information sources are then combined in a complementary way. The combination surpasses state-of-the-art results on several datasets and obtains comparable results on the rest.
Second, innovative ideas are proposed, analysed and evaluated with the aim of improving any disambiguation system. On the one hand, the behaviour of entities is analysed at the discourse, collection and co-occurrence levels, confirming that entities follow a particular pattern. Based on that pattern, the results of the global model, the local model and another EID system are significantly improved. On the other hand, the local model is fed with knowledge acquired from external corpora. With this contribution, the quality of this external knowledge is evaluated, justifying its contribution to the system. In addition, the results of the local model are improved, again reaching state-of-the-art values.
The thesis is presented as a collection of articles. After the introduction and the state of the art, the four English-language articles on which the thesis is based are attached. Finally, general conclusions are drawn that bring together the topics addressed in the four articles.