
    Grouping Synonyms by Definitions

    We present a method for grouping the synonyms of a lemma according to its dictionary senses. The senses are defined by a large machine-readable dictionary for French, the TLFi (Trésor de la langue française informatisé), and the synonyms are given by 5 synonym dictionaries (also for French). To evaluate the proposed method, we manually constructed a gold standard in which, for each (word, definition) pair and given the set of synonyms defined for that word by the 5 synonym dictionaries, 4 lexicographers specified the set of synonyms they judged adequate. While inter-annotator agreement on that task ranges from 67% to at best 88%, depending on the annotator pair and on the synonym dictionary considered, the automatic procedure we propose achieves a precision of 67% and a recall of 71%. The proposed method is compared with related work, namely word sense disambiguation, synonym lexicon acquisition, and WordNet construction.

    Using WordNet for Building WordNets

    This paper summarises a set of methodologies and techniques for the fast construction of multilingual WordNets. The English WordNet is used in this approach as a backbone for Catalan and Spanish WordNets and as a lexical knowledge resource for several subtasks. Comment: 8 pages, postscript file. In workshop on Usage of WordNet in NL

    Fighting with the Sparsity of Synonymy Dictionaries

    Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph. However, such methods are sensitive to the structure of the input synonymy graph: sparseness of the input dictionary can substantially reduce the quality of the extracted synsets. In this paper, we propose two different approaches designed to alleviate the incompleteness of the input dictionaries. The first one performs a pre-processing of the graph by adding missing edges, while the second one performs a post-processing by merging similar synset clusters. We evaluate these approaches on two datasets for the Russian language and discuss their impact on the performance of synset induction methods. Finally, we perform an extensive error analysis of each approach and discuss prominent alternative methods for coping with the problem of the sparsity of the synonymy dictionaries. Comment: In Proceedings of the 6th Conference on Analysis of Images, Social Networks, and Texts (AIST'2017): Springer Lecture Notes in Computer Science (LNCS
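The post-processing idea in this abstract (merging similar synset clusters) can be illustrated with a minimal sketch. The abstract does not say how similarity is measured, so the Jaccard overlap of cluster members and the `threshold` value below are assumptions made purely for illustration, and `merge_similar` is a hypothetical helper name, not the authors' procedure.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar(synsets, threshold=0.5):
    """Repeatedly merge any two clusters whose overlap reaches the threshold.

    Assumption: similarity is plain member overlap; a real system could use
    any other similarity measure in place of `jaccard`.
    """
    clusters = [set(s) for s in synsets]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if jaccard(clusters[i], clusters[j]) >= threshold:
                    clusters[i] |= clusters[j]  # absorb cluster j into i
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since indices have shifted
    return clusters


# Toy input: the first two clusters overlap heavily, the third is unrelated.
induced = [{"car", "auto"}, {"auto", "automobile", "car"}, {"bank", "shore"}]
merged = merge_similar(induced)
```

Restarting the scan after every merge keeps the sketch simple at the cost of speed; it is meant for small examples only.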

    Data-driven Synset Induction and Disambiguation for Wordnet Development

    Automatic methods for wordnet development in languages other than English generally exploit information found in Princeton WordNet (PWN) and translations extracted from parallel corpora. A common approach consists in preserving the structure of PWN and transferring its content in new languages using alignments, possibly combined with information extracted from multilingual semantic resources. Even if the role of PWN remains central in this process, these automatic methods offer an alternative to the manual elaboration of new wordnets. However, their limited coverage has a strong impact on that of the resulting resources. Following this line of research, we apply a cross-lingual word sense disambiguation method to wordnet development. Our approach exploits the output of a data-driven sense induction method that generates sense clusters in new languages, similar to wordnet synsets, by identifying word senses and relations in parallel corpora. We apply our cross-lingual word sense disambiguation method to the task of enriching a French wordnet resource, the WOLF, and show how it can be efficiently used for increasing its coverage. Although our experiments involve the English-French language pair, the proposed methodology is general enough to be applied to the development of wordnet resources in other languages for which parallel corpora are available. Finally, we show how the disambiguation output can serve to reduce the granularity of new wordnets and the degree of polysemy present in PWN.

    Is it possible to enrich ontologies with a specialized domain linguistic resource?

    Enriching ontologies with linguistic resources is considered an important target in natural language applications. These linguistic resources should contain not only linguistic information but also knowledge information. However, the linguistic resources available, such as WordNet, are built around lexical relations such as synonymy, antonymy, hyponymy, etc., and they do not provide enough information for ontology building. On the other hand, ontology building requires deeper and more accurate knowledge than general vocabulary contains and, consequently, demands specialized domain resources. This paper presents a linguistic resource developed for Spanish that has been built following some Meaning-Text Theory principles, in order to contain as much knowledge as possible related to several specialized domains.

    ICONCLASS - A Classification System for Art and Iconography

    Documenting is a crucial activity for any museum or art institution. Today that importance is growing, because the metadata a museum provides is essential for retrieving information from the vast amount of data in the modern world. The goal of this study is to discuss the design of thesauri, how they work, and what their purpose is in documenting museum objects. It further discusses content indexing together with aboutness, isness and ofness, to draw a parallel with Panofsky’s categories in iconography. The central focus of the work is an analysis of Iconclass, its features, and its usage. Additionally, it concentrates on new developments in machine learning within artificial intelligence that use Iconclass to generate and automate new data and connections. Finally, it gives a brief overview of folksonomy and social tagging.

    NorNet — a monolingual wordnet of modern Norwegian

    Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources — between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. Editors: Bolette Sandford Pedersen, Anna Braasch, Sanni Nimb and Ruth Vatvedt Fjeld. NEALT Proceedings Series, Vol. 7 (2009), 13-16. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9209

    Extracting Synonyms from Bilingual Dictionaries

    We present our progress in developing a novel algorithm to extract synonyms from bilingual dictionaries. Identification and usage of synonyms play a significant role in improving the performance of information access applications. The idea is to construct a translation graph from translation pairs, then to extract and consolidate cyclic paths to form bilingual sets of synonyms. The initial evaluation of this algorithm shows promising results in extracting Arabic-English bilingual synonyms. In the evaluation, we first converted the synsets in the Arabic WordNet into translation pairs (i.e., losing word-sense memberships). Next, we applied our algorithm to rebuild these synsets. We compared the original and extracted synsets, obtaining F-measures of 82.3% and 82.1% for Arabic and English synset extraction, respectively. Comment: In Proceedings - 11th International Global Wordnet Conference (GWC2021). Global Wordnet Association (2021
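The "translation graph, cyclic paths, consolidation" idea in this abstract can be sketched as follows. This is a deliberate simplification and not the authors' algorithm: it assumes only the shortest cycles (two source words and two target words that all translate each other) are used, and that cycles are consolidated when they share a word on both sides. The function names and the English-French toy pairs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def four_cycles(pairs):
    """Yield (source_words, target_words) for every cycle s1-t1-s2-t2-s1."""
    targets = defaultdict(set)
    for source, target in pairs:
        targets[source].add(target)
    for s1, s2 in combinations(sorted(targets), 2):
        shared = sorted(targets[s1] & targets[s2])
        # Two source words sharing two translations close a 4-cycle.
        for t1, t2 in combinations(shared, 2):
            yield {s1, s2}, {t1, t2}


def consolidate(cycles):
    """Merge cycles that overlap on both the source and the target side."""
    groups = []
    for src, tgt in cycles:
        src, tgt = set(src), set(tgt)
        merged = True
        while merged:  # re-scan, because a merge can create new overlaps
            merged = False
            for group in list(groups):
                if group[0] & src and group[1] & tgt:
                    src |= group[0]
                    tgt |= group[1]
                    groups.remove(group)
                    merged = True
        groups.append((src, tgt))
    return groups


# Hypothetical toy dictionary (English -> French translation pairs).
pairs = [
    ("big", "grand"), ("big", "gros"),
    ("large", "grand"), ("large", "gros"),
    ("bank", "banque"), ("bank", "rive"), ("shore", "rive"),
]
bilingual_synsets = consolidate(four_cycles(pairs))
```

In this toy input, "bank" and "shore" share only one translation, so no cycle closes between them and they are not grouped; requiring a closed cycle is what filters out links that rest on a single ambiguous translation.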