118 research outputs found

Extending the Inter-Lingual-Index with new concepts

Author: Bloksma L.
Peters W.
Vossen P.J.T.M.
Publication venue: Amsterdam: Vrije Universiteit
Publication date: 01/01/1999

VU Research Portal

DWS 2006: Proceedings of the fourth international workshop on dictionary writing systems, Tuesday 5th September 2006, Turin, Italy (Pre-EURALEX 2006)

Author: de Schryver Gilles-Maurice
Publication venue: (SF)2 Press
Publication date: 01/01/2006

Ghent University Academic Bibliography

Overview and Future of Czech Wordnet

Author: Pala Karel
Rambousek Adam
Tukačová Sandra
Publication venue: CEUR-WS.org
Publication date: 01/01/2017

Czech Wordnet is one of the national wordnets created during the EuroWordNet and BalkaNet projects. However, the data contains various issues that affect the use of Czech Wordnet in NLP applications. Due to a lack of resources, it has not been possible to update Czech Wordnet thoroughly since the publication of the first version. In 2017, we started a project to evaluate and update Czech Wordnet, followed by a connection to the Collaborative Interlingual Index. This paper provides an overview of various updates and extensions of the Czech Wordnet data, and presents the roadmap to publish a revised version of Czech Wordnet under an open license.

Univerzitní repozitář Masarykovy univerzity

D3.8 Lexical-semantic analytics for NLP

Author: Campagnano Cesare
Costa Rute
de Does Jesse
Dobrovoljc Kaja
Frontini Francesca
Gantar Polona
Kallas Jelena
Koppel Kristina
Krek Simon
Langemets Margit
Martelli Federico
Maru Marco
Munda Tina
Navigli Roberto
Nimb Sanni
Olsen Sussi
Quochi Valeria
Salgado Ana de Castro
Tempelaars Rob
Tiberius Carole
Ureña-Ruiz Rafael-J.
Velardi Paola
Čibej Jaka
Publication venue: ELEXIS - European Lexicographic Infrastructure
Publication date: 01/01/2022

UIDB/03213/2020 UIDP/03213/2020
The present document illustrates the work carried out in task 3.3 (work package 3) of the ELEXIS project, focused on lexical-semantic analytics for Natural Language Processing (NLP). This task aims at computing analytics for lexical-semantic information such as words, senses and domains in the available resources, investigating their role in NLP applications. Specifically, this task concentrates on three research directions, namely i) sense clustering, in which grouping senses based on their semantic similarity improves the performance of NLP tasks such as Word Sense Disambiguation (WSD), ii) domain labeling of text, in which the lexicographic resources made available by the ELEXIS project for research purposes allow better performance to be achieved, and finally iii) analysing the diachronic distribution of senses, for which a software package is made available.

Repositório da Universidade Nova de Lisboa

Providing Multilinguality to Ontologies: An Overview

Author: Aguado de Cea G.
Montiel-Ponsoda Elena
Publication venue: Facultad de Informática (UPM)
Publication date: 01/04/2007

Ontologies play a decisive role in the development of the Semantic Web, since they are able to model the knowledge of a specific domain in a machine-readable way. However, the need to provide multilinguality to ontologies poses new challenges for Ontology Engineering research. In this paper we attempt to offer an overview of available strategies for the localization of lexical resources and ontologies. Detailed steps in the localization of the multilingual lexicon EuroWordNet, the multilingual ontology GENOMA-KB, and the ontology translation software LabelTranslator are presented with the aim of illustrating three different localization approaches, their main characteristics and limitations.

Archivo Digital UPM

Sustainable long-term WordNet development and maintenance: Case study of the Czech WordNet

Author: Adam Rambousek
Aleš Horák
Karel Pala
Publication venue: Institute of Slavic Studies, Polish Academy of Sciences
Publication date: 01/01/2018

Czech WordNet is one of the first national wordnets created during the EuroWordNet and BalkaNet projects. However, the data contains various issues that affect the use of Czech WordNet in NLP applications. Since the publication of the first CzWN version, the semantic network has been augmented in several phases, but the complex final editing and publishing process has not been finished. In 2017, we started a project to evaluate and update the Czech WordNet, followed by a connection to the Collaborative Interlingual Index. In this paper, we provide an overview of Czech WordNet data updates and extensions, and present the roadmap to publish a revised version of the Czech WordNet under an open license. Moreover, we introduce a concept developed for long-term updates and maintenance of the data based on crowdsourcing activities.

Crossref

Biblioteka Nauki - repozytorium artykułów

Directory of Open Access Journals

Mapping Text to Knowledge using Natural Language Processing

Author: Bodnari Andreea
Publication venue: Digital WPI
Publication date: 11/03/2010

The goal of this project was to design and implement a system that analyzes text corpora. This system uses natural language processing techniques to extract knowledge from written text and represents this knowledge as a network. The system displays this network to the user and allows the user to explore it interactively. The accuracy of the knowledge extraction process and the overall performance of the developed system were assessed. Possible applications are in social networks and text simplification.

DigitalCommons@WPI

Persistent semantic identity in WordNet

Author: Eric Kafe
Publication venue: Institute of Slavic Studies, Polish Academy of Sciences
Publication date: 01/01/2018

Although rarely studied, the persistence of semantic identity in the WordNet lexical database is crucial for the interoperability of all the resources that use WordNet data. The present study investigates the stability of the two primary entities of the WordNet database (the word senses and the synonym sets), by following their respective identifiers (the sense keys and the synset offsets) across all the versions released between 1995 and 2012, while also considering "drifts" of identical definitions and semantic relations. Contrary to expectations, 94.4% of the WordNet 1.5 synsets still persisted in the latest 2012 version, compared to only 89.1% of the corresponding sense keys. Meanwhile, the splits and merges between synonym sets remained few and simple. These results are presented in tables that allow estimating the lexicographic effort needed for updating WordNet-based resources to newer WordNet versions. We discuss the specific challenges faced by both the dominant synset-based mapping paradigm (a moderate amount of split synsets), and the recommended sense key-based approach (very few identity violations), and conclude that stable synset identifiers are viable, but need to be complemented by stable sense keys in order to adequately handle the split synonym sets.

Biblioteka Nauki - repozytorium artykułów

Directory of Open Access Journals

Nodalida 2005 - proceedings of the 15th NODALIDA conference

Author
Publication venue: University of Joensuu
Publication date

UEF Electronic Publications

Simple identification tools in FishBase

Author: Atanacio Rachek
Bailly Nicolas
Froese Rainer
Reyes Jr. Rodolfo
Publication venue: EUT - Edizioni Università di Trieste
Publication date: 01/01/2010

Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.

OceanRep

OpenstarTs