4 research outputs found

Building a Lexico-Semantic Resource Collaboratively

Author: Giunchiglia Fausto
Leonardi Natascia
Mercedes Huertas-Migueláñez
Publication venue: Ljubljana
Publication date: 01/01/2018

Multilingual lexico-semantic resources are used in different semantic services such as meaning extraction or data integration and linking, which are essential for the development of real-world applications. However, their use is hampered by the lack of maintenance and quality control mechanisms over their content. The Universal Knowledge Core (UKC) is a multilingual lexico-semantic resource designed as a multi-layered ontology that has a language-independent semantic layer, the concept core, and a language-specific lexico-semantic layer, the natural language core. In this paper, we focus on an expert-based, collaborative workflow for building and maintaining our resource through lexicalisation and evaluation of language elements via a dedicated User Interface (UI). We ran a three-month study to analyse the feasibility of the proposed solution. We interviewed participants to obtain a comprehensive view of how they interacted with the UI and how they perceived the content presented through it. We concluded that this collaborative experience fostered not only the implementation of a resource but also an improvement of its functionalities and, above all, that it represented an example of effective knowledge sharing which opened the way to a network of collaborative intelligence.

Archivio istituzionale della ricerca - Università di Macerata

Lexical Diversity in Kinship Across Languages and Dialects

Author: Bella Gábor
Darma Shandy
Freihat Abed Alhakim
Giunchiglia Fausto
Khalilia Hadi
Publication date: 24/08/2023

Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources, such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, while the other deals with three Indonesian languages. Our results, made available as browseable and downloadable computational resources, extend prior linguistics research on kinship terminology, and provide insight into the extent of diversity even within linguistically and culturally close communities.

arXiv.org e-Print Archive

Frontiers in African Digital Research : Conference Proceedings

Author: Dreiser Anja
Samimi Cyrus
Publication date: 29/06/2022

EPub Bayreuth

Understanding and Exploiting Language Diversity

Author: Batsuren Khuyagbaatar
Publication venue: University of Trento
Publication date: 03/12/2018

Languages are well known to be diverse on all structural levels, from the smallest (phonemic) to the broadest (pragmatic). We propose a set of formal, quantitative measures for the language diversity of linguistic phenomena, resource incompleteness, and resource incorrectness. We apply all these measures to lexical semantics, where we show how evidence of a high degree of universality within a given language set can be used to extend lexico-semantic resources in a precise, diversity-aware manner. We demonstrate our approach on several case studies. The first concerns polysemes and homographs among cases of lexical ambiguity. Contrary to past research that focused solely on exploiting systematic polysemy, the notion of universality provides us with an automated method also capable of predicting irregular polysemes. The second automatically identifies cognates from existing lexical resources across different orthographies of genetically unrelated languages. Contrary to past research that focused on detecting cognates from the 225 concepts of the Swadesh list, we captured 3.1 million cognate pairs across 40 different orthographies and 335 languages by exploiting existing wordnet-like lexical resources.

Unitn-eprints PhD