68 research outputs found

many faces, many places (Term21)

Author: Carvalho Sara
Costa Rute
Khan Anas Fahad
Ostroski Anic Ana
Publication venue: ELRA
Publication date: 01/01/2022

UIDB/03213/2020 UIDP/03213/2020

Repositório da Universidade Nova de Lisboa

Leveraging a Narrative Ontology to Query a Literary Text

Author: Bellandi Andrea
Benotto Giulia
Frontini Francesca
Giovannetti Emiliano
Khan Anas Fahad
Reboul Marianne
Publication venue: OASIcs - OpenAccess Series in Informatics. 7th Workshop on Computational Models of Narrative (CMN 2016)
Publication date: 01/01/2016

In this work we propose a model for the representation of the narrative of a literary text. The model is structured in an ontology and a lexicon constituting a knowledge base that can be queried by a system. This narrative ontology, as well as describing the actors, locations, situations found in the text, provides an explicit formal representation of the timeline of the story. We will focus on a specific case study, that of the representation of a selected portion of Homer's Odyssey, in particular of the knowledge required to answer a selection of salient queries, formulated by a literary scholar. This work is being carried out within the framework of the Semantic Web by adopting models and standards such as RDF, OWL, SPARQL, and lemon among others.

Dagstuhl Research Online Publication Server

Modelling frequency and attestations for OntoLex-Lemon

Author: Chiarcos Christian
de Does Jesse
Declerck Thierry
Depuydt Katrien
Fahad Khan Anas
Ionov Maxim
McCrae John Philip
Stolk Sander
Publication date: 24/04/2023

The OntoLex vocabulary enjoys increasing popularity as a means of publishing lexical resources with RDF and as Linked Data. The recent publication of a new OntoLex module for lexicography, lexicog, reflects its increasing importance for digital lexicography. However, not all aspects of digital lexicography have been covered to the same extent. In particular, supplementary information drawn from corpora such as frequency information, links to attestations, and collocation data were considered to be beyond the scope of lexicog. Therefore, the OntoLex community has put forward the proposal for a novel module for frequency, attestation and corpus information (FrAC), that not only covers the requirements of digital lexicography, but also accommodates essential data structures for lexical information in natural language processing. This paper introduces the current state of the OntoLex-FrAC vocabulary, describes its structure, some selected use cases, elementary concepts and fundamental definitions, with a focus on frequency and attestations

OPUS Augsburg

A Survey of Guidelines and Best Practices for the Generation, Interlinking, Publication, and Validation of Linguistic Linked Data

Author: Anas Fahad Khan
Christian Chiarcos
Daniela Gifu
di Buono Maria Pia
Giedre Valunaite Oleskeviciene
Jorge Gracia
Milan Dojchinovski
Thierry Declerck
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2022

Università degli Studi di Napoli L'Orientale: CINECA IRIS

Portuguese Borrowings in Contemporary Asian Languages

Author: Anuradha Isuri
Costa Rute
Frontini Francesca
Khan Anas Fahad
Liyange Chamila
McCrae John P.
Ojha Atul Kr.
Rani Priya
Salgado Ana Castro
Publication venue: Institute for the Croatian Language
Publication date: 01/01/2024

UIDB/03213/2020 UIDP/03213/2020

CHAMUÇA (Cultural HeritAge and Multilingual Understanding through lexiCal Archives) is a pioneering initiative aimed at exploring the impact of the Portuguese language on Asian languages, rooted in the historical exchanges between Portuguese traders, colonists, and diverse Asian cultures. The impact of these interactions extends beyond historical remnants to the modern-day lexicon of Asian languages, which includes a diverse array of Portuguese borrowings, ranging from general vocabulary units to specialised units. We aim to detail the initiative’s current status, its goals, and the methodology it employs. Additionally, it will outline the essential steps required for organising and structuring the knowledge embedded within and associated with the borrowings. CHAMUÇA, an innovative open-source resource designed to document and study these Portuguese linguistic contributions, will augment the pool of structured lexical data and support cross-linguistic analysis, using state-of-the-art frameworks such as OntoLex-Lemon and TEI Lex-0 to structure the lexical data. Following FAIR principles – ensuring data is findable, accessible, interoperable, and reusable – CHAMUÇA is poised to contribute to linguistic borrowings, cultural interchange, and the preservation of linguistic heritage. Furthermore, the project will encourage community involvement and scholarly collaboration to evolve and enrich its contents, leveraging collective expertise to illuminate the nuances of language contact phenomena.

Repositório da Universidade Nova de Lisboa

OntoLex-Morph: Morphology for the Web of Data

Author: Chiarcos Christian
Gkirtzou Katerina
Ionov Maxim
Khan Anas Fahad
Labropoulou Penny
Passarotti Marco
Pellegrini Matteo
Publication date: 01/01/2022

Purpose: OntoLex-Lemon is a widely used community standard for publishing lexical resources in machine-readable form, and is in fact the predominant RDF vocabulary for this purpose. With the growing popularity and increasing adoption of this model for applications in both language technology and lexicography, a number of new modules have been developed in the past year to complement the OntoLex core vocabulary and its lexicographic follow-up, lexicog. In this paper, we describe the current status of the development of the OntoLex-Morph vocabulary.

Mykolas Romeris University Institutional Repository

Following Best Practices in a Retro-digitized Dictionary Project

Author: Almeida Bruno
Carvalho Sara
Costa Rute
Khan Anas Fahad
Khemakhem Mohamed
Lehečka Boris
Ramos Margarida
Romary Laurent
Salgado Ana Castro
Silva Raquel
Tasovac Toma
Publication date: 01/01/2024

UIDB/03213/2020 UIDP/03213/2020 PTDC/LLT-LIN/6841/2020

This article outlines essential best practices for retro-digitized dictionary projects, using the ongoing MORDigital project (DOI 10.54499/PTDC/LLT-LIN/6841/2020) as a case study. The MORDigital project focuses on digitally transforming the historically significant Portuguese Morais dictionary’s first three editions (1789, 1813, 1823). While the primary objective is to create faithful digital versions of these renowned dictionaries, MORDigital stands out by going beyond the mere adoption of established best practices. Instead, it reflects on the choices made throughout the process, providing insights into the decision-making process. The key topics emphasized include (1) the establishment of a robust data model; (2) the refinement of metadata; (3) the implementation of consistent identifiers; and (4) the enhancement of encoding techniques; additionally exploring the issue of structuring domain labelling. The article aims to contribute to the ongoing discourse on best practices in retro-digitized dictionary projects and their implications for data preservation and knowledge organization.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Repositório da Universidade Nova de Lisboa

Historiae, History of Socio-Cultural Transformation as Linguistic Data Science. A Humanities Use Case

Author: Apostol Elena-Simona
Armaselu Florentina
Fahad Khan Anas
Liebeskind Chaya
McGillivray Barbara
Truică Ciprian-Octavian
Valūnaitė Oleškevičienė Giedrė
Publication date: 01/01/2021

The paper proposes an interdisciplinary approach including methods from disciplines such as history of concepts, linguistics, natural language processing (NLP) and Semantic Web, to create a comparative framework for detecting semantic change in multilingual historical corpora and generating diachronic ontologies as linguistic linked open data (LLOD). Initiated as a use case (UC4.2.1) within the COST Action Nexus Linguarum, European network for Web-centred linguistic data science, the study will explore emerging trends in knowledge extraction, analysis and representation from linguistic data science, and apply the devised methodology to datasets in the humanities to trace the evolution of concepts from the domain of socio-cultural transformation. The paper will describe the main elements of the methodological framework and preliminary planning of the intended workflow

Mykolas Romeris University Institutional Repository

Tracing Semantic Change with Multilingual LLOD and Diachronic Word Embeddings

Author: Apostol Elena-Simona
Armaselu Florentina
Chiarcos Christian
Khan Anas Fahad
Liebeskind Chaya
McGillivray Barbara
Truică Ciprian-Octavian
Valūnaitė-Oleškevičienė Giedrė
Publication date: 01/01/2022

Purpose: The project will combine word embedding techniques and linguistic linked open data (LLOD) with theoretical aspects from lexical semantics, the history of concepts, and knowledge organization to trace the evolution of concepts in a collection of multilingual diachronic corpora of seven extinct and extant languages (Latin, Ancient Greek, Hebrew, French, Old Lithuanian, Romanian, German). The outcome will consist of a sample of diachronic ontologies to be published on the LLOD cloud. It will also comprise reflections on the potential interconnections across different languages that can be built through these knowledge structures

Mykolas Romeris University Institutional Repository

Interlinking Lexicographic Data in the MORDigital Project

Author: Almeida Bruno
Carvalho Sara
Costa Rute
Khan Anas Fahad
Khemakhem Mohamed
Ramos Margarida
Romary Laurent
Salgado Ana
Silva Raquel
Tasovac Toma
Publication date: 01/01/2022

Purpose: To introduce MORDigital as an innovative Portuguese national project that incorporates the latest results in computational lexicography, the digital humanities, and linguistic linked data. In particular, we will show how it brings together work in the development of TEI Lex-0 and OntoLex-Lemon, as well as recent innovations on the conversion of retrodigitized dictionaries into computational lexical resources (using in this case the GROBID-dictionaries tool)

INRIA a CCSD electronic archive server

Repositório da Universidade Nova de Lisboa

Mykolas Romeris University Institutional Repository