128 research outputs found

    When linguistics meets web technologies. Recent advances in modelling linguistic linked data

    This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments in the area and both builds upon and complements previous works covering similar territory. The article begins with an overview of recent trends which have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding of several major projects in which LLD is a key component, and the increasing importance of the relationship of the digital humanities with LLD. Next, we give an overview of some of the best-known vocabularies and models in LLD. After this we look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon, as well as recent work which has been carried out on corpora and annotation in LLD, including a discussion of the LLD metadata vocabularies META-SHARE and lime, and of language identifiers. In the following part of the paper we look at work which has been realised in a number of recent projects and which has had a significant impact on LLD vocabularies and models.
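The OntoLex-Lemon model mentioned above describes a dictionary entry as a LexicalEntry linked to its canonical Form with a written representation. A minimal sketch in plain Python of what such an entry looks like as triples; only the ontolex namespace below is the published one, while the ex: URIs and the helper function are hypothetical:

```python
# Sketch of an OntoLex-Lemon lexical entry as plain (s, p, o) triples.
# The ontolex namespace is real; the ex: entry URIs are made up.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def lexical_entry(entry_uri, form_uri, written_rep, lang):
    """Return the triples describing one entry with its canonical form."""
    return [
        (entry_uri, RDF_TYPE, ONTOLEX + "LexicalEntry"),
        (entry_uri, ONTOLEX + "canonicalForm", form_uri),
        (form_uri, RDF_TYPE, ONTOLEX + "Form"),
        (form_uri, ONTOLEX + "writtenRep", f'"{written_rep}"@{lang}'),
    ]

triples = lexical_entry("ex:cat", "ex:cat_form", "cat", "en")
for s, p, o in triples:
    print(s, p, o)
```

In a real LLD setting these triples would be built with an RDF library and serialized as Turtle, but the structure of the entry is the same.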

    Proceedings of the Eighth Italian Conference on Computational Linguistics CliC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown.

    An Ontology for CoNLL-RDF: Formal Data Structures for TSV Formats in Language Technology

    In language technology and the language sciences, tab-separated values (TSV) are a frequently used formalism for representing linguistically annotated natural language, often referred to as "CoNLL formats". A large number of such formats exist, and although they share a number of common features, they are not interoperable, as different pieces of information are encoded differently in these dialects. CoNLL-RDF refers to a programming library and the associated data model that has been introduced to facilitate processing and transforming such TSV formats in a serialization-independent way. CoNLL-RDF represents CoNLL data by means of RDF graphs and SPARQL update operations, but so far without machine-readable semantics: annotation properties are created dynamically on the basis of a user-defined mapping from columns to labels. Current applications of CoNLL-RDF include linking between corpora and dictionaries [Mambrini and Passarotti, 2019] and knowledge graphs [Tamper et al., 2018], syntactic parsing of historical languages [Chiarcos et al., 2018; Chiarcos et al., 2018], the consolidation of syntactic and semantic annotations [Chiarcos and Fäth, 2019], a bridge between RDF corpora and a traditional corpus query language [Ionov et al., 2020], and language contact studies [Chiarcos et al., 2018]. We describe a novel extension of CoNLL-RDF, introducing a formal data model, formalized as an ontology. The ontology is a basis for linking RDF corpora with other Semantic Web resources, but more importantly, its application for transformation between different TSV formats is a major step towards providing interoperability between CoNLL formats.
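The core idea, mapping TSV columns to dynamically named annotation properties with one node per token row, can be sketched as follows. This is an illustrative re-implementation of the idea, not the CoNLL-RDF library's actual API; the conll: and nif: prefixes and the sentence URI stand in for the namespaces and URIs the real pipeline would use:

```python
# Sketch: turn CoNLL-style TSV into (subject, property, object) triples,
# with annotation properties derived from a user-supplied column mapping,
# in the spirit of CoNLL-RDF (names here are illustrative, not the library's API).
def tsv_to_triples(tsv, columns, base="ex:s1"):
    triples = []
    prev = None
    for line in tsv.strip().splitlines():
        fields = line.split("\t")
        word = f"{base}_{fields[0]}"  # one node per token row, keyed by the ID column
        for col, value in zip(columns, fields[1:]):
            triples.append((word, f"conll:{col}", value))
        if prev:
            triples.append((prev, "nif:nextWord", word))  # preserve token order
        prev = word
    return triples

sample = "1\tGallia\tgallia\tNOUN\n2\test\tsum\tVERB"
trips = tsv_to_triples(sample, ["WORD", "LEMMA", "UPOS"])
```

Because the column-to-label mapping is the only format-specific part, swapping it out is what lets the same machinery read different CoNLL dialects; the proposed ontology makes those labels machine-readable instead of purely user-defined.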

    Overview of the EvaLatin 2022 Evaluation Campaign

    This paper describes the organization and the results of the second edition of EvaLatin, the campaign for the evaluation of Natural Language Processing tools for Latin. The three shared tasks proposed in EvaLatin 2022, i.e. Lemmatization, Part-of-Speech Tagging and Features Identification, aim to foster research in the field of language technologies for Classical languages. The shared dataset consists of texts mainly taken from the LASLA corpus. More specifically, the training set includes only prose texts of the Classical period, whereas the test set is organized in three sub-tasks: a Classical sub-task on a prose text of an author not included in the training data, a Cross-genre sub-task on poetic and scientific texts, and a Cross-time sub-task on a text of the 15th century. The results obtained by the participants for each task and sub-task are presented and discussed.
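A minimal sketch of the token-level accuracy that shared tasks of this kind typically report, run on hypothetical gold and predicted lemma sequences; this is not EvaLatin's actual scorer:

```python
# Token-level accuracy: fraction of positions where prediction matches gold.
# The gold/pred sequences below are invented for illustration.
def accuracy(gold, pred):
    assert len(gold) == len(pred), "sequences must be aligned"
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

gold = ["sum", "Gallia", "divido", "in", "pars", "tres"]
pred = ["sum", "Gallia", "divido", "in", "pars", "tre"]
print(accuracy(gold, pred))  # 5 of 6 tokens correct
```

The cross-genre and cross-time sub-tasks would simply apply the same metric to test sets drawn from poetry, scientific prose, or 15th-century text.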

    Building and Comparing Lemma Embeddings for Latin. Classical Latin versus Thomas Aquinas

    This paper presents a new set of lemma embeddings for the Latin language. Embeddings are trained on a manually annotated corpus of texts belonging to the Classical era: different models, architectures and dimensions are tested and evaluated using a novel benchmark for the synonym selection task. In addition, we release vectors pre-trained on the “Opera Maiora” by Thomas Aquinas, thus providing a resource to analyze Latin from a diachronic perspective. The embeddings built upon the two training corpora are compared to each other to support diachronic lexical studies. The words showing the highest usage change between the two corpora are reported and a selection of them is discussed.
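The synonym selection task used for evaluation can be sketched as nearest-neighbour search under cosine similarity: given the vector of a target lemma, pick the candidate whose vector is most similar. The toy two-dimensional vectors below are purely illustrative, not the released embeddings:

```python
# Synonym selection as nearest neighbour by cosine similarity.
# gladius and ensis (both "sword") get similar toy vectors; aqua ("water") does not.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_synonym(target, candidates):
    """Return the candidate name whose vector is closest to the target."""
    return max(candidates, key=lambda name: cosine(target, candidates[name]))

vecs = {"gladius": [0.9, 0.1], "ensis": [0.8, 0.2], "aqua": [0.1, 0.9]}
print(select_synonym(vecs["gladius"], {"ensis": vecs["ensis"], "aqua": vecs["aqua"]}))
# → ensis
```

Comparing Classical and Aquinas embeddings for diachronic change amounts to the same computation: a lemma whose nearest neighbours differ sharply between the two vector spaces is a candidate for usage change.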

    Issues in Building the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin

    Purpose: This abstract presents the architecture and the current state of the LiLa Knowledge Base (https://lila-erc.eu), i.e., a collection of multifarious linguistic resources for Latin described with a common knowledge-description vocabulary, using shared data categories and ontologies developed by the Linguistic Linked Open Data (LLOD) community according to the principles of the Linked Data paradigm.