12 research outputs found

Towards VocBench 3: Pushing collaborative development of thesauri and ontologies further beyond

Author: Costetchi E
Fiorelli M
Keizer J
Laaboudi C
Lorenzetti T
Stellato A
Turbati A
Van Gemert W
Publication venue: CEUR-WS

More than three years have passed since the release of the second edition of VocBench, an open source collaborative web platform for the development of thesauri complying with Semantic Web standards. In these years, a vibrant user community has gathered around the system, consisting of public organizations, companies and independent users looking for open source solutions for maintaining their thesauri, code lists and authority resources. The focus on collaboration, the differentiation of user roles and the workflow management for content validation and publication have been the strengths of the platform, especially for those organizations requiring a centralized and controlled publication environment. Now the time has come to widen the scope of the platform: funded by the ISA² programme of the European Commission, VocBench 3 will offer a general-purpose collaborative environment for the development of any kind of RDF dataset, improving the editing capabilities of its predecessor while still maintaining the peculiar aspects that determined its success. In this paper, we review the requirements and the new objectives set for version 3, and then introduce the new characteristics that were implemented for this next iteration of the platform.

ART

A Lime-Flavored REST API for Alignment Services

Author: Armando Stellato
Manuel Fiorelli
Publication venue: European Language Resources Association
Publication date: 01/05/2020

A practical alignment service should be flexible enough to handle the varied alignment scenarios that arise in the real world, while minimizing the need for manual configuration. MAPLE, an orchestration framework for ontology alignment, supports this goal by coordinating a few loosely coupled actors, which communicate and cooperate to solve a matching task using explicit metadata about the input ontologies, other available resources and the task itself. The alignment task is thus summarized by a report listing its characteristics and suggesting alignment strategies. The schema of the report is based on several metadata vocabularies, among which the Lime module of the OntoLex-Lemon model is particularly important, summarizing the lexical content of the input ontologies and describing external language resources that may be exploited for performing the alignment. In this paper, we propose a REST API that enables the participation of downstream alignment services in the process orchestrated by MAPLE, helping them self-adapt in order to handle heterogeneous alignment tasks and scenarios. This alignment orchestration effort was carried out in two main phases: we first described its API as an OpenAPI specification (à la API-first), which we then exploited to generate server stubs and compliant client libraries. Finally, we switched our focus to the integration of existing alignment systems, with one fully integrated system and an additional one being worked on, in the effort to propose the API as a valuable addendum to any system being developed.

ART

AGROVOC: The linked data concept hub for food and agriculture

Author: Andrea Turbati
Armando Stellato
Daniel Martini
Esther Mietzsch
Imma Subirats-Coll
Kristin Kolshus
Marcia Zeng
Publication venue: Elsevier BV
Publication date: 01/05/2022

Newly acquired, aggregated and shared data are essential for innovation in food and agriculture to improve the discoverability of research. Since the early 1980s, the Food and Agriculture Organization of the United Nations (FAO) has coordinated AGROVOC, a valuable tool for data to be classified homogeneously, facilitating interoperability and reuse. AGROVOC is a multilingual and controlled vocabulary designed to cover concepts and terminology under FAO's areas of interest. It is the largest Linked Open Data set about agriculture available for public use and its highest impact is through facilitating the access and visibility of data across domains and languages. This chapter has the aim of describing the current status of one of the most popular thesauri in all FAO's areas of interest, and how it has become the Linked Data Concept Hub for food and agriculture, through new procedures put in place.

ART

When linguistics meets web technologies. Recent advances in modelling linguistic linked data

Author: Chiarcos Christian
Declerck Thierry
García Elena González-Blanco
Gifu Daniela
Gracia Jorge
Ionov Maxim
Khan Anas Fahad
Labropoulou Penny
Mambrini Francesco (ORCID:0000-0003-0834-7562)
McCrae John P.
Muñoz Salvador Ros
Pagé-Perron Émilie
Passarotti Marco (ORCID:0000-0002-9806-7187)
Truică Ciprian-Octavian
Publication venue: IOS Press
Publication date: 01/01/2022

This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments in the area and both builds upon and complements previous works covering similar territory. The article begins with an overview of recent trends which have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding of several major projects in which LLD is a key component, and the increasing importance of the relationship of the digital humanities with LLD. Next, we give an overview of some of the best-known vocabularies and models in LLD. After this, we look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon, as well as recent work which has been carried out in corpora and annotation and LLD, including a discussion of the LLD metadata vocabularies META-SHARE and lime, and of language identifiers. In the following part of the paper, we look at work which has been realised in a number of recent projects and which has a significant impact on LLD vocabularies and models.

OPUS Augsburg

PubliCatt

Repositorio Universidad de Zaragoza

Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC2022), 20-25 June 2022, Marseille, France

Author: Chiarcos Christian
Declerck Thierry
Ionov Maxim
McCrae John Philip
Montiel Elena
Publication date: 20/04/2023

OPUS Augsburg

Principles and Applications of Data Science

Publication venue: MDPI AG
Publication date: 06/07/2022

Data science is an emerging multidisciplinary field which lies at the intersection of computer science, statistics, and mathematics, with a variety of applications, and is related to data mining, deep learning, and big data. This Special Issue on “Principles and Applications of Data Science” focuses on the latest developments in the theories, techniques, and applications of data science. The topics include data cleansing, data mining, machine learning, deep learning, and applications in medicine and healthcare, as well as social media.

Directory of Open Access Books (DOAB)

Helping scientists integrate and interact with biomedical data

Author: Guerreiro Ana Rita Pereira
Publication date: 01/01/2021

Master's thesis, Bioinformatics and Computational Biology, 2021, Universidade de Lisboa, Faculdade de Ciências.
Over the past decades, the amount and complexity of biomedical data available have increased and far exceeded the human capacity to process it. To support this, knowledge graphs and ontologies have been increasingly used, allowing semantic integration of heterogeneous data within and across domains. However, the independent development of biomedical ontologies has created heterogeneity problems, with the design of ontologies with overlapping domains or significant differences. Automated ontology alignment techniques have been developed to tackle the semantic heterogeneity problem, by establishing meaningful correspondences between entities of two ontologies. However, their performance is limited, and the alignments they produce can contain erroneous, incoherent, or missing mappings. Therefore, manual validation of automated ontology alignments remains essential to ensure their quality. Given the complexity of the ontology matching process, it is important to provide visualization and a user interface with the necessary features to support the exploration, validation, and editing of alignments. However, these aspects are often overlooked, as few alignment systems feature user interfaces enabling alignment visualization, fewer allow editing alignments, and fewer still provide the functionalities needed to make the task seamless for users. This dissertation developed VOWLMap, an extension for the standalone web application WebVOWL, for visualizing, editing, and validating biomedical ontology alignments. This work extended the Visual Notation for OWL Ontologies (VOWL), which defines a visual representation for most language constructs of OWL, to support graphical representations of alignments, and restructured WebVOWL to load and visualize alignments.
VOWLMap employs modularization techniques to facilitate the visualization of large alignments, while maintaining the context of each mapping, and offers a dynamic visualization that supports interaction mechanisms, including direct interaction with and editing of graph representations. A user study was conducted to evaluate the usability and performance of VOWLMap, obtaining positive feedback and an excellent score in a standard usability questionnaire.

Universidade de Lisboa: Repositório.UL

Terminological Methods in Lexicography: Conceptualising, Organising, and Encoding Terms in General Language Dictionaries

Author: Salgado Ana Maria de Castro Faria
Publication date: 04/04/2022

General language dictionaries show inconsistencies in terms of uniformity and scientificity in the treatment of specialised lexicographic content. By analysing the presence and treatment of terms in general language dictionaries, we propose a more uniform and scientifically rigorous treatment of this content, considering the necessity of compiling and aligning future lexical resources according to interoperable standards. We begin from the premise that the treatment of lexical items, whether lexical units (words in general) or terminological units (terms or words belonging to particular subject fields), must be differentiated, and resort to terminological methods to treat dictionary terms. Our approach assumes that terminology (in its dual dimension, both linguistic and conceptual) and lexicography, as interdisciplinary domains, can be complementary. Thus, we present theoretical (improvement of metalanguage and lexicographic description based on terminological assumptions) and practical (consistent representation of lexicographic data) objectives that aim to facilitate the organisation, description and consistent modelling of lexicographic components, namely the hierarchy of domain labels, as they are specialised lexicon identification markers. We also want to facilitate the drafting of definitions, which can be optimised and elaborated with greater scientific precision by following a terminological approach for the treatment of terms. We analysed the dictionaries developed by three different academic institutions: the Academia das Ciências de Lisboa, the Real Academia Española and the Académie Française, which represent a valuable legacy of the European academic lexicographic tradition.
The initial analysis includes an exhaustive survey and comparison of the domain labels used, as well as a debate on the chosen options and a comparative study of the treatment of the terms. We then developed a methodological proposal for the treatment of terms in general language dictionaries, exemplified using terms from two domains, GEOLOGY and FOOTBALL, taken from the 2001 edition of the dictionary of the Academia das Ciências de Lisboa. We revised the selected terms according to the defended terminological principles, giving rise to revised/new specialised meanings for the first digital edition of this dictionary. We represent and annotate the data using the TEI Lex-0 specifications, a TEI (Text Encoding Initiative) subset for encoding lexicographic data. We also highlight the importance of having hierarchical domain labels instead of a simple list of domains, since they benefit the organisation of the data itself, as well as correspondence and possible future alignments between different lexicographic resources. Our investigation revealed the following: a) structural models of lexical resources are complex and contain information of a different nature; b) domain labels in general language dictionaries are flat, unbalanced, inconsistent and often outdated, and need to be hierarchised in order to organise specialised knowledge; c) the criteria adopted for marking terms and the formulae used in the definition are disparate; d) the treatment of terms is heterogeneous and formulated differently, so terminological methods can help lexicographers to draft definitions; e) the application of interdisciplinary terminological and lexicographic methods, and of standards, is advantageous because it allows the construction of structured, conceptually organised, linguistically accurate and interoperable lexical databases. In short, we seek to contribute to the urgent issue of solving problems that affect the sharing, alignment and linking of lexicographic data.

Repositório da Universidade Nova de Lisboa

Assessing VocBench custom forms in supporting editing of lemon datasets

Author: Fiorelli M
Lorenzetti T
Pazienza Mt
Stellato A
Publication venue: Springer Verlag
Publication date: 01/01/2017

The lexicon model for ontologies OntoLex/lemon was released in May 2016, following more than 2 years of work of the Ontology-Lexicon (OntoLex) W3C Community Group. Lemon provides rich linguistic grounding for ontologies, including the representation of morphological and syntactic properties of lexical entries as well as the syntax-semantics interface. The rich expressivity of lemon, however, requires non-trivial modeling, with complex patterns characterized by indirections and reifications, which are very difficult to handle with general-purpose ontology editing tools providing triple-grained manipulation. Extending such tools with lemon-tailored editing primitives would enable agile editing of lexicons and ontology-lexicon interfaces, while still benefiting from the wider modeling spectrum provided by RDF. In this paper, we assess the potential of VocBench Custom Forms, a flexible data-driven form definition mechanism being developed for the VocBench 3 collaborative editing platform, by evaluating their ability to assist the creation of lemon entities, disburdening the user from low-level modeling details and letting them focus on the content being edited.

ART

AIUCD 2021 - Book of Extended Abstracts

Publication date: 23/06/2021

The tenth annual conference of the Associazione per l'Informatica Umanistica e la Cultura Digitale has, in its 2021 edition, a distinctive and important title: "DH per la società: e-guaglianza, partecipazione, diritti e valori nell'era digitale" (DH for society: e-quality, participation, rights and values in the digital era). This volume collects the extended, peer-reviewed abstracts submitted to the AIUCD2021 conference, held in virtual form in Pisa.

AMS Acta