
111 research outputs found

Benchmark datasets for biomedical knowledge graphs with negative statements

Author: Pesquita Catia
Silva Sara
Sousa Rita T.
Publication date: 21/07/2023

Knowledge graphs represent facts about real-world entities. Most of these facts are defined as positive statements. Negative statements are scarce but highly relevant under the open-world assumption. Furthermore, they have been shown to improve the performance of several applications, particularly in the biomedical domain. However, no benchmark dataset supports the evaluation of methods that consider these negative statements. We present a collection of datasets for three relation prediction tasks (protein-protein interaction prediction, gene-disease association prediction and disease prediction) that aim to circumvent the difficulties of building benchmarks for knowledge graphs with negative statements. These datasets include data from two successful biomedical ontologies, Gene Ontology and Human Phenotype Ontology, enriched with negative statements. We also generate knowledge graph embeddings for each dataset with two popular path-based methods and evaluate the performance in each task. The results show that negative statements can improve the performance of knowledge graph embeddings.

arXiv.org e-Print Archive

Explainable Representations for Relation Prediction in Knowledge Graphs

Author: Pesquita Catia
Silva Sara
Sousa Rita T.
Publication date: 22/06/2023

Knowledge graphs represent real-world entities and their relations in a semantically-rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties, but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features better explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two real-world highly complex relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard learning representation methods while identifying both sufficient and necessary explanations based on shared semantic aspects. Comment: 16 pages, 3 figures

arXiv.org e-Print Archive

Ontology Matching Techniques for Enterprise Architecture Models

Author: Catia Pesquita
José Borbinha
Marzieh Bakhshandeh
Publication date: 11/04/2020

Current Enterprise Architecture (EA) approaches tend to be generic, based on broad meta-models that cross-cut distinct architectural domains. Integrating these models is necessary for an effective EA process, for example to support benchmarking of business processes or assessing compliance with structured requirements. However, the integration of EA models faces challenges stemming from structural and semantic heterogeneities that could be addressed by ontology matching techniques. To that end, we used AgreementMakerLight, an ontology matching system, to evaluate a set of state-of-the-art matching approaches that could adequately address some of the heterogeneity issues. We assessed the matching of EA models based on the ArchiMate and BPMN languages, which allowed us to draw conclusions about both the potential and the limitations of these techniques in exploring the more complex semantics present in these models. Enterprise Architecture (EA) is a practice to support the analysis, design and implementation of a business strategy in an organization, considering its relevant multiple domains. In recent years, a variety of Enterprise Architecture [...] To support the matching tasks we have used AgreementMakerLight (AML)

CiteSeerX

The epidemiology ontology: an ontology for the semantic annotation of epidemiological resources

Author: Catia Pesquita
Francisco M Couto
João D Ferreira
Mário J Silva
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: Epidemiology is a data-intensive and multi-disciplinary subject, where data integration, curation and sharing are becoming increasingly relevant, given its global context and time constraints. The semantic annotation of epidemiology resources is a cornerstone to effectively support such activities. Although several ontologies cover some of the subdomains of epidemiology, we identified a lack of semantic resources for epidemiology-specific terms. This paper addresses this need by proposing the Epidemiology Ontology (EPO) and by describing its integration with other related ontologies into a semantically enabled platform for sharing epidemiology resources. RESULTS: The EPO follows the OBO Foundry guidelines and uses the Basic Formal Ontology (BFO) as an upper ontology. The first version of EPO models several epidemiology and demography parameters as well as transmission of infection processes, participants and related procedures. It currently has nearly 200 classes and is designed to support the semantic annotation of epidemiology resources and data integration, as well as information retrieval and knowledge discovery activities. CONCLUSIONS: EPO is under active development and is freely available at https://code.google.com/p/epidemiology-ontology/. We believe that the annotation of epidemiology resources with EPO will help researchers to gain a better understanding of global epidemiological events by enhancing data integration and sharing.

Springer - Publisher Connector

PubMed Central

Special issue on ontology and linked data matching

Author: Cheatham Michelle
Cruz Isabel
Euzenat Jérôme
Pesquita Catia
Publication venue: 'IOS Press'
Publication date: 06/12/2016

Editorial, Semantic Web Journal 8(2):183-18

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

DDB-EDM to FaBiO: The Case of the German Digital Library

Author: Dessı̀ Danilo
Oppenländer Jonas
Oshani Seneviratne, Juan Sequeda, Lorena Etcheverry, Catia Pesquita
Sack Harald
Tan Mary Ann
Tietz Tabea
Publication venue: RWTH Aachen
Publication date: 06/11/2021

Cultural heritage portals have the goal of providing users with seamless access to all their resources. This paper introduces initial efforts for a user-oriented restructuring of the German Digital Library (DDB). At present, cultural heritage objects (CHOs) in the DDB are modeled using an extended version of the Europeana Data Model (DDB-EDM), which negatively impacts usability and exploration. These challenges can be addressed by exploiting ontologies and building a knowledge graph from the DDB’s voluminous collection. Towards this goal, an alignment of bibliographic metadata from DDB-EDM to the FRBR-Aligned Bibliographic Ontology (FaBiO) is presented.

KITopen

QuoteKG: A Multilingual Knowledge Graph of Quotes

Author: Demidova Elena
Gottschalk Simon
Groth Paul
Kapanipathi Pavan
Kuculo Tin
Pesquita Catia
Skaf-Molli Hala
Suchanek Fabian
Szekely Pedro
Tamper Minna
Vidal Maria-Esther
Publication venue: Cham, Switzerland : Springer
Publication date: 01/01/2022

Quotes of public figures can mark turning points in history. A quote can explain its originator’s actions, foreshadowing political or personal decisions and revealing character traits. Impactful quotes cross language barriers and influence the general population’s reaction to specific stances, always facing the risk of being misattributed or taken out of context. The provision of a cross-lingual knowledge graph of quotes that establishes the authenticity of quotes and their contexts is of great importance to allow the exploration of the lives of important people as well as topics from the perspective of what was actually said. In this paper, we present QuoteKG, the first multilingual knowledge graph of quotes. We propose the QuoteKG creation pipeline that extracts quotes from Wikiquote, a free and collaboratively created collection of quotes in many languages, and aligns different mentions of the same quote. QuoteKG includes nearly one million quotes in 55 languages, said by more than 69,000 people of public interest across a wide range of topics. QuoteKG is publicly available and can be accessed via a SPARQL endpoint.

Institutionelles Repositorium der Leibniz Universität Hannover

Towards Certified Distributed Query Processing

Author: Aebeloe Christian
Alam Mehwish
Aras Hidir
Azzam Amr
Cano Juan
Domingue John
Gottschalk Simon
Hertling Sven
Pesquita Catia
Rohde Philipp D.
Trojahn Cassia
Vidal Maria-Esther
Publication venue: Aachen, Germany : RWTH Aachen
Publication date: 01/01/2023

In recent years, knowledge graphs (KGs) have gained more and more importance. As a consequence of that, the number of publicly accessible KGs is increasing. Due to their adoption in many areas, KGs are used in numerous different applications. However, these knowledge graph applications are not developed by the data owners and they might collect data from several linked KGs. It is therefore essential that systems accessing KGs are certified, i.e., each component is certified for a specific use by an entity or agency. In addition, a trace of the performed operations and used data is needed in order to verify that all requirements were met, e.g., some data cannot be transferred from the source to any other component due to privacy restrictions. This work describes the vision of certified distributed querying in the context of an analytics platform. Challenges for such systems are identified and discussed. © 2023 Copyright for this paper by its authors

Institutionelles Repositorium der Leibniz Universität Hannover

Results of the Ontology Alignment Evaluation Initiative 2015

Author: Cheatham Michelle
Dragisic Zlatan
Euzenat Jérôme
Faria Daniel
Ferrara Alfio
Flouris Giorgos
Fundulaki Irini
Granada Roger
Ivanova Valentina
Jiménez-Ruiz Ernesto
Lambrix Patrick
Montanelli Stefano
Pesquita Catia
Saveta Tzanina
Shvaiko Pavel
Solimando Alessandro
Trojahn dos Santos Cassia
Zamazal Ondrej
Publication venue: No commercial editor.
Publication date: 01/01/2015

Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation and consensus. OAEI 2015 offered 8 tracks with 15 test cases followed by 22 participants. Since 2011, the campaign has been using a new evaluation modality which provides more automation to the evaluation. This paper is an overall presentation of the OAEI 2015 campaign.

HAL-CentraleSupelec

Scientific Publications of the University of Toulouse II Le Mirail

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Open Archive Toulouse Archive Ouverte

Hal-Diderot

HAL-Rennes 1

Metrics for GO based protein semantic similarity: a systematic evaluation

Author: A Schlicker
A Valencia
André O Falcão
António EN Ferreira
C Pesquita
C Wu
Catia Pesquita
D Devos
D Faria
D Lin
Daniel Faria
E Camon
EB Camon
F Azuaje
F Couto
FM Couto
Francisco M Couto
Gentleman
Hugo Bastos
J Chabalier
J Jiang
J Tuikkala
JL Sevilla
L Stein
P Lord
P Resnik
PH Lee
RM Othman
RM Riensche
S Cao
T Joshi
X Guo
X Wu
Y Tao
Z Lei
ZH Duan
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should be used in semantic similarity calculations. Results: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. Conclusions: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Universidade de Lisboa: Repositório.UL
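
Illustrative note on the last entry above (Metrics for GO based protein semantic similarity): the abstract contrasts the average, maximum and best-match average ways of combining term-level Resnik scores, and names simGIC as the best overall measure. The sketch below shows how these are commonly defined. It is not the authors' code: the toy ontology, the information content (IC) values and all function names are hypothetical, and the best-match average shown is one common formulation among several variants.

```python
# Hypothetical toy ontology: each term maps to its ancestor set (itself included).
ANCESTORS = {
    "root": {"root"},
    "a": {"a", "root"},
    "b": {"b", "root"},
    "a1": {"a1", "a", "root"},
    "a2": {"a2", "a", "root"},
}

# Hypothetical information content (IC) values; the root carries no information.
IC = {"root": 0.0, "a": 1.0, "b": 1.5, "a1": 3.0, "a2": 2.5}


def resnik(t1, t2):
    """Resnik term similarity: IC of the most informative common ancestor."""
    return max(IC[t] for t in ANCESTORS[t1] & ANCESTORS[t2])


def pairwise(terms1, terms2):
    """Matrix of term-level scores: rows follow terms1, columns follow terms2."""
    return [[resnik(t1, t2) for t2 in terms2] for t1 in terms1]


def combine_average(terms1, terms2):
    """Mean over all term pairs."""
    values = [v for row in pairwise(terms1, terms2) for v in row]
    return sum(values) / len(values)


def combine_maximum(terms1, terms2):
    """Best single term pair."""
    return max(v for row in pairwise(terms1, terms2) for v in row)


def combine_bma(terms1, terms2):
    """Best-match average: mean of row-wise and column-wise best matches."""
    matrix = pairwise(terms1, terms2)
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return (sum(row_best) / len(row_best) + sum(col_best) / len(col_best)) / 2


def sim_gic(terms1, terms2):
    """simGIC: IC-weighted Jaccard index over ancestor-extended term sets."""
    ext1 = set().union(*(ANCESTORS[t] for t in terms1))
    ext2 = set().union(*(ANCESTORS[t] for t in terms2))
    denominator = sum(IC[t] for t in ext1 | ext2)
    return sum(IC[t] for t in ext1 & ext2) / denominator if denominator else 0.0


if __name__ == "__main__":
    protein = ["a1", "b"]  # hypothetical annotation set
    print("average:", combine_average(protein, protein))
    print("maximum:", combine_maximum(protein, protein))
    print("BMA:    ", combine_bma(protein, protein))
    print("simGIC: ", sim_gic(protein, protein))
```

With this toy data, comparing a protein with itself gives a lower average-combination score when it carries two unrelated terms than when it carries only `a1`, because the unrelated cross pairs pull the mean down. This is one way to see the dependence on the number of terms that the abstract describes.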