Benchmark datasets for biomedical knowledge graphs with negative statements
Knowledge graphs represent facts about real-world entities. Most of these
facts are defined as positive statements. Negative statements are scarce
but highly relevant under the open-world assumption. Furthermore, they have
been demonstrated to improve the performance of several applications,
particularly in the biomedical domain. However, no benchmark dataset supports
the evaluation of methods that consider these negative statements.
We present a collection of datasets for three relation prediction tasks
(protein-protein interaction prediction, gene-disease association prediction and
disease prediction) that aim to circumvent the difficulties in building
benchmarks for knowledge graphs with negative statements. These datasets
include data from two successful biomedical ontologies, Gene Ontology and Human
Phenotype Ontology, enriched with negative statements.
We also generate knowledge graph embeddings for each dataset with two popular
path-based methods and evaluate their performance on each task. The results show
that negative statements can improve the performance of knowledge graph
embeddings.
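The abstract does not name the two path-based methods. As a hedged illustration only, the sketch below shows the walk-generation step shared by RDF2Vec-style methods, assuming (our assumption, not necessarily the authors' encoding) that each negative statement is stored as a triple with its own relation label, here the hypothetical `not_hasAnnotation`. The toy triples and identifiers are invented. In a full pipeline, the resulting walks would be fed to a word2vec-style model to obtain the embeddings; that step is omitted here.

```python
import random
from collections import defaultdict

# Hypothetical toy KG as (subject, relation, object) triples. Negative statements
# get their own relation label so walks can tell "P1 has GO:0001" apart from
# "P1 does NOT have GO:0002".
TRIPLES = [
    ("P1", "hasAnnotation", "GO:0001"),
    ("P1", "not_hasAnnotation", "GO:0002"),
    ("P2", "hasAnnotation", "GO:0002"),
    ("GO:0001", "subClassOf", "GO:0000"),
    ("GO:0002", "subClassOf", "GO:0000"),
]


def build_adjacency(triples):
    """Map each node to the list of (relation, object) edges leaving it."""
    adj = defaultdict(list)
    for s, r, o in triples:
        adj[s].append((r, o))
    return adj


def random_walks(adj, start, num_walks, depth, seed=0):
    """Generate walks of the form node, relation, node, relation, node, ..."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    walks = []
    for _ in range(num_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = adj.get(node)  # .get avoids inserting empty entries
            if not edges:
                break  # dead end: stop this walk early
            rel, node = rng.choice(edges)
            walk.extend([rel, node])
        walks.append(walk)
    return walks


if __name__ == "__main__":
    adjacency = build_adjacency(TRIPLES)
    for w in random_walks(adjacency, "P1", num_walks=3, depth=2):
        print(" -> ".join(w))
```

Encoding negation as a separate relation keeps the walk format unchanged, so any walk-based embedding method can consume it without modification.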
Explainable Representations for Relation Prediction in Knowledge Graphs
Knowledge graphs represent real-world entities and their relations in a
semantically rich structure supported by ontologies. Exploring this data with
machine learning methods often relies on knowledge graph embeddings, which
produce latent representations of entities that preserve structural and local
graph neighbourhood properties, but sacrifice explainability. However, in tasks
such as link or relation prediction, understanding which specific features
better explain a relation is crucial to support complex or critical
applications.
We propose SEEK, a novel approach for explainable representations to support
relation prediction in knowledge graphs. It is based on identifying relevant
shared semantic aspects (i.e., subgraphs) between entities and learning
representations for each subgraph, producing a multi-faceted and explainable
representation.
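To make the idea of "shared semantic aspects" more concrete, here is a minimal sketch, not SEEK's actual algorithm. It assumes, as an illustration, that each aspect corresponds to a top-level ontology branch and that an entity's semantics is the ancestor closure of its annotations. The ontology fragment, the `ASPECT_ROOTS` mapping and all identifiers are hypothetical. A full approach would go on to learn one representation per aspect; this sketch only identifies the shared subgraphs.

```python
# Hypothetical toy ontology fragment: term -> set of direct parents.
PARENTS = {
    "GO:bp": set(),
    "GO:mf": set(),
    "GO:transport": {"GO:bp"},
    "GO:ion_transport": {"GO:transport"},
    "GO:binding": {"GO:mf"},
    "GO:ion_binding": {"GO:binding"},
}
# Assumption: each top-level root defines one semantic aspect.
ASPECT_ROOTS = {"GO:bp": "biological_process", "GO:mf": "molecular_function"}


def ancestors(term, parents):
    """Return the term plus all of its ancestors (reflexive transitive closure)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def annotation_closure(annotations, parents):
    """Union of the ancestor closures of all annotated terms."""
    closure = set()
    for term in annotations:
        closure |= ancestors(term, parents)
    return closure


def shared_aspects(ann_a, ann_b, parents, roots):
    """Group the classes shared by two entities by the aspect root they fall under."""
    shared = annotation_closure(ann_a, parents) & annotation_closure(ann_b, parents)
    aspects = {}
    for root, name in roots.items():
        in_aspect = {t for t in shared if root in ancestors(t, parents)}
        if in_aspect:  # only keep aspects in which the entities overlap
            aspects[name] = in_aspect
    return aspects


if __name__ == "__main__":
    protein_a = {"GO:ion_transport", "GO:ion_binding"}
    protein_b = {"GO:transport", "GO:ion_binding"}
    for aspect, terms in shared_aspects(protein_a, protein_b, PARENTS, ASPECT_ROOTS).items():
        print(aspect, sorted(terms))
```

Keeping each aspect's shared classes separate is what would let an explanation point to a specific facet (e.g. shared molecular function) rather than to a single opaque vector.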
We evaluate SEEK on two highly complex real-world relation prediction tasks:
protein-protein interaction prediction and gene-disease association prediction.
Our extensive analysis using established benchmarks demonstrates that SEEK
achieves significantly better performance than standard representation learning
methods while identifying both sufficient and necessary explanations based on
shared semantic aspects.
Comment: 16 pages, 3 figures
Ontology Matching Techniques for Enterprise Architecture Models
Abstract. Current Enterprise Architecture (EA) approaches tend to be generic, based on broad meta-models that cross-cut distinct architectural domains. Integrating these models is necessary for an effective EA process, for example to support benchmarking of business processes or assessing compliance with structured requirements. However, the integration of EA models faces challenges stemming from structural and semantic heterogeneities that could be addressed by ontology matching techniques. To that end, we used AgreementMakerLight, an ontology matching system, to evaluate a set of state-of-the-art matching approaches that could adequately address some of these heterogeneity issues. We assessed the matching of EA models based on the ArchiMate and BPMN languages, which allowed us to draw conclusions about both the potential and the limitations of these techniques in properly exploring the more complex semantics present in these models.
Enterprise Architecture (EA) is a practice to support the analysis, design and implementation of a business strategy in an organization, considering its relevant multiple domains. In recent years, a variety of Enterprise Architecture … To support the matching tasks we have used AgreementMakerLight (AML …
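The abstract does not say which AgreementMakerLight matchers were evaluated, and the code below is not AML's API. It is a simple stand-in illustrating one common family of techniques, label-based (lexical) matching: labels are normalized (lowercased, camelCase split, punctuation removed) and pairs are scored by exact match or token Jaccard overlap. The element identifiers, labels and the 0.5 threshold are hypothetical.

```python
import re
from itertools import product

# Hypothetical element labels from an ArchiMate model and a BPMN model.
ARCHIMATE = {"am:HandleClaim": "Handle Claim", "am:Customer": "Customer"}
BPMN = {"bpmn:Task_1": "handle insurance claim", "bpmn:Lane_1": "customer"}


def normalize(label):
    """Lowercase, split camelCase, and collapse punctuation into single spaces."""
    label = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    label = re.sub(r"[^0-9a-zA-Z]+", " ", label)
    return " ".join(label.lower().split())


def token_jaccard(a, b):
    """Jaccard overlap between the token sets of two normalized labels."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match(source, target, threshold=0.5):
    """Return (source_id, target_id, score) candidates at or above threshold.

    source/target: dicts mapping element id -> label.
    """
    mappings = []
    for (sid, slabel), (tid, tlabel) in product(source.items(), target.items()):
        s, t = normalize(slabel), normalize(tlabel)
        score = 1.0 if s == t else token_jaccard(s, t)
        if score >= threshold:
            mappings.append((sid, tid, round(score, 3)))
    return sorted(mappings, key=lambda m: -m[2])  # best candidates first


if __name__ == "__main__":
    for m in match(ARCHIMATE, BPMN):
        print(m)
```

A purely lexical score like this captures naming heterogeneity but not the structural and semantic differences between ArchiMate and BPMN constructs, which is consistent with the limitations the abstract reports.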
The epidemiology ontology: an ontology for the semantic annotation of epidemiological resources
BACKGROUND: Epidemiology is a data-intensive and multi-disciplinary subject, where data integration, curation and sharing are becoming increasingly relevant, given its global context and time constraints. The semantic annotation of epidemiology resources is a cornerstone to effectively support such activities. Although several ontologies cover some of the subdomains of epidemiology, we identified a lack of semantic resources for epidemiology-specific terms. This paper addresses this need by proposing the Epidemiology Ontology (EPO) and by describing its integration with other related ontologies into a semantically enabled platform for sharing epidemiology resources. RESULTS: The EPO follows the OBO Foundry guidelines and uses the Basic Formal Ontology (BFO) as an upper ontology. The first version of EPO models several epidemiology and demography parameters as well as transmission of infection processes, participants and related procedures. It currently has nearly 200 classes and is designed to support the semantic annotation of epidemiology resources and data integration, as well as information retrieval and knowledge discovery activities. CONCLUSIONS: EPO is under active development and is freely available at https://code.google.com/p/epidemiology-ontology/. We believe that the annotation of epidemiology resources with EPO will help researchers to gain a better understanding of global epidemiological events by enhancing data integration and sharing.
Special issue on ontology and linked data matching
Editorial, Semantic Web journal 8(2):183-18
DDB-EDM to FaBiO: The Case of the German Digital Library
Cultural heritage portals have the goal of providing users
with seamless access to all their resources. This paper introduces initial
efforts for a user-oriented restructuring of the German Digital Library
(DDB). At present, cultural heritage objects (CHOs) in the DDB are
modeled using an extended version of the Europeana Data Model (DDB-EDM), which negatively impacts usability and exploration. These challenges can be addressed by exploiting ontologies and building a knowledge graph from the DDB’s voluminous collection. Towards this goal, an
alignment of bibliographic metadata from DDB-EDM to the FRBR-Aligned
Bibliographic Ontology (FaBiO) is presented.
QuoteKG: A Multilingual Knowledge Graph of Quotes
Quotes of public figures can mark turning points in history. A quote can explain its originator’s actions, foreshadowing political or personal decisions and revealing character traits. Impactful quotes cross language barriers and influence the general population’s reaction to specific stances, but they always face the risk of being misattributed or taken out of context. A cross-lingual knowledge graph of quotes that establishes the authenticity of quotes and their contexts is therefore of great importance, allowing the exploration of the lives of important people, as well as of topics, from the perspective of what was actually said. In this paper, we present QuoteKG, the first multilingual knowledge graph of quotes. We propose the QuoteKG creation pipeline that extracts quotes from Wikiquote, a free and collaboratively created collection of quotes in many languages, and aligns different mentions of the same quote. QuoteKG includes nearly one million quotes in 55 languages, said by more than 69,000 people of public interest across a wide range of topics. QuoteKG is publicly available and can be accessed via a SPARQL endpoint.
Towards Certified Distributed Query Processing
In recent years, knowledge graphs (KGs) have become increasingly important, and as a consequence the number of publicly accessible KGs is growing. Due to their adoption in many areas, KGs are used in numerous applications. However, these applications are not developed by the data owners, and they might collect data from several linked KGs. It is therefore essential that systems accessing KGs are certified, i.e., that each component is certified for a specific use by an entity or agency. In addition, a trace of the performed operations and the data used is needed to verify that all requirements were met; for example, some data cannot be transferred from the source to any other component due to privacy restrictions. This work describes the vision of certified distributed querying in the context of an analytics platform, and identifies and discusses the challenges for such systems. © 2023 Copyright for this paper by its authors.
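The abstract describes a vision and does not define a data model for traces or certifications. Purely as a hypothetical illustration of the privacy example it gives, the sketch below checks a recorded query trace against per-dataset transfer policies: a dataset mapped to `None` may never leave its source component, and otherwise the receiving component must hold every required certification. All class names, fields, component names and the `"gdpr"` label are invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    certifications: set = field(default_factory=set)


@dataclass
class TraceEntry:
    operation: str    # e.g. "scan" or "transfer"
    dataset: str      # source dataset the data came from
    source: str       # component that held the data
    destination: str  # component that received the result


def violations(trace, components, policies):
    """Return the trace entries that break a transfer policy.

    policies maps dataset -> set of certifications the receiver must hold,
    or None if the dataset must never leave its source component.
    """
    found = []
    for entry in trace:
        if entry.source == entry.destination:
            continue  # local operation: no data leaves the component
        required = policies.get(entry.dataset, set())
        if required is None:
            found.append(entry)  # transfer forbidden outright
        elif not required <= components[entry.destination].certifications:
            found.append(entry)  # receiver lacks a required certification
    return found


COMPONENTS = {
    "kg_a": Component("kg_a", {"gdpr"}),
    "analytics": Component("analytics", {"gdpr"}),
    "viz": Component("viz"),
}
POLICIES = {"patients": None, "genes": {"gdpr"}}
TRACE = [
    TraceEntry("scan", "genes", "kg_a", "kg_a"),
    TraceEntry("transfer", "genes", "kg_a", "analytics"),
    TraceEntry("transfer", "genes", "analytics", "viz"),
    TraceEntry("transfer", "patients", "kg_a", "analytics"),
]

if __name__ == "__main__":
    for v in violations(TRACE, COMPONENTS, POLICIES):
        print("violation:", v)
```

Checking the trace after execution, rather than only at planning time, reflects the abstract's point that the trace is what allows verification that all requirements were met.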
Results of the Ontology Alignment Evaluation Initiative 2015
Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation and consensus. OAEI 2015 offered 8 tracks with 15 test cases, followed by 22 participants. Since 2011, the campaign has been using a new evaluation modality which provides more automation to the evaluation. This paper is an overall presentation of the OAEI 2015 campaign.
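Comparing matching systems on test cases typically means scoring each system's alignment against a reference alignment. The sketch below shows the basic set-based precision, recall and F-measure often used for this; individual OAEI tracks may use additional or relaxed variants, so treat this as a minimal illustration. Correspondences are modeled as `(source_entity, target_entity, relation)` tuples, and the example alignments are invented.

```python
def evaluate(alignment, reference):
    """Score a system alignment against a reference alignment.

    Both arguments are collections of (source_entity, target_entity, relation)
    correspondences. Returns (precision, recall, f_measure).
    """
    alignment, reference = set(alignment), set(reference)
    correct = len(alignment & reference)  # correspondences found and expected
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference and system alignments.
REFERENCE = {("a", "x", "="), ("b", "y", "="), ("c", "z", "="), ("d", "w", "=")}
SYSTEM = {("a", "x", "="), ("b", "y", "="), ("c", "q", "=")}

if __name__ == "__main__":
    p, r, f = evaluate(SYSTEM, REFERENCE)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here two of the three system correspondences are correct, giving precision 2/3, recall 1/2 and F-measure 4/7.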
Metrics for GO based protein semantic similarity: a systematic evaluation
BACKGROUND: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should or should not be used in semantic similarity calculations. RESULTS: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour but differ in resolution, we used resolution as the main criterion of evaluation. CONCLUSIONS: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic, since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
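The two best-performing measures named above can be sketched on a toy ontology. This is a minimal illustration under stated assumptions, not the evaluation code used in the study: information content is computed as IC(t) = -log p(t), with p(t) the fraction of gene products annotated with t or one of its descendants; simGIC divides the summed IC of shared ancestors by the summed IC of all ancestors; Resnik's term similarity is the IC of the most informative common ancestor, combined here with one common best-match average formulation (the mean of the two directional best-match averages). The DAG and annotation corpus are invented, and compared gene products must come from that corpus so that every term has a defined IC.

```python
import math

# Hypothetical toy DAG: term -> set of direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"}, "B": {"root"},
    "A1": {"A"}, "A2": {"A"}, "B1": {"B"},
}
# Hypothetical annotation corpus: gene product -> annotated terms.
ANNOTATIONS = {"P1": {"A1"}, "P2": {"A2"}, "P3": {"B1"}, "P4": {"A1", "B1"}}


def ancestors(term):
    """Return the term plus all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def closure(terms):
    """Union of the ancestor sets of a collection of terms."""
    return set().union(*(ancestors(t) for t in terms))


def information_content(annotations):
    """IC(t) = -log p(t); terms never used in the corpus get no IC entry."""
    counts = {t: 0 for t in PARENTS}
    for terms in annotations.values():
        for t in closure(terms):
            counts[t] += 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def sim_gic(terms_a, terms_b, ic):
    """simGIC: IC of shared ancestors divided by IC of all ancestors."""
    ca, cb = closure(terms_a), closure(terms_b)
    union_ic = sum(ic[t] for t in ca | cb)
    return sum(ic[t] for t in ca & cb) / union_ic if union_ic else 0.0


def resnik(t1, t2, ic):
    """Resnik: IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


def resnik_bma(terms_a, terms_b, ic):
    """Best-match average: mean best score per term, averaged over both directions."""
    ab = sum(max(resnik(a, b, ic) for b in terms_b) for a in terms_a) / len(terms_a)
    ba = sum(max(resnik(b, a, ic) for a in terms_a) for b in terms_b) / len(terms_b)
    return (ab + ba) / 2


if __name__ == "__main__":
    ic = information_content(ANNOTATIONS)
    print("simGIC(P1, P2) =", round(sim_gic(ANNOTATIONS["P1"], ANNOTATIONS["P2"], ic), 4))
    print("simGIC(P1, P4) =", round(sim_gic(ANNOTATIONS["P1"], ANNOTATIONS["P4"], ic), 4))
    print("ResnikBMA(P1, P4) =", round(resnik_bma(ANNOTATIONS["P1"], ANNOTATIONS["P4"], ic), 4))
```

Taking only each term's best match, rather than averaging over all term pairs or taking a single maximum, is what limits the sensitivity to the number of terms being combined that the abstract attributes to the average and maximum approaches.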
- …