
    Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings

    RDF2vec is a technique for creating vector space embeddings from an RDF knowledge graph, i.e., representing each entity in the graph as a vector. It first creates sequences of nodes by performing random walks on the graph. In a second step, those sequences are processed by the word2vec algorithm to create the actual embeddings. In this paper, we explore the use of external edge weights for guiding the random walks. As edge weights, transition probabilities between Wikipedia pages are used as a proxy for human feedback on the importance of an edge. We show that in some scenarios, RDF2vec utilizing those transition probabilities can outperform both RDF2vec based on unweighted random walks and variants that use graph-internal edge weights.
    Comment: Workshop paper accepted at the Deep Learning for Knowledge Graphs Workshop 202
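
    The two-step pipeline with externally weighted walks can be sketched as follows. This is a minimal illustration only: the toy graph, its edge weights, and the hyperparameters are made up for demonstration, and the gensim word2vec implementation stands in for whatever the authors actually used.

```python
# Minimal sketch of edge-weighted RDF2vec-style training (illustrative, not the
# authors' implementation). The toy graph, weights, and hyperparameters are assumptions.
import random
from gensim.models import Word2Vec

# Toy knowledge graph: entity -> list of (predicate, object, external_weight).
# In the paper, weights would come from Wikipedia page transition probabilities.
graph = {
    "dbr:Berlin": [("dbo:country", "dbr:Germany", 0.8),
                   ("dbo:timeZone", "dbr:Central_European_Time", 0.2)],
    "dbr:Germany": [("dbo:capital", "dbr:Berlin", 0.7),
                    ("dbo:currency", "dbr:Euro", 0.3)],
    "dbr:Central_European_Time": [],
    "dbr:Euro": [],
}

def weighted_walk(start, depth=4):
    """Random walk that picks the next edge proportionally to its external weight."""
    walk, node = [start], start
    for _ in range(depth):
        edges = graph.get(node, [])
        if not edges:
            break
        pred, obj, _ = random.choices(edges, weights=[w for *_, w in edges], k=1)[0]
        walk.extend([pred, obj])
        node = obj
    return walk

# Generate a walk corpus and feed it to word2vec, as in the two-step RDF2vec pipeline.
walks = [weighted_walk(entity) for entity in graph for _ in range(10)]
model = Word2Vec(sentences=walks, vector_size=64, window=5, sg=1, min_count=1)
print(model.wv["dbr:Berlin"][:5])  # first components of one entity embedding
```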

    Matching Biomedical Knowledge Graphs with Neural Embeddings

    Master's thesis, Data Science (CiĂȘncia de Dados), Universidade de Lisboa, Faculdade de CiĂȘncias, 2020.
    Knowledge graphs are data structures that have become essential for organizing the biomedical data produced at an exponential rate in recent years. The broad adoption of this way of structuring and describing data has increased interest in data mining approaches that take advantage of these information structures in order to advance scientific knowledge. However, due to human idiosyncrasy and the impossibility of isolating knowledge domains from one another, knowledge graphs constructed by different individuals often contain equivalent concepts described differently, which obstructs an integrated analysis of data described by multiple knowledge graphs. Multiple knowledge graph matching systems have been developed to address this challenge. Nevertheless, the performance of these systems on biomedical knowledge graph matching has stagnated over the last four years, despite highly tailored algorithms and external resources.
    In this dissertation, we present two novel knowledge graph matching approaches employing neural embeddings: one using plain embedding similarity based on word and graph models; the other training a more complex word-based model that requires training data to refine the embeddings. The proposed methodology integrates these approaches into the regular matching process, using the AgreementMakerLight system as a foundation. These new components extend the system's current matching algorithms by discovering new mappings and yield a more generalizable matching procedure that is less dependent on external biomedical ontologies. The methodology was evaluated on three biomedical ontology matching test cases provided by the Ontology Alignment Evaluation Initiative. The results show that, although neither embedding approach exceeds the state of the art, both perform well on the matching tasks, outperforming all systems that do not use external ontologies and even some that do, which demonstrates the value of neural embeddings for biomedical knowledge graph matching.
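
    The plain embedding-similarity idea can be sketched as follows; the tiny vectors below merely stand in for word- or graph-based embeddings of entity labels, and the threshold is arbitrary. This is not the actual AgreementMakerLight component.

```python
# Hedged sketch of embedding-similarity matching between two ontologies.
# Toy data only; real systems use pretrained word/graph embedding models.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def match(source, target, threshold=0.9):
    """Return candidate mappings (source entity, target entity, similarity)."""
    mappings = []
    for s_uri, s_vec in source.items():
        # pick the best-scoring target entity for each source entity
        t_uri, sim = max(((t, cosine(s_vec, t_vec)) for t, t_vec in target.items()),
                         key=lambda x: x[1])
        if sim >= threshold:
            mappings.append((s_uri, t_uri, sim))
    return mappings

rng = np.random.default_rng(0)
heart = rng.normal(size=50)
source_onto = {"onto_a:heart_disease": heart}
target_onto = {"onto_b:heart_diseases": heart + 0.01 * rng.normal(size=50),
               "onto_b:lung_diseases": rng.normal(size=50)}
print(match(source_onto, target_onto))  # only the near-duplicate pair survives
```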

    Exploiting general-purpose background knowledge for automated schema matching

    The schema matching task is an integral part of the data integration process. It is usually the first step in integrating data. Schema matching is typically very complex and time-consuming. It is therefore, for the most part, carried out by humans. One reason for the low degree of automation is the fact that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process. In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources, since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources. A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems. Among the largest structured sources of general-purpose background knowledge are knowledge graphs, which have grown significantly in size in recent years. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared. Multiple improvements to existing approaches are presented. In Part IV, numerous concrete matching systems which exploit general-purpose background knowledge are presented. Furthermore, exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications.
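
    The core intuition of background-knowledge-based matching can be sketched as follows: two schema elements whose labels differ can still be matched if a general-purpose resource links their labels. The tiny synonym table is a stand-in for a large resource such as a general-purpose knowledge graph; it is not the dissertation's system.

```python
# Hedged sketch: label matching boosted by a background synonym resource.
BACKGROUND = {               # label -> labels the resource declares equivalent
    "zip code": {"postal code", "postcode"},
    "surname": {"last name", "family name"},
}

def related(label_a: str, label_b: str) -> bool:
    """True if the labels are equal or linked through the background resource."""
    a, b = label_a.lower(), label_b.lower()
    return a == b or b in BACKGROUND.get(a, set()) or a in BACKGROUND.get(b, set())

schema_a = ["Surname", "Zip Code", "Email"]
schema_b = ["family name", "postcode", "phone"]
matches = [(x, y) for x in schema_a for y in schema_b if related(x, y)]
print(matches)   # [('Surname', 'family name'), ('Zip Code', 'postcode')]
```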

    Universal Preprocessing Operators for Embedding Knowledge Graphs with Literals

    Knowledge graph embeddings are dense numerical representations of entities in a knowledge graph (KG). While the majority of approaches concentrate only on relational information, i.e., relations between entities, fewer approaches exist which also take information about literal values (e.g., textual descriptions or numerical information) into account. Those which exist are typically tailored towards a particular literal modality and a particular embedding method. In this paper, we propose a set of universal preprocessing operators which transform KGs with numerical, temporal, textual, and image literals so that the transformed KGs can be embedded with any method. Experiments on the kgbench dataset with three different embedding methods show promising results.
    Comment: Accepted for DL4KG Workshop at ISWC 202
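
    One plausible operator in the spirit of such preprocessing is sketched below: replacing numeric literal values with discrete "bucket" entities so that any relational embedding method can consume the transformed graph. The bucketing scheme and the predicate names are assumptions for illustration, not the paper's exact operators.

```python
# Hedged sketch of a numeric-literal preprocessing operator (illustrative only).
def bucketize_numeric_literals(triples, num_buckets=5):
    """triples: list of (subject, predicate, object); numeric objects become bucket entities."""
    numeric = [o for _, _, o in triples if isinstance(o, (int, float))]
    if not numeric:
        return list(triples)
    lo, hi = min(numeric), max(numeric)
    width = (hi - lo) / num_buckets or 1.0

    transformed = []
    for s, p, o in triples:
        if isinstance(o, (int, float)):
            bucket = min(int((o - lo) / width), num_buckets - 1)
            transformed.append((s, p, f"bucket:{p}_{bucket}"))  # literal -> entity
        else:
            transformed.append((s, p, o))
    return transformed

kg = [("ex:Berlin", "ex:population", 3_700_000),
      ("ex:Bremen", "ex:population", 570_000),
      ("ex:Berlin", "ex:country", "ex:Germany")]
print(bucketize_numeric_literals(kg))  # populations now point to discrete bucket nodes
```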

    Interoperability and machine-to-machine translation model with mappings to machine learning tasks

    Modern large-scale automation systems integrate thousands to hundreds of thousands of physical sensors and actuators. Demands for more flexible reconfiguration of production systems and optimization across different information models, standards, and legacy systems challenge current system interoperability concepts. Automatic semantic translation across information models and standards is an increasingly important problem that needs to be addressed to fulfill these demands in a cost-efficient manner, under constraints of human capacity and resources in relation to timing requirements and system complexity. Here we define a translator-based operational interoperability model for interacting cyber-physical systems in mathematical terms, which includes system identification and ontology-based translation as special cases. We present alternative mathematical definitions of the translator learning task and mappings to similar machine learning tasks and solutions based on recent developments in machine learning. Possibilities to learn translators between artefacts without a common physical context, for example in simulations of digital twins and across layers of the automation pyramid, are briefly discussed.
    Comment: 7 pages, 2 figures, 1 table, 1 listing. Submitted to the IEEE International Conference on Industrial Informatics 2019, INDIN'1
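
    One way such a translator learning task can be reduced to a standard supervised learning problem is sketched below, assuming paired observations of the same readings encoded under two information models and a simple least-squares linear map as the translator. This is an illustrative reduction with synthetic data, not the paper's exact formulation.

```python
# Hedged sketch: translator learning cast as least-squares regression between
# paired message representations from two information models (synthetic data).
import numpy as np

rng = np.random.default_rng(42)

true_map = rng.normal(size=(4, 6))                       # hidden relation between the models
X = rng.normal(size=(200, 6))                            # messages in the source model
Y = X @ true_map.T + 0.01 * rng.normal(size=(200, 4))    # same content in the target model

# Learn the translator T with ordinary least squares: Y ≈ X @ T
T, *_ = np.linalg.lstsq(X, Y, rcond=None)
translate = lambda x: x @ T                              # apply to new source messages

print(float(np.abs(translate(X) - Y).max()))             # small residual: translator recovered
```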

    Proceedings of the 15th ISWC workshop on Ontology Matching (OM 2020)

    15th International Workshop on Ontology Matching, co-located with the 19th International Semantic Web Conference (ISWC 2020).