Matching Biomedical Knowledge Graphs with Neural Embeddings
Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2020. Knowledge graphs are data structures that have become essential for organizing the biomedical data produced at an exponential rate in recent years. The broad adoption of this way of structuring and describing data has driven interest in data mining approaches that exploit these structures to advance scientific knowledge. However, because knowledge domains cannot be cleanly isolated and because of human idiosyncrasy, knowledge graphs built by different individuals often contain equivalent concepts described differently, which hinders integrated analysis of data described by multiple knowledge graphs. Several knowledge graph matching systems have been developed to address this challenge. Nevertheless, their performance on biomedical knowledge graphs has stagnated over the last four years, despite highly tailored algorithms and external resources. In this dissertation, we present two novel knowledge graph matching approaches employing neural embeddings: one using plain embedding similarity based on word and graph models; the other using a more complex word-based model that requires training data to refine the embeddings. The proposed methodology aims to integrate these approaches into the regular matching process, using the AgreementMakerLight system as a foundation. These new components extend the system's current matching algorithms, discover new mappings, and yield a matching procedure that is more generalizable and less dependent on external biomedical ontologies. The methodology was evaluated on three biomedical ontology matching test cases provided by the Ontology Alignment Evaluation Initiative. The results showed that, although neither embedding approach exceeds the state of the art, both outperform every matching system that does not use external ontologies and also surpass some that do. This shows that neural embeddings are a valuable technique for biomedical knowledge graph matching.
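As a rough illustration of what "plain embedding similarity" can look like, the sketch below proposes a mapping whenever the cosine similarity between two entity vectors clears a threshold. It is not the thesis's implementation or the AgreementMakerLight API: the function names, the entity identifiers, the toy three-dimensional vectors, and the 0.9 threshold are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_by_embedding(source, target, threshold=0.9):
    """For each source entity, keep its most similar target if the score >= threshold."""
    mappings = []
    for s_id, s_vec in source.items():
        best_id, best_score = None, -1.0
        for t_id, t_vec in target.items():
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_id, best_score = t_id, score
        if best_id is not None and best_score >= threshold:
            mappings.append((s_id, best_id, best_score))
    return mappings


# Hypothetical toy embeddings; real ones would come from a trained word or graph model.
source = {"src:heart": [0.9, 0.1, 0.0], "src:lung": [0.0, 1.0, 0.2]}
target = {"tgt:cardiac_organ": [0.8, 0.2, 0.0], "tgt:kidney": [0.1, 0.0, 1.0]}

mappings = match_by_embedding(source, target)
for s_id, t_id, score in mappings:
    print(f"{s_id} -> {t_id} ({score:.3f})")
```

Keeping only the best-scoring target per source entity is one simple selection strategy; a full matcher would typically combine such scores with lexical and structural evidence.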
Ontology matching: state of the art and future challenges
After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, albeit at a slowing pace. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.
Survey: Models and Prototypes of Schema Matching
Schema matching is a critical problem in many applications that integrate data or information, achieve interoperability, or otherwise deal with schematic heterogeneity. Schema matching has evolved from manual, domain-specific work toward new models and methods that are semi-automatic and more general, and that can therefore guide the user more effectively in generating a mapping between the elements of two schemas or ontologies. This paper summarizes a literature review of schema matching models and prototypes from the last 25 years, describing the progress made as well as the research challenges and opportunities for new models, methods, and/or prototypes.
Instance-Based Matching of Large Life Science Ontologies
Ontologies are heavily used in life sciences so that there is increasing value to match different ontologies in order to determine related conceptual categories. We propose a simple yet powerful methodology for instance-based ontology matching which utilizes the associations between molecular-biological objects and ontologies. The approach can build on many existing ontology associations for instance objects like sequences and proteins and thus makes heavy use of available domain knowledge. Furthermore, the approach is flexible and extensible since each instance source with associations to the ontologies of interest can contribute to the ontology mapping. We study several approaches to determine the instance-based similarity of ontology categories. We perform an extensive experimental evaluation to use protein associations for different species to match between subontologies of the Gene Ontology and OMIM. We also provide a comparison with metadata-based ontology matching
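The idea of instance-based matching can be sketched as follows: two categories from different ontologies are considered similar when many of the same instance objects (for example, proteins) are associated with both. The abstract mentions several similarity measures without naming them, so the Dice coefficient used here is an assumption, as are the function names, the category and instance identifiers, and the 0.5 cutoff.

```python
def dice(a, b):
    """Dice coefficient of two sets: 2|A & B| / (|A| + |B|); 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def instance_based_matches(assoc_1, assoc_2, min_sim=0.5):
    """Compare every category pair by shared instances; keep pairs with sim >= min_sim."""
    matches = []
    for c1, inst1 in assoc_1.items():
        for c2, inst2 in assoc_2.items():
            sim = dice(inst1, inst2)
            if sim >= min_sim:
                matches.append((c1, c2, sim))
    # Highest similarity first.
    return sorted(matches, key=lambda m: -m[2])


# Hypothetical associations: category -> set of instance identifiers.
onto1 = {"onto1:A": {"P1", "P2", "P3"}, "onto1:B": {"P4"}}
onto2 = {"onto2:X": {"P2", "P3"}, "onto2:Y": {"P4", "P5", "P6", "P7"}}

matches = instance_based_matches(onto1, onto2)
print(matches)
```

Here `onto1:A` and `onto2:X` share two of their five total associations, giving a Dice score of 0.8, while `onto1:B` and `onto2:Y` score 0.4 and fall below the cutoff.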
VersaMatch: ontology matching with weak supervision
Ontology matching is crucial to data integration for across-silo data sharing and has been mainly addressed with heuristic and machine learning (ML) methods. While heuristic methods are often inflexible and hard to extend to new domains, ML methods rely on substantial and hard to obtain amounts of labeled training data. To overcome these limitations, we propose VersaMatch, a flexible, weakly-supervised ontology matching system. VersaMatch employs various weak supervision sources, such as heuristic rules, pattern matching, and external knowledge bases, to produce labels from a large amount of unlabeled data for training a discriminative ML model. For prediction, VersaMatch develops a novel ensemble model combining the weak supervision sources with the discriminative model to support generalization while retaining a high precision. Our ensemble method boosts end model performance by 4 points compared to a traditional weak-supervision baseline. In addition, compared to state-of-the-art ontology matchers, VersaMatch achieves an overall 4-point performance improvement in F1 score across 26 ontology combinations from different domains. For recently released, in-the-wild datasets, VersaMatch beats the next best matchers by 9 points in F1. Furthermore, its core weak-supervision logic can easily be improved by adding more knowledge sources and collecting more unlabeled data for training
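To make the weak-supervision idea concrete, the sketch below turns a few noisy heuristic rules into training labels for candidate label pairs. It does not reproduce VersaMatch: the three labeling functions, their thresholds, and the majority-vote aggregation are hypothetical stand-ins chosen for brevity, and the abstract does not say how VersaMatch combines its sources.

```python
def lf_exact_label(a, b):
    """Vote 'match' (1) when labels are equal ignoring case; otherwise abstain (None)."""
    return 1 if a.lower() == b.lower() else None


def lf_token_overlap(a, b):
    """Vote 1 on high token overlap, 0 on no overlap, abstain in between."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return None
    overlap = len(ta & tb) / len(ta | tb)
    if overlap >= 0.5:
        return 1
    if overlap == 0:
        return 0
    return None


def lf_length_gap(a, b):
    """Vote 'non-match' (0) when label lengths differ by more than 15 characters."""
    return 0 if abs(len(a) - len(b)) > 15 else None


def weak_label(a, b, labeling_functions):
    """Majority vote over non-abstaining functions; None on a tie or if all abstain."""
    votes = [v for v in (lf(a, b) for lf in labeling_functions) if v is not None]
    if not votes:
        return None
    positives = sum(votes)
    negatives = len(votes) - positives
    if positives == negatives:
        return None
    return 1 if positives > negatives else 0


lfs = [lf_exact_label, lf_token_overlap, lf_length_gap]
print(weak_label("Heart valve", "heart valve", lfs))
print(weak_label("Heart", "Chronic obstructive pulmonary disease", lfs))
print(weak_label("left lung", "right lung", lfs))
```

Pairs that receive a label could then be used to train a discriminative model, while pairs on which every rule abstains or the votes tie are simply left unlabeled.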
The Foundational Model of Anatomy Ontology
Anatomy is the structure of biological organisms. The term also denotes the scientific
discipline devoted to the study of anatomical entities and the structural and
developmental relations that obtain among these entities during the lifespan of an
organism. Anatomical entities are the independent continuants of biomedical reality on
which physiological and disease processes depend, and which, in response to etiological
agents, can transform themselves into pathological entities. For these reasons, hard copy
and in silico information resources in virtually all fields of biology and medicine, as a
rule, make extensive reference to anatomical entities. Because of the lack of a
generalizable, computable representation of anatomy, developers of computable
terminologies and ontologies in clinical medicine and biomedical research represented
anatomy from their own more or less divergent viewpoints. The resulting heterogeneity
presents a formidable impediment to correlating human anatomy not only across
computational resources but also with the anatomy of model organisms used in
biomedical experimentation. The Foundational Model of Anatomy (FMA) is being
developed to fill the need for a generalizable anatomy ontology, which can be used and
adapted by any computer-based application that requires anatomical information.
Moreover it is evolving into a standard reference for divergent views of anatomy and a
template for representing the anatomy of animals. A distinction is made between the FMA
ontology as a theory of anatomy and the implementation of this theory as the FMA
artifact. In either sense of the term, the FMA is a spatial-structural ontology of the
entities and relations which together form the phenotypic structure of the human
organism at all biologically salient levels of granularity. Making use of explicit
ontological principles and sound methods, it is designed to be understandable by human
beings and navigable by computers. The FMA’s ontological structure provides for
machine-based inference, enabling powerful computational tools of the future to reason
with biomedical data
Matching ontologies for context
No abstract available
Automated extension of biomedical ontologies
Developing and extending a biomedical ontology is a very demanding
process, particularly because biomedical knowledge is diverse, complex
and continuously changing and growing. Existing automated
and semi-automated techniques are not tailored to handling the issues
in extending biomedical ontologies.
This thesis advances the state of the art in semi-automated ontology
extension by presenting a framework as well as methods and
methodologies for automating ontology extension specifically designed
to address the features of biomedical ontologies. The overall strategy is
based on first predicting the areas of the ontology that are in need of
extension and then applying ontology learning and ontology matching
techniques to extend them. A novel machine learning approach for
predicting these areas based on features of past ontology versions was
developed and successfully applied to the Gene Ontology. Methods
and techniques were also specifically designed for matching biomedical
ontologies and retrieving relevant biomedical concepts from text,
which were shown to be successful in several applications.
Fundação para a Ciência e a Tecnologi
Proceedings of The Tenth International Workshop on Ontology Matching (OM-2015)
No abstract
A Hybrid Model Schema Matching Using Constraint-Based and Instance-Based
Schema matching is an important back-end process in Enterprise Information Integration (EII) that addresses problems caused by schematic heterogeneity. This paper summarizes preliminary results from the model development stage of research on a hybrid schema matching model and prototype that combines two methods, constraint-based and instance-based. The discussion covers a general description of the proposed model and its development, from requirement analysis through data type conversion, the matching mechanism, database support, constraint and instance extraction, matching and similarity computation, preliminary results, user verification, verified results, and test datasets, to performance measurement. In experiments on 36 datasets from heterogeneous RDBMSs, precision (P) ranged from 71.43% to 100.00%, recall (R) from 75.00% to 100.00%, and F-measure from 81.48% to 100.00%. The model still fails to match in some cases, including id attributes with an autoincrement data type, codes that are defined in the same way but have different meanings, and common instances that share a definition but differ in meaning.
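For reference, the P, R, and F-measure figures quoted above are conventionally computed by comparing the attribute pairs a matcher proposes with a verified reference mapping. The sketch below shows that standard calculation; the function name, table and column names, and counts are invented for illustration and are not taken from the paper's datasets.

```python
def precision_recall_f1(found, reference):
    """Return (precision, recall, F-measure) for proposed vs. reference match pairs."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical attribute pairs (source column, target column).
reference = {
    ("cust.id", "client.client_id"),
    ("cust.name", "client.full_name"),
    ("cust.dob", "client.birth_date"),
    ("cust.zip", "client.postcode"),
    ("cust.tel", "client.phone"),
}
found = {
    ("cust.id", "client.client_id"),
    ("cust.name", "client.full_name"),
    ("cust.dob", "client.birth_date"),
    ("cust.zip", "client.phone"),  # a wrong match
}

p, r, f = precision_recall_f1(found, reference)
print(f"P={p:.2%} R={r:.2%} F={f:.2%}")
```

With three correct pairs out of four proposed and five expected, precision is 3/4, recall is 3/5, and the F-measure is their harmonic mean, 2/3.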