5 research outputs found
Alinhamento de vocabulário de domínio utilizando os sistemas AML e LogMap
Introduction: In the context of the Semantic Web, interoperability among
heterogeneous ontologies is a challenge due to several factors, among which semantic ambiguity and redundancy stand out. To overcome these challenges, systems and algorithms are adopted to align different ontologies. In this study, it is understood that controlled vocabularies are a particular form of ontology.
Objective: to obtain a vocabulary resulting from the alignment and fusion of the Scientific Domains and Scientific Areas vocabularies of the Foundation for Science and Technology (FCT), the European Science Vocabulary (EuroSciVoc), and the United Nations Educational, Scientific and Cultural Organization (UNESCO) nomenclature for fields of Science and Technology, in the Computer Science domain, to be used
in the IViSSEM project. Methodology: literature review on systems/algorithms for
ontology alignment, using the Preferred Reporting Items for Systematic Reviews
and Meta-Analyses (PRISMA) methodology; alignment of the three vocabularies;
and validation of the resulting vocabulary by means of a Delphi study. Results: we
analyzed the 25 ontology alignment systems and variants that participated in at
least one track of the Ontology Alignment Evaluation Initiative (OAEI) competition
between 2018 and 2019. From these, AgreementMakerLight (AML) and LogMap were
selected to align the three vocabularies, restricting the scope to the Computer
Science area. Conclusion: the vocabulary was obtained with AgreementMakerLight,
which presented the better performance.
In the end, a vocabulary of 98 terms in the Computer Science domain was obtained,
to be adopted by the IViSSEM project. The alignment combined the vocabulary used
by FCT (Portugal) with the one adopted by the European Union (EuroSciVoc) and the
Science & Technology nomenclature of UNESCO. This result is beneficial to other
universities and projects, as well as to
FCT itself.
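The alignment described above rests on matching terms across controlled vocabularies. As a hedged illustration of the simplest lexical technique such systems build on (this is not the AML or LogMap implementation; the vocabulary fragments and function names below are invented for the example):

```python
# Illustrative sketch: align two controlled vocabularies by
# normalized-label equality, the most basic lexical matching step
# that systems like AML and LogMap extend with richer techniques.

def normalize(label):
    """Lowercase and drop punctuation so superficial differences
    between vocabularies do not block a match."""
    return "".join(
        ch for ch in label.lower() if ch.isalnum() or ch.isspace()
    ).strip()

def align(source_terms, target_terms):
    """Return (source, target) pairs whose normalized labels coincide."""
    index = {normalize(t): t for t in target_terms}
    mappings = []
    for s in source_terms:
        key = normalize(s)
        if key in index:
            mappings.append((s, index[key]))
    return mappings

# Hypothetical fragments of two of the vocabularies discussed above.
fct = ["Computer Science", "Artificial intelligence", "Data Bases"]
euroscivoc = ["computer science", "artificial intelligence", "databases"]

print(align(fct, euroscivoc))
# [('Computer Science', 'computer science'),
#  ('Artificial intelligence', 'artificial intelligence')]
```

Note that "Data Bases" vs. "databases" is missed by pure label equality; bridging such gaps is exactly where the structural and lexical heuristics of full alignment systems come in.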
Ontology Matching: OM-2018: Proceedings of the ISWC Workshop
No abstract available.
Evaluating Pre-trained Word Embeddings in domain specific Ontology Matching
Master's thesis in Data Science, Universidade de Lisboa, Faculdade de Ciências, 2022.
The ontology matching process focuses on discovering mappings between concepts from two distinct
ontologies, a source and a target. It is a fundamental step when trying to integrate heterogeneous data
sources that are described by ontologies. The problem becomes even more challenging when
working with complex data such as biomedical data. Thus, driven by the need to keep
improving ontology matching techniques, this dissertation focused on implementing a new approach in
the AML pipeline to calculate similarities between entities from two distinct ontologies.
For this dissertation, we used some of the OAEI tracks, such as Anatomy
and LargeBio, to apply a new algorithm and evaluate whether it improves AML's results against a
reference alignment. The new approach consisted of using pre-trained word embeddings of five different
types: BioWordVec Extrinsic, BioWordVec Intrinsic, PubMed+PC, PubMed+PC+Wikipedia, and English
Wikipedia. These pre-trained word embeddings are built with a machine learning technique, Word2Vec, and were
used in this work because they carry the semantic meaning inherent to the words in
the corresponding vector. Word embeddings allowed each concept of each ontology to be represented
by a corresponding vector, to see whether, with that information, it was possible to improve how relations
between concepts are determined in the AML system. The similarity between concepts was calculated
through the cosine distance, and the new alignment was evaluated with the metrics precision, recall,
and F-measure. Although we could not prove that word embeddings improve AML's current results, this
implementation could be refined, and the technique may still be an option to consider in future work if
applied in some other way.
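The two computational steps the abstract names, cosine similarity between concept vectors and evaluation against a reference alignment, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the vectors and mapping sets are toy values standing in for BioWordVec embeddings and OAEI reference alignments.

```python
# Illustrative sketch of cosine similarity between embedding vectors
# and precision/recall/F-measure evaluation of an alignment.
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def evaluate(found, reference):
    """Precision, recall and F-measure of a mapping set vs. a reference set."""
    tp = len(found & reference)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f

# Toy embeddings for two concepts judged nearly synonymous.
sim = cosine_similarity([1.0, 2.0, 0.5], [0.9, 2.1, 0.4])

# Toy alignment: two of three produced mappings appear in the reference.
found = {("heart", "cor"), ("lung", "pulmo"), ("liver", "ren")}
reference = {("heart", "cor"), ("lung", "pulmo"), ("liver", "hepar")}
p, r, f = evaluate(found, reference)  # each equals 2/3 here
```

Cosine distance, as used in the thesis, is simply 1 minus this similarity; thresholding it decides which candidate pairs become mappings.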
Exploiting general-purpose background knowledge for automated schema matching
The schema matching task is an integral part of the data integration process, and usually its first step. Schema matching is typically very complex and time-consuming; it is therefore, for the most part, carried out by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process.
In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources, since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources.
A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems.
One of the largest structured sources of general-purpose background knowledge are knowledge graphs, which have grown significantly in size in recent years. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared. Multiple improvements to existing approaches are presented.
In Part IV, numerous concrete matching systems which exploit general-purpose background knowledge are presented. Furthermore, exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications.
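The core idea of background-knowledge-assisted matching can be sketched minimally: when two schema labels do not match directly, consult an external resource to bridge the gap. In this hedged sketch, a toy synonym dictionary stands in for the large general-purpose knowledge graphs the dissertation studies; all names and data are illustrative, not taken from the work itself.

```python
# Illustrative sketch: schema matching with an external background-
# knowledge resource (here a tiny hand-made synonym dictionary).

BACKGROUND = {  # hypothetical general-purpose resource
    "zip code": {"postal code", "postcode"},
    "surname": {"last name", "family name"},
}

def related(a, b):
    """True if labels a and b match directly or via the background resource."""
    a, b = a.lower(), b.lower()
    if a == b:
        return True
    return b in BACKGROUND.get(a, set()) or a in BACKGROUND.get(b, set())

def match_schemas(source, target):
    """Pair every source attribute with every related target attribute."""
    return [(s, t) for s in source for t in target if related(s, t)]

print(match_schemas(["Surname", "Zip Code"], ["family name", "city"]))
# [('Surname', 'family name')]
```

Without the background dictionary, "Surname" and "family name" share no tokens and would never be matched, which is precisely the missing-knowledge problem the dissertation addresses with far larger resources.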
Semantic Systems. The Power of AI and Knowledge Graphs
This open access book constitutes the refereed proceedings of the 15th International Conference on Semantic Systems, SEMANTiCS 2019, held in Karlsruhe, Germany, in September 2019. The 20 full papers and 8 short papers presented in this volume were carefully reviewed and selected from 88 submissions. They cover topics such as: web semantics and linked (open) data; machine learning and deep learning techniques; semantic information management and knowledge integration; terminology, thesaurus and ontology management; data mining and knowledge discovery; and semantics in blockchain and distributed ledger technologies.