9,359 research outputs found

    Towards information profiling: data lake content metadata management

    There is currently a surge of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe the lake's content. However, there is currently no systematic approach for such metadata discovery and management. Thus, we propose a framework for profiling the informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to handle this effectively. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach.
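    The profiling idea can be illustrated with a minimal sketch (not the paper's framework): compute simple per-column content profiles for a tabular dataset and package them as metadata that a data lake catalogue could store. The function names and the toy table below are illustrative assumptions.

        # Minimal sketch of content profiling (illustrative, not the proposed framework):
        # summarise each column of a tabular dataset and package the result as metadata.
        import pandas as pd

        def profile_column(series: pd.Series) -> dict:
            """Summarise one column's informational content."""
            return {
                "dtype": str(series.dtype),
                "null_ratio": float(series.isna().mean()),
                "distinct_count": int(series.nunique(dropna=True)),
                "sample_values": series.dropna().astype(str).head(5).tolist(),
            }

        def profile_dataset(name: str, df: pd.DataFrame) -> dict:
            """Build a content profile to be stored as metadata alongside the data."""
            return {
                "dataset": name,
                "n_rows": len(df),
                "columns": {col: profile_column(df[col]) for col in df.columns},
            }

        # Toy example standing in for a dataset pulled from a data lake.
        df = pd.DataFrame({"species": ["setosa", "virginica", None],
                           "petal_length": [1.4, 5.1, 4.2]})
        print(profile_dataset("toy_table", df))

    In a real lake, such profiles would be computed per dataset and queried by downstream integration and schema-alignment tools.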

    A Large Scale Dataset for the Evaluation of Ontology Matching Systems

    Recently, the number of ontology matching techniques and systems has increased significantly, making their evaluation and comparison more pressing. One of the challenges of ontology matching evaluation is building large-scale evaluation datasets. In fact, the number of possible correspondences between two ontologies grows quadratically with the number of entities in these ontologies. This often makes the manual construction of evaluation datasets demanding to the point of being infeasible for large-scale matching tasks. In this paper we present an ontology matching evaluation dataset composed of thousands of matching tasks, called TaxME2. It was built semi-automatically out of the Google, Yahoo and Looksmart web directories. We evaluated TaxME2 by exploiting the results of almost two dozen state-of-the-art ontology matching systems. The experiments indicate that the dataset possesses the desired key properties, namely it is error-free, incremental, discriminative, monotonic, and hard for state-of-the-art ontology matching systems. The paper has been accepted for publication in "The Knowledge Engineering Review", Cambridge University Press (ISSN: 0269-8889, EISSN: 1469-8005).
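    The quadratic-growth argument is easy to make concrete: two ontologies with n and m entities admit n × m candidate correspondences, so manual judging quickly becomes infeasible. The back-of-the-envelope calculation below is a hypothetical illustration, not part of how TaxME2 was built; the 10-seconds-per-judgment figure is an assumption.

        # Hypothetical illustration of why manual construction of evaluation
        # datasets does not scale: candidate correspondences grow as |O1| * |O2|.
        def candidate_correspondences(n_entities_o1: int, n_entities_o2: int) -> int:
            return n_entities_o1 * n_entities_o2

        for size in (100, 1_000, 10_000):
            pairs = candidate_correspondences(size, size)
            hours = pairs * 10 / 3600   # assuming ~10 s of expert time per judged pair
            print(f"{size:>6} x {size:>6} entities -> {pairs:>12,} candidate pairs "
                  f"(~{hours:,.0f} hours to judge manually)")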

    Dealing with uncertain entities in ontology alignment using rough sets

    Ontology alignment facilitates the exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, dealing with uncertain entities, for which the employed similarity measures produce conflicting results, remains a key challenge. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises from the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the Ontology Alignment Evaluation Initiative (OAEI) 2010; it outperforms a number of alignment systems in recall while achieving comparable precision.
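    The notion of conflicting measures can be illustrated with a rough-set-flavoured sketch (a simplification, not OARS itself): pairs on which all measures agree above a threshold form the lower approximation (certain matches), while pairs on which only some measures agree fall into the boundary region and are flagged as uncertain. The similarity functions and threshold below are assumptions.

        # Simplified rough-set-style combination of similarity measures
        # (illustrative only; OARS's approximations are more involved).
        from difflib import SequenceMatcher

        def lexical_sim(a: str, b: str) -> float:
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def structural_sim(a: str, b: str) -> float:
            # Stand-in for a structural measure; here: shared-token overlap.
            ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
            return len(ta & tb) / max(len(ta | tb), 1)

        def classify(pairs, threshold=0.7):
            certain, uncertain = [], []
            for a, b in pairs:
                votes = [lexical_sim(a, b) >= threshold,
                         structural_sim(a, b) >= threshold]
                if all(votes):
                    certain.append((a, b))    # lower approximation: measures agree
                elif any(votes):
                    uncertain.append((a, b))  # boundary region: conflicting evidence
            return certain, uncertain

        print(classify([("author_name", "author_name"), ("has_author", "author")]))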

    Biomedical ontology alignment: An approach based on representation learning

    While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learning approach that is tailored to the ontology matching task. Our approach is based on embedding ontological terms in a high-dimensional Euclidean space. This embedding is derived on the basis of a novel phrase retrofitting strategy through which semantic similarity information becomes inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism based on a denoising autoencoder that is shown to improve performance. An ontology matching system derived using the proposed framework achieved an F-score of 94% on an alignment scenario involving the Adult Mouse Anatomical Dictionary and the Foundational Model of Anatomy ontology (FMA) as targets. This compares favorably with the best-performing systems on the Ontology Alignment Evaluation Initiative anatomy challenge. We performed additional experiments on aligning FMA to NCI Thesaurus and to SNOMED CT based on a reference alignment extracted from the UMLS Metathesaurus. Our system obtained overall F-scores of 93.2% and 89.2% for these experiments, thus achieving state-of-the-art results.
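    As a rough illustration of embedding-based matching (not the paper's phrase retrofitting or denoising-autoencoder machinery), the sketch below averages word vectors into a phrase embedding and aligns terms whose cosine similarity clears a threshold; the tiny hand-made vectors are stand-ins for pre-trained embeddings.

        # Illustrative embedding-based term matching: average word vectors into
        # a phrase vector, then align terms by cosine similarity above a threshold.
        import numpy as np

        WORD_VECS = {  # toy vectors; a real system would load pre-trained embeddings
            "heart": np.array([0.9, 0.1, 0.0]),
            "cardiac": np.array([0.85, 0.15, 0.05]),
            "muscle": np.array([0.1, 0.9, 0.2]),
            "tissue": np.array([0.15, 0.8, 0.3]),
        }

        def phrase_vec(term: str) -> np.ndarray:
            vecs = [WORD_VECS[w] for w in term.lower().split() if w in WORD_VECS]
            return np.mean(vecs, axis=0) if vecs else np.zeros(3)

        def cosine(u: np.ndarray, v: np.ndarray) -> float:
            denom = float(np.linalg.norm(u) * np.linalg.norm(v))
            return float(u @ v) / denom if denom else 0.0

        def match(source_terms, target_terms, threshold=0.95):
            alignment = []
            for s in source_terms:
                best = max(target_terms,
                           key=lambda t: cosine(phrase_vec(s), phrase_vec(t)))
                if cosine(phrase_vec(s), phrase_vec(best)) >= threshold:
                    alignment.append((s, best))
            return alignment

        print(match(["heart muscle"], ["cardiac muscle", "cardiac tissue"]))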

    Data-driven network alignment

    Biological network alignment (NA) aims to find a node mapping between species' molecular networks that uncovers similar network regions, thus allowing for transfer of functional knowledge between the aligned nodes. However, current NA methods do not end up aligning functionally related nodes. A likely reason is that they assume it is topologically similar nodes that are functionally related. However, we show that this assumption does not hold well. So, a paradigm shift is needed in how the NA problem is approached. We redefine NA as a data-driven framework, TARA (daTA-dRiven network Alignment), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity, as traditional NA methods do. TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns. We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. As currently implemented, TARA uses topological but not protein sequence information for this task. We find that TARA outperforms existing state-of-the-art NA methods that also use topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance.
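    The classification step can be sketched as follows (synthetic data, not TARA itself): per-node topological descriptors are concatenated per cross-network node pair, a classifier is trained on pairs with known functional relatedness, and pairs predicted as related form the alignment. The random features and labels below are placeholders for real graphlet-style features and functional annotations.

        # Illustrative node-pair classifier in the spirit of data-driven NA.
        # Features: per-node topological descriptors concatenated per pair;
        # labels: whether the pair is functionally related. All data is synthetic.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n_pairs, per_node = 500, 8                  # 8 topological features per node
        X = rng.normal(size=(n_pairs, 2 * per_node))
        # Synthetic ground truth: "related" pairs have similar node descriptors.
        y = (np.abs(X[:, :per_node] - X[:, per_node:]).mean(axis=1) < 0.9).astype(int)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))

        # Pairs predicted as related would then form the alignment used for the
        # across-species transfer of functional knowledge.
        print("pairs predicted related:", int(clf.predict(X_te).sum()))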

    An ontology matching approach for semantic modeling: A case study in smart cities

    This paper investigates the semantic modeling of smart cities and proposes two ontology matching frameworks, called Clustering for Ontology Matching-based Instances (COMI) and Pattern mining for Ontology Matching-based Instances (POMI). The goal is to discover the relevant knowledge by investigating the correlations among smart-city data based on clustering and pattern mining approaches. The COMI method first groups the highly correlated ontologies of smart-city data into similar clusters using the generic k-means algorithm. The key idea of this method is that it clusters the instances of each ontology and then matches two ontologies by matching their clusters and the corresponding instances within the clusters. The POMI method studies the correlations among the data properties and selects the most relevant properties for the ontology matching process. To demonstrate the usefulness and accuracy of the COMI and POMI frameworks, several experiments on the DBpedia, Ontology Alignment Evaluation Initiative, and NOAA ontology databases were conducted. The results show that COMI and POMI outperform state-of-the-art ontology matching models in terms of computational cost without losing quality during the matching process. Furthermore, these results confirm the ability of COMI and POMI to deal with heterogeneous large-scale data in smart-city environments.
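    The cluster-then-match idea behind COMI can be sketched loosely (the feature vectors and cluster count below are assumptions, not the paper's pipeline): cluster the instances of each ontology with k-means, pair clusters across ontologies by nearest centroids, and restrict instance matching to the paired clusters.

        # Loose sketch of cluster-then-match: k-means over instance feature
        # vectors of two ontologies, then pair clusters by nearest centroids so
        # instance matching only compares instances within paired clusters.
        import numpy as np
        from scipy.spatial.distance import cdist
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(42)
        instances_o1 = rng.normal(size=(200, 4))                    # toy features
        instances_o2 = instances_o1 + rng.normal(scale=0.1, size=(200, 4))

        k = 5
        km1 = KMeans(n_clusters=k, n_init=10, random_state=0).fit(instances_o1)
        km2 = KMeans(n_clusters=k, n_init=10, random_state=0).fit(instances_o2)

        # Pair each cluster of O1 with its nearest cluster of O2 by centroid distance.
        dists = cdist(km1.cluster_centers_, km2.cluster_centers_)
        cluster_pairs = [(i, int(dists[i].argmin())) for i in range(k)]
        print("paired clusters:", cluster_pairs)

        reduced = sum(int((km1.labels_ == i).sum() * (km2.labels_ == j).sum())
                      for i, j in cluster_pairs)
        print("candidate comparisons reduced from", 200 * 200, "to", reduced)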