Search CORE

23,594 research outputs found

A Large Scale Dataset for the Evaluation of Ontology Matching Systems

Author: Avesani Paolo
Giunchiglia Fausto
Shvaiko Pavel
Yatskevich Mikalai
Publication venue
Publication date: 01/01/2008
Field of study

Recently, the number of ontology matching techniques and systems has increased significantly. This makes the issue of their evaluation and comparison more severe. One of the challenges of the ontology matching evaluation is in building large scale evaluation datasets. In fact, the number of possible correspondences between two ontologies grows quadratically with respect to the numbers of entities in these ontologies. This often makes the manual construction of the evaluation datasets demanding to the point of being infeasible for large scale matching tasks. In this paper we present an ontology matching evaluation dataset composed of thousands of matching tasks, called TaxME2. It was built semi-automatically out of the Google, Yahoo and Looksmart web directories. We evaluated TaxME2 by exploiting the results of almost two dozen of state of the art ontology matching systems. The experiments indicate that the dataset possesses the desired key properties, namely it is error-free, incremental, discriminative, monotonic, and hard for the state of the art ontology matching systems. The paper has been accepted for publication in "The Knowledge Engineering Review", Cambridge Universty Press (ISSN: 0269-8889, EISSN: 1469-8005)

CiteSeerX

Archivio della ricerca - Fondazione Bruno Kessler

Unitn-eprints Research

MultiFarm: A benchmark for multilingual ontology matching

Author: Andrei Tamilin
Christian Meilicke
Cássia Trojahn
Elena Montiel-Ponsoda
Euzenat
Euzenat
Fred Freitas
Fu
García-Castro
Giunchiglia
Heiner Stuckenschmidt
Jung
Neches
Niepert
Ondřej Šváb-Zamazal
Raúl García-Castro
Ryan Ribeiro de Azevedo
Shenghui Wang
Vojtěch Svátek
Wang
Willem Robert van Hage
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2012
Field of study

In this paper we present the MultiFarm dataset, which has been designed as a benchmark for multilingual ontology matching. The MultiFarm dataset is composed of a set of ontologies translated in different languages and the corresponding alignments between these ontologies. It is based on the OntoFarm dataset, which has been used successfully for several years in the Ontology Alignment Evaluation Initiative (OAEI). By translating the ontologies of the OntoFarm dataset into eight different languages – Chinese, Czech, Dutch, French, German, Portuguese, Russian, and Spanish – we created a comprehensive set of realistic test cases. Based on these test cases, it is possible to evaluate and compare the performance of matching approaches with a special focus on multilingualism

VU Research Portal

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

MAnnheim DOCument Server

Archivo Digital UPM

Recommended from our members

Results of the ontology alignment evaluation initiative 2019

Author: Algergawy A.
Faria D.
Ferrara A.
Fundulaki I.
Harrow I.
Hertling S.
Jimenez-Ruiz E.
Karam N.
Khiat A.
Lambrix P.
Li H.
Montanelli S.
Paulheim H.
Pesquita C.
Saveta T.
Shvaiko P.
Splendiani A.
Thiéblin E.
Trojahn C.
Vataščinová J.
Zamazal O.
Zhou L.
Publication venue
Publication date: 01/01/2019
Field of study

The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity (from simple thesauri to expressive OWL ontologies) and use different evaluation modalities (e.g., blind evaluation, open evaluation, or consensus). The OAEI 2019 campaign offered 11 tracks with 29 test cases, and was attended by 20 participants. This paper is an overall presentation of that campaign

City Research Online

MAnnheim DOCument Server

Recommended from our members

SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems

Author: E Jiménez-Ruiz
E Kacprzak
G Limaye
J Euzenat
J Euzenat
Jiaoyan Chen
Mauricio A. Hernández
V Efthymiou
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

Tabular data to Knowledge Graph matching is the process of assigning semantic tags from knowledge graphs (e.g., Wikidata or DBpedia) to the elements of a table. This task is a challenging problem for various reasons, including the lack of metadata (e.g., table and column names), the noisiness, heterogeneity, incompleteness and ambiguity in the data. The results of this task provide significant insights about potentially highly valuable tabular data, as recent works have shown, enabling a new family of data analytics and data science applications. Despite significant amount of work on various flavors of this problem, there is a lack of a common framework to conduct a systematic evaluation of state-of-the-art systems. The creation of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab) aims at filling this gap. In this paper, we report about the datasets, infrastructure and lessons learned from the first edition of the SemTab challenge

City Research Online

Crossref

NORA - Norwegian Open Research Archives

Structuring visual exploratory analysis of skill demand

Author: A.-S. Dadzie
Brooke
Cunningham
Davenport
E.M. Sibarani
Einsfeld
Elia
Ellis
Gehl
Granville
Harper
Heer
Hervás
I. Novalija
Inselberg
Keim
Khobreh
Kulyk
Liu
Munzner
Nielsen
S. Scerri
Sacha
Sibarani
Strawn
Terblanche
Tominski
Turkay
Ware
Welter
Wowczko
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018
Field of study

The analysis of increasingly large and diverse data for meaningful interpretation and question answering is handicapped by human cognitive limitations. Consequently, semi-automatic abstraction of complex data within structured information spaces becomes increasingly important, if its knowledge content is to support intuitive, exploratory discovery. Exploration of skill demand is an area where regularly updated, multi-dimensional data may be exploited to assess capability within the workforce to manage the demands of the modern, technology- and data-driven economy. The knowledge derived may be employed by skilled practitioners in defining career pathways, to identify where, when and how to update their skillsets in line with advancing technology and changing work demands. This same knowledge may also be used to identify the combination of skills essential in recruiting for new roles. To address the challenges inherent in exploring the complex, heterogeneous, dynamic data that feeds into such applications, we investigate the use of an ontology to guide structuring of the information space, to allow individuals and institutions to interactively explore and interpret the dynamic skill demand landscape for their specific needs. As a test case we consider the relatively new and highly dynamic field of Data Science, where insightful, exploratory data analysis and knowledge discovery are critical. We employ context-driven and task-centred scenarios to explore our research questions and guide iterative design, development and formative evaluation of our ontology-driven, visual exploratory discovery and analysis approach, to measure where it adds value to users’ analytical activity. Our findings reinforce the potential in our approach, and point us to future paths to build on

Crossref

Fraunhofer-ePrints

Open Research Online (The Open University)

LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs

Author: Amann Bernd
Curé Olivier
Naacke Hubert
Randriamalala Tendry
Publication venue
Publication date: 12/10/2015
Field of study

The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are more and more confronted with various "big data" problems. Query processing in the presence of inferences is one them. For instance, to complete the answer set of SPARQL queries, RDF database systems evaluate semantic RDFS relationships (subPropertyOf, subClassOf) through time-consuming query rewriting algorithms or space-consuming data materialization solutions. To reduce the memory footprint and ease the exchange of large datasets, these systems generally apply a dictionary approach for compressing triple data sizes by replacing resource identifiers (IRIs), blank nodes and literals with integer values. In this article, we present a structured resource identification scheme using a clever encoding of concepts and property hierarchies for efficiently evaluating the main common RDFS entailment rules while minimizing triple materialization and query rewriting. We will show how this encoding can be computed by a scalable parallel algorithm and directly be implemented over the Apache Spark framework. The efficiency of our encoding scheme is emphasized by an evaluation conducted over both synthetic and real world datasets.Comment: 8 pages, 1 figur

arXiv.org e-Print Archive

Crossref

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM