Using Element Clustering to Increase the Efficiency of XML Schema Matching
Schema matching attempts to discover semantic mappings between elements of two schemas. Elements are cross-compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, schema matching is a combinatorial problem with exponential complexity, which makes naive matching algorithms prohibitively inefficient for large schemas. In this paper we propose a clustering-based technique for improving the efficiency of large-scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions the schemas, reduces the overall matching load, and makes it possible to trade off efficiency against effectiveness. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of it, and open directions for future research.
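The idea of inserting a clustering step before matching can be illustrated with a minimal sketch. It is not the paper's algorithm: grouping elements by declared data type is a deliberately crude stand-in for a real clustering method, and the element lists, the `threshold` value, and all function names are assumptions made for illustration. The sketch only shows how restricting comparisons to corresponding clusters reduces the number of pairwise comparisons.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Cheap name heuristic: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_naive(source, target, threshold=0.7):
    """Compare every source element with every target element."""
    matches, comparisons = [], 0
    for s_name, _ in source:
        for t_name, _ in target:
            comparisons += 1
            if name_similarity(s_name, t_name) >= threshold:
                matches.append((s_name, t_name))
    return matches, comparisons


def cluster_by_type(elements):
    """Stand-in clustering step (assumption): group elements by data type."""
    clusters = defaultdict(list)
    for name, dtype in elements:
        clusters[dtype].append((name, dtype))
    return clusters


def match_clustered(source, target, threshold=0.7):
    """Run the same matcher, but only between corresponding clusters."""
    s_clusters = cluster_by_type(source)
    t_clusters = cluster_by_type(target)
    matches, comparisons = [], 0
    for dtype, s_elems in s_clusters.items():
        m, c = match_naive(s_elems, t_clusters.get(dtype, []), threshold)
        matches.extend(m)
        comparisons += c
    return matches, comparisons


# Hypothetical schema elements as (name, data type) pairs.
source = [("customerName", "string"), ("orderDate", "date"),
          ("totalPrice", "decimal"), ("customerId", "int")]
target = [("custName", "string"), ("dateOfOrder", "date"),
          ("price", "decimal"), ("clientId", "int"), ("notes", "string")]

_, naive_count = match_naive(source, target)
_, clustered_count = match_clustered(source, target)
print(naive_count, clustered_count)  # prints "20 5"
```

Because the clustered variant examines a subset of the pairs the naive variant examines, it can miss matches that cross cluster boundaries; that is the efficiency-versus-effectiveness trade-off the abstract mentions.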
Dealing with uncertain entities in ontology alignment using rough sets
This is the author's accepted manuscript. The final published article is available from the link below. Copyright © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Ontology alignment facilitates the exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, dealing with uncertain entities, for which the employed similarity measures produce conflicting results, remains a key challenge. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises because of the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the Ontology Alignment Evaluation Initiative (OAEI) 2010, and performs best in terms of recall in comparison with a number of alignment systems while achieving comparable precision.
Enhancing Virtual Ontology Based Access over Tabular Data with Morph-CSV
Ontology-Based Data Access (OBDA) has traditionally focused on providing a
unified view of heterogeneous datasets, either by materializing integrated data
into RDF or by performing on-the-fly querying via SPARQL query translation. In
the specific case of tabular datasets represented as several CSV or Excel
files, query translation approaches have been applied by considering each
source as a single table that can be loaded into a relational database
management system (RDBMS). Nevertheless, constraints over these tables are not
represented; thus, neither consistency among attributes nor indexes over tables
are enforced. As a consequence, efficiency of the SPARQL-to-SQL translation
process may be affected, as well as the completeness of the answers produced
during the evaluation of the generated SQL query. Our work is focused on
applying implicit constraints on the OBDA query translation process over
tabular data. We propose Morph-CSV, a framework for querying tabular data that
exploits information from typical OBDA inputs (e.g., mappings, queries) to
enforce constraints that can be used together with any SPARQL-to-SQL OBDA
engine. Morph-CSV relies on both a constraint component and a set of constraint
operators. For a given set of constraints, the operators are applied to each
type of constraint with the aim of enhancing query completeness and
performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM
benchmark; transportation with a benchmark using the GTFS dataset from the
Madrid subway; and biology with a use case extracted from the Bio2RDF project.
We compare and report the performance of two SPARQL-to-SQL OBDA engines,
with and without Morph-CSV. The observed results suggest that Morph-CSV can
speed up the total query execution time by up to two orders of magnitude
while still producing all the query answers.
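Why unrepresented constraints matter when CSV files are loaded into an RDBMS can be illustrated with a small sketch. It does not use Morph-CSV or its API: the CSV content, the `COLUMN_TYPES`, `PRIMARY_KEY`, and `INDEXED` values, and the helper functions are all hypothetical, standing in for constraints that a framework might derive from mappings and queries. The sketch loads the same data into SQLite twice, once as untyped text and once with declared types, a primary key, and an index, and runs the same query on both.

```python
import csv
import io
import sqlite3

# Hypothetical CSV source; in practice this would be a file on disk.
CSV_TEXT = """product_id,label,price
p1,Widget,9
p2,Gadget,250
p3,Gizmo,1000
"""

# Hypothetical constraints (assumption): stand-ins for information that
# could be extracted from OBDA inputs such as mappings and queries.
COLUMN_TYPES = {"product_id": "TEXT", "label": "TEXT", "price": "REAL"}
PRIMARY_KEY = "product_id"
INDEXED = ["price"]


def load(conn, table, column_defs):
    """Create `table` with the given column definitions and load the CSV."""
    rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE {table} ({column_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [[row[c] for c in columns] for row in rows],
    )


def expensive(conn, table):
    """Return ids of products whose price exceeds 100, sorted by id."""
    cur = conn.execute(
        f"SELECT product_id FROM {table} WHERE price > 100 ORDER BY product_id"
    )
    return [r[0] for r in cur]


conn = sqlite3.connect(":memory:")

# Baseline: every column is plain text and no constraints are declared.
load(conn, "raw", "product_id TEXT, label TEXT, price TEXT")

# Constrained: declared types, a primary key, and an index on price.
typed_defs = ", ".join(
    f"{name} {sqltype}" + (" PRIMARY KEY" if name == PRIMARY_KEY else "")
    for name, sqltype in COLUMN_TYPES.items()
)
load(conn, "typed", typed_defs)
for col in INDEXED:
    conn.execute(f"CREATE INDEX idx_typed_{col} ON typed ({col})")

print(expensive(conn, "raw"))
print(expensive(conn, "typed"))
```

In SQLite the untyped table compares `price` as a string, so the two queries are expected to return different answers for the same data, while the declared index gives the engine an access path for the range filter on `typed`. Other database systems handle untyped columns differently, so this behaviour should be read as specific to this sketch.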