19,213 research outputs found
Dealing with uncertain entities in ontology alignment using rough sets
This is the author's accepted manuscript. The final published article is available from the link below. Copyright @ 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.Ontology alignment facilitates exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, it remains a key challenge in dealing with uncertain entities for which the employed ontology alignment measures produce conflicting results on similarity of the mapped entities. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises because of the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the ontology alignment evaluation initiative (OAEI) 2010, and performs best in the aspect of recall in comparison with a number of alignment systems while generating a comparable performance in precision
Keeping the data lake in form: proximity mining for pre-filtering schema matching
Data Lakes (DLs) are large repositories of raw datasets from disparate sources. As more datasets are ingested into a DL, there is an increasing need for efficient techniques to profile them and to detect the relationships among their schemata, commonly known as holistic schema matching. Schema matching detects similarity between the information stored in the datasets to support information discovery and retrieval. Currently, this is computationally expensive with the volume of state-of-the-art DLs. To handle this challenge, we propose a novel early-pruning approach to improve efficiency, where we collect different types of content metadata and schema metadata about the datasets, and then use this metadata in early-pruning steps to pre-filter the schema matching comparisons. This involves computing proximities between datasets based on their metadata, discovering their relationships based on overall proximities and proposing similar dataset pairs for schema matching. We improve the effectiveness of this task by introducing a supervised mining approach for effectively detecting similar datasets which are proposed for further schema matching. We conduct extensive experiments on a real-world DL which proves the success of our approach in effectively detecting similar datasets for schema matching, with recall rates of more than 85% and efficiency improvements above 70%. We empirically show the computational cost saving in space and time by applying our approach in comparison to instance-based schema matching techniques.This research was partially funded by the European Commission through the Erasmus Mundus Joint Doctorate (IT4BI-DC).Peer ReviewedPostprint (author's final draft
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4)~We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently
Effective and Efficient Data Access in the Versatile Web Query Language Xcerpt
Access to Web data has become an integral part of many applications
and services. In the past, such data has usually been accessed
through human-tailoredHTMLinterfaces.Nowadays, rich client interfaces
in desktop applications or, increasingly, in browser-based clients ease data
access and allow more complex client processing based on XML or RDF
data retrieved throughWeb service interfaces. Convenient specifications of
the data processing on the client and flexible, expressive service interfaces
for data access become essential in this context.Web query languages such
as XQuery, XSLT, SPARQL, or Xcerpt have been tailored specifically for
such a setting: declarative and efficient access and processing ofWeb data.
Xcerpt stands apart among these languages by its versatility, i.e., its ability
to access not just oneWeb format but many. In this demonstration, two aspects
of Xcerpt are illustrated in detail: The first part of the demonstration
focuses on Xcerptâs pattern matching constructs and rules to enable effective
and versatile data access. It uses a concrete practical use case from
bibliography management to illustrate these language features. Xcerptâs
visual companion language visXcerpt is used to provide an intuitive interface
to both data and queries. The second part of the demonstration shows
recent advancements in Xcerptâs implementation focusing on experimental
evaluation of recent complexity results and optimization techniques, as
well as scalability over a number of usage scenarios and input sizes
- âŠ