
    Dealing with uncertain entities in ontology alignment using rough sets

    Ontology alignment facilitates the exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, dealing with uncertain entities, for which the employed similarity measures produce conflicting results, remains a key challenge. This paper presents OARS, a rough-set-based approach to ontology alignment that achieves a high degree of accuracy in situations where uncertainty arises from the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated on the benchmark ontologies of the Ontology Alignment Evaluation Initiative (OAEI) 2010, and it outperforms a number of alignment systems in recall while delivering comparable precision.
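
    As a rough illustration of the idea (not the OARS implementation), the sketch below uses invented entity pairs and two hypothetical similarity measures: pairs are grouped into equivalence classes by their discretized similarity profiles, profiles that only ever correspond to correct alignments form the rough-set lower approximation, and profiles with conflicting evidence fall into the boundary region, i.e. the uncertain entities.

        # Illustrative sketch only, not the OARS implementation; all entity pairs,
        # scores, and reference labels below are invented.

        def discretize(score, threshold=0.7):
            """Map a raw similarity score to a coarse decision label."""
            return "high" if score >= threshold else "low"

        # Candidate mappings: (lexical score, structural score, reference alignment).
        candidates = {
            ("Person", "Human"):      (0.9, 0.8, True),
            ("Author", "Writer"):     (0.8, 0.3, True),
            ("Paper", "Publication"): (0.8, 0.3, False),
            ("Topic", "Subject"):     (0.2, 0.1, False),
        }

        # Equivalence classes: pairs are indiscernible if their discretized
        # similarity profiles agree on every measure.
        classes = {}
        for pair, (lex, struct, aligned) in candidates.items():
            profile = (discretize(lex), discretize(struct))
            classes.setdefault(profile, []).append((pair, aligned))

        lower, boundary = [], []
        for profile, members in classes.items():
            labels = {aligned for _, aligned in members}
            if labels == {True}:
                lower.extend(p for p, _ in members)      # certainly aligned
            elif True in labels:
                boundary.extend(p for p, _ in members)   # conflicting evidence

        print("Certain alignments:", lower)
        print("Uncertain entities (boundary region):", boundary)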

    Keeping the data lake in form: proximity mining for pre-filtering schema matching

    Data Lakes (DLs) are large repositories of raw datasets from disparate sources. As more datasets are ingested into a DL, there is an increasing need for efficient techniques to profile them and to detect the relationships among their schemata, commonly known as holistic schema matching. Schema matching detects similarity between the information stored in the datasets to support information discovery and retrieval. At the volume of state-of-the-art DLs, this is computationally expensive. To handle this challenge, we propose a novel early-pruning approach to improve efficiency: we collect different types of content metadata and schema metadata about the datasets, and then use this metadata in early-pruning steps to pre-filter the schema matching comparisons. This involves computing proximities between datasets based on their metadata, discovering their relationships based on overall proximities, and proposing similar dataset pairs for schema matching. We improve the effectiveness of this task by introducing a supervised mining approach for detecting similar datasets, which are then proposed for further schema matching. We conduct extensive experiments on a real-world DL that demonstrate the success of our approach in detecting similar datasets for schema matching, with recall rates of more than 85% and efficiency improvements above 70%. We empirically show the computational cost savings in space and time of applying our approach in comparison to instance-based schema matching techniques. This research was partially funded by the European Commission through the Erasmus Mundus Joint Doctorate (IT4BI-DC).
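
    A minimal sketch of the early-pruning idea, with invented metadata fields and a plain threshold standing in for the paper's supervised mining step: cheap metadata-based proximities are computed for every dataset pair, and only pairs above the threshold are proposed for full (expensive) schema matching.

        # Minimal sketch with invented metadata; a fixed threshold stands in for
        # the paper's supervised mining step.
        from itertools import combinations

        def jaccard(a, b):
            """Jaccard similarity between two sets of metadata tokens."""
            return len(a & b) / len(a | b) if a | b else 0.0

        # Hypothetical profiled metadata per dataset: attribute names and coarse types.
        datasets = {
            "sales_2021": {"names": {"id", "customer", "amount", "date"},
                           "types": {"int", "str", "float", "date"}},
            "orders_eu":  {"names": {"id", "customer", "total", "date"},
                           "types": {"int", "str", "float", "date"}},
            "sensor_log": {"names": {"ts", "device", "reading"},
                           "types": {"date", "str", "float"}},
        }

        def proximity(d1, d2, w_names=0.7, w_types=0.3):
            """Weighted combination of the per-metadata-type proximities."""
            return (w_names * jaccard(d1["names"], d2["names"])
                    + w_types * jaccard(d1["types"], d2["types"]))

        THRESHOLD = 0.5  # illustrative pruning threshold
        proposed = [(a, b) for a, b in combinations(datasets, 2)
                    if proximity(datasets[a], datasets[b]) >= THRESHOLD]
        print("Pairs proposed for full schema matching:", proposed)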

    Explain3D: Explaining Disagreements in Disjoint Datasets

    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.
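
    The following toy sketch (not Explain3D's algorithm) conveys the setting: two semantically similar aggregate queries are answered over disjoint datasets with different schemas, row identifiers stand in for provenance, and each disagreement is reported together with the source rows behind it. All values are invented.

        # Toy sketch, not Explain3D's algorithm: all city names, scores, and row
        # identifiers are invented; row-id lists are a crude stand-in for provenance.

        # Query over dataset A, schema (town, score): average rating per city.
        results_a = {"Boston": 4.2, "Seattle": 3.9}
        prov_a    = {"Boston": ["a_row_1", "a_row_2"], "Seattle": ["a_row_3"]}

        # Semantically similar query over dataset B, schema (city, avg_stars).
        results_b = {"Boston": 4.2, "Seattle": 4.6, "Austin": 4.1}
        prov_b    = {"Boston": ["b_row_7"], "Seattle": ["b_row_8", "b_row_9"],
                     "Austin": ["b_row_4"]}

        # Align the two answer sets on their shared semantic key (the city) and
        # report each disagreement together with the contributing source rows.
        for key in sorted(set(results_a) | set(results_b)):
            va, vb = results_a.get(key), results_b.get(key)
            if va != vb:
                print(f"Disagreement on {key!r}: A={va} (from {prov_a.get(key, [])}) "
                      f"vs B={vb} (from {prov_b.get(key, [])})")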

    Effective and Efficient Data Access in the Versatile Web Query Language Xcerpt

    Access to Web data has become an integral part of many applications and services. In the past, such data has usually been accessed through human-tailored HTML interfaces. Nowadays, rich client interfaces in desktop applications or, increasingly, in browser-based clients ease data access and allow more complex client processing based on XML or RDF data retrieved through Web service interfaces. Convenient specifications of the data processing on the client and flexible, expressive service interfaces for data access become essential in this context. Web query languages such as XQuery, XSLT, SPARQL, or Xcerpt have been tailored specifically for such a setting: declarative and efficient access and processing of Web data. Xcerpt stands apart among these languages by its versatility, i.e., its ability to access not just one Web format but many. In this demonstration, two aspects of Xcerpt are illustrated in detail: The first part of the demonstration focuses on Xcerpt's pattern matching constructs and rules to enable effective and versatile data access. It uses a concrete practical use case from bibliography management to illustrate these language features. Xcerpt's visual companion language visXcerpt is used to provide an intuitive interface to both data and queries. The second part of the demonstration shows recent advancements in Xcerpt's implementation, focusing on experimental evaluation of recent complexity results and optimization techniques, as well as scalability over a number of usage scenarios and input sizes.
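
    The bibliography use case can be approximated outside Xcerpt as well; the Python sketch below (deliberately not Xcerpt syntax) shows the flavour of versatile, pattern-based access: one partial pattern is matched against records normalized from two different Web formats. All records are invented.

        # Python analogue of pattern-based access to bibliography data; this is
        # deliberately NOT Xcerpt syntax, and all records below are invented.

        # Records as they might arrive from two different Web formats.
        xml_like = [{"title": "Web Queries", "author": "A. Smith", "year": "2006"}]
        rdf_like = [("_:b1", "dc:title", "Semantic Data"),
                    ("_:b1", "dc:creator", "B. Jones")]

        def normalize_rdf(triples):
            """Group RDF-style triples into per-subject records with unified field names."""
            records = {}
            for subj, pred, obj in triples:
                rec = records.setdefault(subj, {})
                if pred == "dc:title":
                    rec["title"] = obj
                elif pred == "dc:creator":
                    rec["author"] = obj
            return list(records.values())

        def matches(pattern, record):
            """A record matches a partial pattern if it has every requested field."""
            return all(field in record for field in pattern)

        pattern = ("title", "author")  # "anything that has a title and an author"
        for rec in xml_like + normalize_rdf(rdf_like):
            if matches(pattern, rec):
                print(rec["author"], "-", rec["title"])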