
    Validation of mappings between schemas

    Mappings between schemas are key elements in several contexts, such as data exchange, data integration, and peer data management systems. In all these contexts, the process of designing a mapping requires the participation of a mapping designer who needs a way to validate the mapping being defined, i.e., to check whether the mapping is in fact what the designer intended. However, to date very little work has directly focused on the effective validation of schema mappings. In this paper, we present a new approach for validating schema mappings that allows the mapping designer to ask whether certain desirable properties hold for these mappings. We consider four such properties: mapping satisfiability, mapping inference, query answerability, and mapping losslessness. We reformulate these properties in terms of the problem of checking the liveliness of a derived predicate. We emphasize that this approach is independent of any particular method for liveliness checking and, to show its feasibility, we use an implementation of the CQC Method and provide some experimental results.
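
    The reduction described here can be pictured with a toy example. The sketch below is not the CQC Method itself: it brute-forces tiny instances over a two-value domain to test whether the derived predicate "the mapping assertion holds on some non-empty, consistent source instance" is lively. The relations, the key constraint, and the inclusion-style assertion are all invented for illustration.

```python
from itertools import combinations, product

DOMAIN = (0, 1)                               # tiny active domain
ALL_TUPLES = list(product(DOMAIN, repeat=2))  # all binary tuples over it

def consistent(src):
    """Invented integrity constraint on the source: column 1 is a key."""
    keys = [t[0] for t in src]
    return len(keys) == len(set(keys))

def assertion_holds(src, tgt):
    """Invented mapping assertion: every source tuple appears in the target."""
    return set(src) <= set(tgt)

def mapping_is_lively(max_size=2):
    """Liveliness of the derived predicate 'the mapping assertion holds on
    a non-empty, consistent source instance': search for a witness pair."""
    for k in range(1, max_size + 1):
        for src in combinations(ALL_TUPLES, k):
            if not consistent(src):
                continue
            for m in range(max_size + 1):
                for tgt in combinations(ALL_TUPLES, m):
                    if assertion_holds(src, tgt):
                        return True, (src, tgt)
    return False, None

lively, witness = mapping_is_lively()
print(lively, witness)  # True, plus one witness instance pair
```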

    Validation of schema mappings with nested queries

    With the emergence of the Web and the wide use of XML for representing data, the ability to map not only flat relational but also nested data has become crucial. The design of schema mappings is a semi-automatic process: a human designer is needed to guide the process, choose among mapping candidates, and successively refine the mapping. The designer needs a way to figure out whether the mapping is what was intended. Our approach to mapping validation allows the designer to check whether the mapping satisfies certain desirable properties. In this paper, we focus on the validation of mappings between nested relational schemas, in which the mapping assertions are either inclusions or equalities of nested queries. We adopt the nested relational setting since most XML Document Type Definitions (DTDs) can be represented in this model. We perform the validation by reasoning on the schemas and the mapping definition, taking into account the integrity constraints defined on both the source and the target schema.
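
    To make the notion of an assertion between nested queries concrete, here is a hedged sketch using plain Python dicts as nested relational instances. The schemas (depts/emps vs. a flat employees relation) and the inclusion assertion are invented; the paper reasons on schemas and constraints symbolically, not on concrete instances as done here.

```python
# Nested relational instances as plain dicts/lists (invented schemas).
source = {
    "depts": [
        {"name": "R&D", "emps": [{"ename": "ann"}, {"ename": "bob"}]},
    ]
}
target = {
    "employees": [
        {"ename": "ann", "dept": "R&D"},
        {"ename": "bob", "dept": "R&D"},
        {"ename": "eve", "dept": "Sales"},
    ]
}

def q_source(db):
    """Nested query over the source: flatten to (dept, employee) pairs."""
    return {(d["name"], e["ename"]) for d in db["depts"] for e in d["emps"]}

def q_target(db):
    """Query over the target producing comparable pairs."""
    return {(r["dept"], r["ename"]) for r in db["employees"]}

# The mapping assertion is the inclusion q_source ⊆ q_target;
# here it is merely tested on one concrete instance pair.
assert q_source(source) <= q_target(target)
```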

    UK utility data integration: overcoming schematic heterogeneity

    In this paper we discuss the syntactic, semantic and schematic issues which inhibit the integration of utility data in the UK. We then focus on the techniques employed within the VISTA project to overcome schematic heterogeneity. A Global Schema based architecture is employed. Although automated approaches to Global Schema definition were attempted, the heterogeneities of the sector proved too great, so a manual approach to Global Schema definition was adopted. The techniques used to define and subsequently map source utility data models to this schema are discussed in detail. In order to ensure a coherent integrated model, sub- and cross-domain validation issues are then highlighted. Finally, the proposed framework and data flow for schematic integration are introduced.
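
    As a purely illustrative sketch of what a hand-written source-to-global-schema mapping can look like, the snippet below renames source attributes into an invented global schema and flags unmapped fields, a crude stand-in for the validation issues mentioned above. None of the field names are taken from VISTA.

```python
GLOBAL_FIELDS = {"asset_id", "diameter_mm", "material"}  # invented global schema

# One hand-written mapping per source utility data model (invented names).
water_map = {"ID": "asset_id", "DIAMETER": "diameter_mm", "MAT": "material"}
gas_map = {"AssetRef": "asset_id", "Bore_mm": "diameter_mm", "Material": "material"}

def to_global(record, mapping):
    """Rewrite one source record into the global schema, flagging gaps."""
    out = {mapping[k]: v for k, v in record.items() if k in mapping}
    missing = GLOBAL_FIELDS - set(out)
    if missing:  # a crude hook where cross-domain validation would kick in
        raise ValueError(f"unmapped global fields: {missing}")
    return out

print(to_global({"ID": "W17", "DIAMETER": 150, "MAT": "PVC"}, water_map))
```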

    Constraint-based Query Distribution Framework for an Integrated Global Schema

    Distributed heterogeneous data sources need to be queried uniformly through a global schema. A query on the global schema is reformulated so that it can be executed on the local data sources. Constraints in the global schema and the mappings are used for source selection, query optimization, and querying partitioned and replicated data sources. The system is entirely XML-based: queries are posed in XML form, and local results are transformed and integrated into an XML document. Contributions include the use of constraints in our existing global schema, which help in source selection and query optimization, and a global query distribution framework for querying distributed heterogeneous data sources.
    Comment: Proceedings of the 13th INMIC 2009, Dec. 14-15, 2009, Islamabad, Pakistan, pp. 1-6. Print ISBN: 978-1-4244-4872-2. INSPEC Accession Number: 11072575.
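
    A minimal sketch of the source-selection idea, under invented names: partition constraints attached to each source describe which data it can hold, so a query's own constraints can prune sources before the query is reformulated and distributed.

```python
# Invented partition constraints: which 'region' values each source can hold.
sources = {
    "src_north": {"region": {"north"}},         # horizontal partition
    "src_south": {"region": {"south"}},
    "src_all": {"region": {"north", "south"}},  # replica covering both
}

def select_sources(query_constraints):
    """Keep only sources whose constraints can overlap the query's."""
    return [
        name
        for name, cons in sources.items()
        if all(cons.get(attr, set()) & vals
               for attr, vals in query_constraints.items())
    ]

# A query restricted to northern assets needs only two of the three sources:
print(select_sources({"region": {"north"}}))  # ['src_north', 'src_all']
```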

    A framework for utility data integration in the UK

    In this paper we investigate the various factors which prevent utility knowledge from being fully exploited, and suggest that integration techniques can be applied to improve the quality of utility records. The paper proposes a framework which supports knowledge and data integration at two levels: the schema level and the data level. Schema-level integration ensures that a single, integrated geospatial data set is available for utility enquiries. Data-level integration improves utility data quality by reducing inconsistency, duplication and conflicts. Moreover, the framework is designed to preserve the autonomy and distribution of utility data. The ultimate aim of the research is to produce an integrated representation of the underground utility infrastructure in order to gain more accurate knowledge of the buried services. It is hoped that this approach will enable us to understand the various problems associated with utility data, and to suggest potential techniques for resolving them.
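
    The two levels can be pictured with a toy sketch: records already lifted into a shared schema (schema-level integration) are then deduplicated and their conflicts reconciled (data-level integration). The records, fields, and the newest-wins conflict rule below are all invented for illustration.

```python
records = [  # already in the shared (global) schema
    {"asset_id": "W17", "material": "PVC", "updated": 2019},
    {"asset_id": "W17", "material": "iron", "updated": 2008},  # duplicate
    {"asset_id": "G02", "material": "PE", "updated": 2015},
]

def integrate(records):
    """Data-level integration: one record per asset, newest value wins."""
    merged = {}
    for r in sorted(records, key=lambda r: r["updated"]):
        merged[r["asset_id"]] = r  # later (newer) records overwrite older ones
    return list(merged.values())

print(integrate(records))  # W17 keeps the 2019 'PVC' value
```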

    Save up to 99% of your time in mapping validation

    Identifying semantic correspondences between different vocabularies has been recognized as a fundamental step towards achieving interoperability. Several manual and automatic techniques have recently been proposed. Fully manual approaches are very precise, but extremely costly. Conversely, automatic approaches tend to fail when domain-specific background knowledge is needed; consequently, they typically require a manual validation step. Yet, when the number of computed correspondences is very large, the validation phase can be very expensive. To reduce these problems, we propose to compute the minimal set of correspondences, which we call the minimal mapping, that is sufficient to derive all the others. We show that by concentrating on such correspondences we can save up to 99% of the manual checks required for validation.
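
    The core observation can be illustrated with a small sketch, under strong simplifying assumptions: if correspondences are subsumption links between two taxonomies, a link is redundant whenever another link entails it through the hierarchies, so only the minimal links need manual validation. The taxonomies and links below are invented, not taken from the paper.

```python
parent_src = {"car": "vehicle", "bike": "vehicle"}        # source taxonomy
parent_tgt = {"auto": "transport", "cycle": "transport"}  # target taxonomy

def ancestors(node, parent):
    while node in parent:
        node = parent[node]
        yield node

def entails(c1, c2):
    """(a1 ⊑ b1) entails (a2 ⊑ b2) if a2 ⊑ a1 and b1 ⊑ b2 in the trees."""
    (a1, b1), (a2, b2) = c1, c2
    a_ok = a1 == a2 or a1 in ancestors(a2, parent_src)
    b_ok = b1 == b2 or b2 in ancestors(b1, parent_tgt)
    return a_ok and b_ok and c1 != c2

computed = [("vehicle", "transport"), ("car", "transport"), ("bike", "transport")]
minimal = [c for c in computed if not any(entails(d, c) for d in computed)]
print(minimal)  # only ('vehicle', 'transport') needs a manual check
```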

    Assessing and refining mappings to RDF to improve dataset quality

    RDF dataset quality assessment is currently performed primarily after the data is published. However, there is no systematic way to incorporate its results into the dataset, nor the assessment into the publishing workflow: adjustments are applied manually, and only rarely. Moreover, the root cause of the violations, which often lies in the mappings that specify how the RDF dataset is generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from (semi-)structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving the mappings. We (i) incorporate a test-driven approach that assesses the mappings instead of the RDF dataset itself, as the mappings reflect how the dataset will be formed when generated; and (ii) perform semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as DBpedia, and newly generated ones, such as iLastic. Our evaluation indicates the efficiency of the workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases.
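
    A hedged sketch of the test-driven flavor of this idea: quality tests run against the mapping rules themselves rather than against the generated RDF. The toy rule structure below is only loosely inspired by R2RML/RML-style mappings and is not the paper's actual rule format or test suite.

```python
# Toy mapping rules, loosely R2RML/RML-flavoured (invented structure).
rules = [
    {"subject_template": "http://ex.org/person/{id}",
     "predicate": "http://xmlns.com/foaf/0.1/age", "source_column": "age"},
    {"subject_template": "person/{id}",  # violation: not an absolute IRI
     "predicate": "http://xmlns.com/foaf/0.1/name", "source_column": "name"},
]

def test_absolute_subject_iri(rule):
    """One quality test, applied to the rule rather than to generated RDF."""
    return rule["subject_template"].startswith(("http://", "https://"))

def assess(rules):
    """Collect violations per rule; refinements would be derived from these."""
    return [(i, "subject template is not an absolute IRI")
            for i, r in enumerate(rules)
            if not test_absolute_subject_iri(r)]

print(assess(rules))  # [(1, 'subject template is not an absolute IRI')]
```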

    Using Element Clustering to Increase the Efficiency of XML Schema Matching

    Schema matching attempts to discover semantic mappings between the elements of two schemas. Elements are cross-compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, schema matching is a combinatorial problem of exponential complexity, which makes naive matching algorithms prohibitively inefficient for large schemas. In this paper we propose a clustering-based technique for improving the efficiency of large-scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions the schemas, reduces the overall matching load, and creates the possibility to trade effectiveness for efficiency. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of it, and identify directions for future research.
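
    The efficiency argument can be sketched in a few lines: cluster elements with a cheap heuristic first, prune whole cluster pairs, and run the comparatively expensive element matcher only inside surviving pairs. The prefix-based clustering and the similarity threshold below are invented stand-ins for the paper's heuristics.

```python
from difflib import SequenceMatcher

schema_a = ["orderId", "orderDate", "custName", "custAddr"]
schema_b = ["order_no", "order_date", "customer", "customer_address"]

def cluster(elements):
    """Crude clustering by name prefix, a stand-in for real heuristics."""
    groups = {}
    for e in elements:
        groups.setdefault(e[:4].lower(), []).append(e)
    return groups

def similar(a, b, threshold=0.5):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

candidates = []
for ka, ga in cluster(schema_a).items():
    for kb, gb in cluster(schema_b).items():
        if not similar(ka, kb):  # prune whole cluster pairs cheaply
            continue
        candidates += [(x, y) for x in ga for y in gb if similar(x, y)]

# Only intra-cluster-pair comparisons were run; a full matcher would
# then filter these candidates (which may include false positives).
print(candidates)
```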

    A schema-only approach to validate XML schema mappings

    Since the emergence of the Web, the ability to map XML data between different data sources has become crucial. Defining a mapping is, however, not a fully automatic process: the designer needs to figure out whether the mapping is what was intended. Our approach to this validation consists of defining and checking certain desirable properties of mappings. We translate the XML schemas and the mapping into a first-order logic formalism and apply a reasoning mechanism to check the desirable properties automatically, without assuming any particular instantiation of the schemas.
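
    A minimal sketch of the schema-only idea, assuming the Z3 solver as a stand-in reasoner (the paper uses its own reasoning mechanism, not Z3): schema constraints and the mapping assertion are encoded in first-order logic, and the solver searches for a model without being given any instance. The relations and the constraint are invented.

```python
# Requires the z3-solver package; Z3 here is only a stand-in reasoner.
from z3 import (BoolSort, Exists, ForAll, Function, Implies, Int,
                IntSort, Solver, sat)

Src = Function("Src", IntSort(), BoolSort())  # source schema element
Tgt = Function("Tgt", IntSort(), BoolSort())  # target schema element
x = Int("x")

s = Solver()
s.add(ForAll([x], Implies(Src(x), Tgt(x))))  # mapping assertion: Src ⊆ Tgt
s.add(ForAll([x], Implies(Tgt(x), x > 0)))   # invented target constraint
s.add(Exists([x], Src(x)))                   # property: mapping holds on
                                             # some non-empty instance
print(s.check() == sat)  # True: a model exists, no instance was supplied
```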