Validation of mappings between schemas
Mappings between schemas are key elements in several contexts, such as data exchange, data integration, and peer data management systems. In all these contexts, the process of designing a mapping requires the participation of a mapping designer, who needs a way to validate the mapping being defined, i.e., to check whether the mapping is in fact what the designer intended. However, to date, very little work has directly focused on the effective validation of schema mappings. In this paper, we present a new approach for validating schema mappings that allows the mapping designer to ask whether certain desirable properties of these mappings hold. We consider four properties of mappings: mapping satisfiability, mapping inference, query answerability and mapping losslessness. We reformulate these properties in terms of the problem of checking the liveliness of a derived predicate. We emphasize that this approach is independent of any particular method for liveliness checking and, to show the feasibility of our approach, we use an implementation of the CQC Method and provide some experimental results.
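The satisfiability property mentioned above can be illustrated with a small brute-force sketch. This is not the CQC Method, and all relation, attribute and value names here are invented for illustration: a mapping is satisfiable if some non-empty source instance, chased through the mapping, yields a target instance that respects the target constraints.

```python
from itertools import product

# Toy setting: source relation Emp(name, dept), target relation Worker(name).
# Mapping assertion: Emp(n, d) -> Worker(n).
NAMES, DEPTS = ["ann", "bob"], ["sales", "hr"]

def satisfiable(target_allowed_names):
    """Brute-force 'liveliness' test over a tiny finite domain: does some
    non-empty Emp instance exist whose chased target instance satisfies the
    target constraint that every Worker name is in target_allowed_names?"""
    tuples = list(product(NAMES, DEPTS))
    # Enumerate every non-empty subset of possible Emp tuples by bitmask.
    for mask in range(1, 2 ** len(tuples)):
        emp = [t for i, t in enumerate(tuples) if mask >> i & 1]
        worker = {n for n, _ in emp}              # tuples forced by the mapping
        if worker <= set(target_allowed_names):   # target constraint holds
            return True
    return False

print(satisfiable(["ann", "bob"]))  # True: the mapping is satisfiable
print(satisfiable(["carl"]))        # False: the constraint blocks every instance
```

A real validator reasons symbolically over the schemas rather than enumerating instances; the sketch only shows what the property asserts.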
Validation of schema mappings with nested queries
With the emergence of the Web and the wide use of XML for representing data, the ability to map not only flat relational but also nested data has become crucial. The design of schema mappings is a semi-automatic process: a human designer is needed to guide the process, choose among mapping candidates, and successively refine the mapping. The designer needs a way to figure out whether the mapping is what was intended. Our approach to mapping validation allows the designer to check whether the mapping satisfies certain desirable properties. In this paper, we focus on the validation of mappings between nested relational schemas, in which the mapping assertions are either inclusions or equalities of nested queries. We focus on the nested relational setting since most XML Document Type Definitions (DTDs) can be represented in this model. We perform the validation by reasoning on the schemas and the mapping definition, taking into account the integrity constraints defined on both the source and the target schema.
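A mapping assertion of the inclusion kind, Q_src ⊆ Q_tgt, can be tested on a concrete nested instance, as in the following sketch. The data layout and query definitions are invented for illustration, and this is not the paper's reasoning procedure: a found counterexample disproves the inclusion, while passing on one instance proves nothing in general.

```python
def q_src(db):
    # Nested query over the source side: for each department,
    # the set of its project names.
    return {(d["name"], frozenset(p["pname"] for p in d["projects"]))
            for d in db["depts"]}

def q_tgt(db):
    # Target-side query over a flat projection kept in the same database.
    return {(dept, frozenset(ps)) for dept, ps in db["dept_projects"].items()}

def check_inclusion(db):
    """Test Q_src ⊆ Q_tgt on one instance; return (holds, counterexamples)."""
    missing = q_src(db) - q_tgt(db)
    return not missing, missing

db = {
    "depts": [{"name": "it", "projects": [{"pname": "vista"}]}],
    "dept_projects": {"it": ["vista"]},
}
ok, _ = check_inclusion(db)
print(ok)  # True on this instance
```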
UK utility data integration: overcoming schematic heterogeneity
In this paper we discuss syntactic, semantic and schematic issues which inhibit the integration of utility data in the UK. We then focus on the techniques employed within the VISTA project to overcome schematic heterogeneity. A Global Schema based architecture is employed. Although automated approaches to Global Schema definition were attempted, the heterogeneities of the sector were too great, so a manual approach to Global Schema definition was employed. The techniques used to define and subsequently map source utility data models to this schema are discussed in detail. In order to ensure a coherent integrated model, sub- and cross-domain validation issues are then highlighted. Finally, the proposed framework and data flow for schematic integration are introduced.
Constraint-based Query Distribution Framework for an Integrated Global Schema
Distributed heterogeneous data sources need to be queried uniformly through a global schema. A query on the global schema is reformulated so that it can be executed on the local data sources. Constraints in the global schema and mappings are used for source selection, query optimization, and querying partitioned and replicated data sources. The system is entirely XML-based: queries are posed in XML form, and local results are transformed and integrated into an XML document. Contributions include the use of constraints in our existing global schema, which help in source selection and query optimization, and a global query distribution framework for querying distributed heterogeneous data sources.
Comment: The Proceedings of the 13th INMIC, Dec. 14-15, 2009, Islamabad, Pakistan, pages 1-6. Print ISBN: 978-1-4244-4872-2. INSPEC Accession Number: 11072575. Date of Current Version: 15 January 201
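Constraint-based source selection, as described in the abstract, can be sketched roughly as follows. The source names, the partitioning attribute and its ranges are invented for illustration, and the sketch ignores the paper's XML machinery: each source declares a range constraint on a partitioning attribute, and a query predicate prunes every source whose range cannot overlap it.

```python
# Hypothetical horizontal partitioning constraints declared per source.
SOURCES = {
    "src_a": {"attr": "year", "min": 1990, "max": 1999},
    "src_b": {"attr": "year", "min": 2000, "max": 2009},
    "src_c": {"attr": "year", "min": 2010, "max": 2019},
}

def select_sources(attr, lo, hi):
    """Return the sources whose declared range overlaps the query range
    [lo, hi]; only these need to receive the reformulated query."""
    return sorted(name for name, c in SOURCES.items()
                  if c["attr"] == attr and c["min"] <= hi and lo <= c["max"])

print(select_sources("year", 2005, 2012))  # ['src_b', 'src_c']
```

Only two of the three sources are contacted, which is where the optimization pays off.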
A framework for utility data integration in the UK
In this paper we investigate various factors which prevent utility knowledge from being fully exploited and suggest that integration techniques can be applied to improve the quality of utility records. The paper suggests a framework which supports knowledge and data integration. The framework supports utility integration at two levels: the schema level and the data level. Schema-level integration ensures that a single, integrated geospatial data set is available for utility enquiries. Data-level integration improves utility data quality by reducing inconsistency, duplication and conflicts. Moreover, the framework is designed to preserve the autonomy and distribution of utility data. The ultimate aim of the research is to produce an integrated representation of underground utility infrastructure in order to gain more accurate knowledge of the buried services. It is hoped that this approach will enable us to understand various problems associated with utility data, and to suggest some potential techniques for resolving them.
Save up to 99% of your time in mapping validation
Identifying semantic correspondences between different vocabularies has been recognized as a fundamental step towards achieving interoperability. Several manual and automatic techniques have recently been proposed. Fully manual approaches are very precise, but extremely costly. Conversely, automatic approaches tend to fail when domain-specific background knowledge is needed; consequently, they typically require a manual validation step. Yet, when the number of computed correspondences is very large, the validation phase can be very expensive. To reduce these problems, we propose to compute the minimal set of correspondences, which we call the minimal mapping, that is sufficient to compute all the others. We show that by concentrating on such correspondences we can save up to 99% of the manual checks required for validation.
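The redundancy idea behind minimal mappings can be sketched as follows (the taxonomies and names are invented, and this is only the core entailment rule, not the paper's algorithm): a subsumption correspondence a ⊑ b is redundant when some other correspondence c ⊑ d already entails it through the two hierarchies, i.e. a ⊑ c on the source side and d ⊑ b on the target side.

```python
# Parent maps for two toy taxonomies (child -> parent, root -> None).
SRC_PARENT = {"car": "vehicle", "vehicle": None}
TGT_PARENT = {"auto": "transport", "transport": None}

def ancestors(node, parent):
    """The node itself plus all of its ancestors, root included."""
    out = [node]
    while parent[node] is not None:
        node = parent[node]
        out.append(node)
    return out

def minimal(mapping):
    """Drop every subsumption correspondence entailed by another one."""
    keep = []
    for (a, b) in mapping:
        entailed = any((c, d) != (a, b)
                       and c in ancestors(a, SRC_PARENT)   # a ⊑ c holds
                       and b in ancestors(d, TGT_PARENT)   # d ⊑ b holds
                       for (c, d) in mapping)
        if not entailed:
            keep.append((a, b))
    return keep

# ("car" ⊑ "transport") follows from ("vehicle" ⊑ "auto"), since
# car ⊑ vehicle and auto ⊑ transport; only the latter needs manual checking.
print(minimal([("vehicle", "auto"), ("car", "transport")]))
# [('vehicle', 'auto')]
```

Every dropped correspondence is one manual check saved, which is where the large savings reported in the abstract come from.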
Assessing and refining mappings to RDF to improve dataset quality
RDF dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor to incorporate the assessment into the publishing workflow. Adjustments are applied manually, but rarely. Moreover, the root of the violations, which often derives from the mappings that specify how the RDF dataset is generated, is not identified. We suggest an incremental, iterative and uniform validation workflow for RDF datasets stemming originally from (semi-)structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We incorporate (i) a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and (ii) semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to diverse cases, e.g., large, crowdsourced datasets such as DBpedia, or newly generated ones, such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases.
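The key point of assessing mappings rather than generated triples can be sketched minimally. The rule structure below is invented for illustration (real workflows use mapping languages such as RML and test suites in the style of RDFUnit): quality tests inspect the mapping rules themselves, so violations are caught once, at the root, before any RDF is generated.

```python
# Hypothetical flattened mapping rules: each produces one predicate from
# one source column, optionally with a datatype.
MAPPING_RULES = [
    {"predicate": "ex:birthDate", "datatype": "xsd:date", "source_column": "dob"},
    {"predicate": "ex:name", "datatype": None, "source_column": "name"},
]

def violations(rules, columns):
    """Return (predicate, issue) pairs found by inspecting the rules alone,
    without generating a single triple."""
    out = []
    for r in rules:
        if r["source_column"] not in columns:
            out.append((r["predicate"], "unknown source column"))
        if r["predicate"].endswith("Date") and r["datatype"] != "xsd:date":
            out.append((r["predicate"], "date predicate without xsd:date"))
    return out

# The source data exposes only a 'name' column, so the birthDate rule
# is flagged before generation.
print(violations(MAPPING_RULES, {"name"}))
# [('ex:birthDate', 'unknown source column')]
```

Fixing the one flagged rule repairs every triple it would have produced, which is why assessing mappings scales better than assessing the dataset.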
Using Element Clustering to Increase the Efficiency of XML Schema Matching
Schema matching attempts to discover semantic mappings between the elements of two schemas. Elements are cross-compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, schema matching is a combinatorial problem with exponential complexity, which makes naive matching algorithms prohibitively inefficient for large schemas. In this paper we propose a clustering-based technique for improving the efficiency of large-scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions the schemas, reduces the overall matching load, and creates the possibility of trading effectiveness for efficiency. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of it, and outline directions for future research.
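The effect of clustering on the matching load can be sketched with deliberately crude heuristics (first-letter clustering and string similarity, both invented for illustration, not the paper's algorithm): elements are grouped, and cross-comparison happens only inside corresponding cluster pairs instead of across the full cross product.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def clusters(elems):
    """Partition element names by first letter (a stand-in for a real
    clustering heuristic)."""
    groups = defaultdict(list)
    for e in elems:
        groups[e[0].lower()].append(e)
    return groups

def match(schema1, schema2, threshold=0.8):
    """Match only within overlapping clusters; count pairwise comparisons."""
    c1, c2 = clusters(schema1), clusters(schema2)
    pairs, comparisons = [], 0
    for key in sorted(c1.keys() & c2.keys()):
        for a in c1[key]:
            for b in c2[key]:
                comparisons += 1
                if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                    pairs.append((a, b))
    return pairs, comparisons

s1 = ["Address", "Amount", "Buyer", "Price"]
s2 = ["address", "buyer_id", "price", "product"]
pairs, n = match(s1, s2)
print(pairs, n)  # 5 comparisons instead of the naive 4 * 4 = 16
```

The trade-off from the abstract is visible here: comparisons drop, but a true correspondence split across clusters (or, as with `Buyer`/`buyer_id`, falling below the threshold) can be missed.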
A schema-only approach to validate XML schema mappings
Since the emergence of the Web, the ability to map XML data between different data sources has become crucial. Defining a mapping is, however, not a fully automatic process: the designer needs to figure out whether the mapping is what was intended. Our approach to this validation consists of defining and checking certain desirable properties of mappings. We translate the XML schemas and the mapping into a first-order logic formalism and apply a reasoning mechanism to check the desirable properties automatically, without assuming any particular instantiation of the schemas.
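The logic-based flavour of such a check can be sketched over a tiny fixed domain (the predicates and sentences are invented; the paper's translation covers full XML schemas and richer constraints): encode the mapping and a target constraint as first-order sentences over unary predicates and search for a model. A bounded search like this can witness that a property holds, but unlike the instance-independent reasoning described above, it cannot refute it in general.

```python
DOMAIN = [0, 1]

def subsets(dom):
    """All subsets of dom, enumerated by bitmask."""
    for mask in range(2 ** len(dom)):
        yield {x for i, x in enumerate(dom) if mask >> i & 1}

def has_model():
    """Is the mapping sentence  ∀x S(x) -> T(x)  consistent with the target
    constraint  ∀x T(x) -> V(x)  in some model where S is non-empty?"""
    for S in subsets(DOMAIN):
        for T in subsets(DOMAIN):
            for V in subsets(DOMAIN):
                if S and S <= T and T <= V:
                    return True
    return False

print(has_model())  # True: the mapping is satisfiable in this toy encoding
```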