
    Algorithms for Core Computation in Data Exchange

    We describe the state of the art in the area of core computation for data exchange. Two main approaches are considered: post-processing core computation, applied to a canonical universal solution constructed by chasing a given schema mapping, and direct core computation, where the mapping is first rewritten in order to create core universal solutions by chasing it.
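
    The core is the smallest universal solution, and the post-processing approach shrinks the chased instance by repeatedly applying endomorphisms. Below is a minimal brute-force Python sketch of that idea under simplifying assumptions (facts are tuples of strings, and labeled nulls carry a leading underscore); the algorithms surveyed above are far more efficient than this exponential search.

```python
from itertools import product

# Facts are tuples of strings; labeled nulls carry a leading underscore.
def is_null(term):
    return term.startswith("_")

def image_of(h, fact):
    # Apply a substitution h (nulls -> terms) to one fact; constants stay fixed.
    return tuple(h.get(t, t) for t in fact)

def core(instance):
    """Brute-force post-processing: repeatedly apply a proper endomorphism
    (a homomorphism whose image is a strictly smaller subset of the
    instance) until none exists."""
    instance = set(instance)
    while True:
        nulls = sorted({t for f in instance for t in f if is_null(t)})
        terms = sorted({t for f in instance for t in f})
        for assignment in product(terms, repeat=len(nulls)):
            h = dict(zip(nulls, assignment))
            mapped = {image_of(h, f) for f in instance}
            if mapped <= instance and len(mapped) < len(instance):
                instance = mapped
                break
        else:
            return instance  # no proper endomorphism: instance is its own core

# A canonical universal solution with a redundant null-carrying fact:
chased = {("emp", "ann", "sales"), ("emp", "ann", "_dept1")}
print(core(chased))  # {('emp', 'ann', 'sales')}
```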

    Generating SPARQL Executable Mappings to Integrate Ontologies

    Data translation is an integration task that aims at populating a target model with data of a source model by means of mappings. Generating them automatically is appealing insofar as it may reduce integration costs. Matching techniques automatically generate uninterpreted mappings, a.k.a. correspondences, that must be interpreted to perform the data translation task. Other techniques automatically generate executable mappings, which encode an interpretation of these correspondences in a given query language. Unfortunately, current techniques to automatically generate executable mappings are based on instance examples of the target model, which usually contains no data, or based on nested relational models, which cannot be straightforwardly applied to semantic-web ontologies. In this paper, we present a technique to automatically generate SPARQL executable mappings between OWL ontologies. The original contributions of our technique are as follows: 1) it is not based on instance examples but on restrictions and correspondences; 2) we have devised an algorithm to make restrictions and correspondences explicit over a number of language-independent executable mappings; and 3) we have devised an algorithm to transform language-independent executable mappings into SPARQL executable mappings. Finally, we evaluate our technique over ten scenarios and check that the interpretation of correspondences that it assumes is coherent with the expected results.
    Funding: Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-E
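
    For illustration, the sketch below shows what an executable mapping of this kind can look like: a SPARQL CONSTRUCT query that rewrites source-ontology triples into target-ontology triples, run here with the rdflib library. All namespaces, classes, and properties are invented for the example and are not taken from the paper.

```python
from rdflib import RDF, Graph, Literal, Namespace

SRC = Namespace("http://example.org/source#")
TGT = Namespace("http://example.org/target#")

# A toy source ontology instance.
source = Graph()
source.add((SRC.alice, RDF.type, SRC.Researcher))
source.add((SRC.alice, SRC.name, Literal("Alice")))

# One SPARQL executable mapping: it interprets the correspondences
# src:Researcher ~ tgt:Person and src:name ~ tgt:fullName together.
mapping = """
PREFIX src: <http://example.org/source#>
PREFIX tgt: <http://example.org/target#>
CONSTRUCT { ?x a tgt:Person . ?x tgt:fullName ?n . }
WHERE     { ?x a src:Researcher . ?x src:name ?n . }
"""

# Executing the mapping translates source triples into target triples.
target = Graph()
for triple in source.query(mapping):
    target.add(triple)
print(target.serialize(format="turtle"))
```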

    Mapping RDF knowledge bases using exchange samples

    Nowadays, the Web of Data is in its earliest stages; it is currently organised into a variety of linked knowledge bases that have been developed independently by different organisations. RDF is one of the most popular languages to represent data in this context, which motivates the need to perform complex integration tasks amongst RDF knowledge bases. These tasks are performed using schema mappings, which are declarative specifications of the relationships between a source and a target knowledge base. Generating schema mappings automatically is appealing because this relieves users from the burden of handcrafting them. In the literature, the vast majority of proposals are based on the data models of the knowledge bases to be integrated, that is, on classes, properties, and constraints. In the Web of Data, there exist many data models that comprise very few constraints or no constraints at all, which has motivated some researchers to work on an alternative paradigm that does not rely on constraints. Unfortunately, the current proposals that fit this paradigm are not completely automatic. In this article, we present our proposal to automatically generate schema mappings amongst RDF knowledge bases. Its salient features are as follows: it uses a single input exchange sample and a set of input correspondences, but does not require any constraints to be available or any user intervention; it has been validated and evaluated using many experiments that prove that it is effective and efficient in practice; and the schema mappings that it produces are GLAV. Other researchers can reproduce our experiments, since all of our implementations and repositories are publicly available.
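
    For reference, a GLAV schema mapping is a source-to-target tuple-generating dependency: a conjunctive query over the source implies a conjunctive query over the target, possibly with existential variables standing for invented (null) values:

```latex
% General shape of a GLAV mapping (source-to-target tgd):
\forall \bar{x}\,\bigl(\varphi_S(\bar{x}) \rightarrow \exists \bar{y}\;\psi_T(\bar{x},\bar{y})\bigr)
```

    Here $\varphi_S$ and $\psi_T$ are conjunctions of atoms (in the RDF setting, triple patterns) over the source and target knowledge bases, respectively.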

    Dataset Discovery in Data Lakes

    Data analytics stands to benefit from the increasing availability of datasets that are held without their conceptual relationships being explicitly known. When collected, these datasets form a data lake from which, by processes like data wrangling, specific target datasets can be constructed that enable value-adding analytics. Given the potential vastness of such data lakes, the issue arises of how to pull out of the lake those datasets that might contribute to wrangling out a given target. We refer to this as the problem of dataset discovery in data lakes, and this paper contributes an effective and efficient solution to it. Our approach uses features of the values in a dataset to construct hash-based indexes that map those features into a uniform distance space. This makes it possible to define similarity distances between features and to take those distances as measurements of relatedness w.r.t. a target table. Given the latter (and exemplar tuples), our approach returns the most related tables in the lake. We provide a detailed description of the approach and report on empirical results for two forms of relatedness (unionability and joinability), comparing them with prior work, where pertinent, and showing significant improvements in all of precision, recall, target coverage, and indexing and discovery times.
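
    One standard way to realise such hash-based indexes is MinHash, which maps a set of column values to a short signature whose collisions estimate Jaccard overlap, a crude proxy for unionability. The sketch below illustrates only this general idea, not the system's actual features or indexes.

```python
import hashlib

def minhash_signature(values, num_hashes=64):
    """One minimum over seeded hashes per row; equal entries across two
    signatures estimate the Jaccard similarity of the underlying sets."""
    return [
        min(int(hashlib.sha1(f"{seed}|{v}".encode()).hexdigest(), 16)
            for v in values)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Two columns drawn from different tables in the lake.
col_a = {"london", "paris", "berlin", "madrid"}
col_b = {"paris", "berlin", "madrid", "rome"}
print(estimated_jaccard(minhash_signature(col_a),
                        minhash_signature(col_b)))  # close to 3/5 = 0.6
```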

    MostoDE: A tool to exchange data amongst semantic-web ontologies

    A semantic-web ontology, simply known as an ontology, comprises a data model and data that should comply with it. Due to their distributed nature, there exists a large number of heterogeneous ontologies, and a strong need for exchanging data amongst them, i.e., populating a target ontology using data that come from one or more source ontologies. Data exchange may be implemented using correspondences that are later transformed into executable mappings; however, exchanging data amongst ontologies is not a trivial task, so tools that help software engineers to exchange data amongst ontologies are a must. In the literature, there are a number of tools to automatically generate executable mappings; unfortunately, they have some drawbacks, namely: 1) they were designed to work with nested-relational data models, which prevents them from being applied to ontologies; 2) they require their users to handcraft and maintain their executable mappings, which is not appealing; or 3) they do not attempt to identify groups of correspondences, which may easily lead to incoherent target data. In this article, we present MostoDE, a tool that assists software engineers in generating SPARQL executable mappings and exchanging data amongst ontologies. The salient features of our tool are as follows: it allows software engineers to automate the generation of executable mappings using correspondences and constraints; it integrates several systems that implement semantic-web technologies to exchange data; and it provides visual aids for helping software engineers to exchange data amongst ontologies.
    Funding: Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Ciencia e Innovación TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-E; Ministerio de Economía y Competitividad TIN2011-15497-
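
    The following purely illustrative sketch shows why grouping correspondences matters before generating executable mappings: interpreting each correspondence in isolation yields one query per arrow and can scatter an entity's data across unrelated nodes, whereas one CONSTRUCT query per group keeps the target data coherent. All identifiers are invented; this is not MostoDE's algorithm.

```python
# One group of correspondences: a class correspondence plus the property
# correspondences that hang off it (all identifiers are invented).
group = {
    "class": ("src:Researcher", "tgt:Person"),
    "props": [("src:name", "tgt:fullName"),
              ("src:email", "tgt:mbox")],
}

def mapping_for(group):
    """Emit a single CONSTRUCT query for a whole group, so that all the
    target properties of one entity are produced around the same node."""
    src_class, tgt_class = group["class"]
    where = [f"?x a {src_class} ."]
    construct = [f"?x a {tgt_class} ."]
    for i, (src_prop, tgt_prop) in enumerate(group["props"]):
        where.append(f"?x {src_prop} ?v{i} .")
        construct.append(f"?x {tgt_prop} ?v{i} .")
    return ("CONSTRUCT { " + " ".join(construct) + " }\n"
            "WHERE     { " + " ".join(where) + " }")

print(mapping_for(group))
```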

    Similarity Measures For Incomplete Database Instances

    The problem of comparing database instances with incompleteness is prevalent in applications such as analyzing how a dataset has evolved over time (e.g., data versioning), evaluating data cleaning solutions (e.g., comparing an instance produced by a data repair algorithm against a gold standard), or comparing solutions generated by data exchange systems (e.g., universal vs. core solutions). In this work, we propose a framework for computing the similarity of instances with labeled nulls, even of those without primary keys. As a side effect, the similarity score computation returns a mapping between the instances' tuples, which explains the score. We demonstrate that computing the similarity of two incomplete instances is, in general, NP-hard in the instance size. To be able to compare instances of realistic size, we present an approximate PTIME algorithm for instance comparison. Experimental results demonstrate that the approximate algorithm is up to three orders of magnitude faster than an exact algorithm for the computation of the similarity score, while the difference between approximate and exact scores is always smaller than 1%.
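
    The sketch below conveys the flavour of an approximate PTIME comparison under strong simplifications: tuple pairs are scored position-wise (a labeled null, marked here with a leading underscore, counts as half-compatible with anything), a greedy matching is built, and the normalised score is returned together with the tuple mapping that explains it. The actual framework additionally enforces a consistent valuation of nulls across tuples, which is what makes the exact problem NP-hard.

```python
def tuple_score(t1, t2):
    """Fraction of positions that agree; a null is half-compatible with
    any value (a deliberate simplification)."""
    agree = sum(1.0 if a == b
                else 0.5 if a.startswith("_") or b.startswith("_")
                else 0.0
                for a, b in zip(t1, t2))
    return agree / len(t1)

def similarity(inst1, inst2):
    """Greedy matching: best-scoring unused tuple pairs first."""
    pairs = sorted(((tuple_score(t1, t2), t1, t2)
                    for t1 in inst1 for t2 in inst2), reverse=True)
    used1, used2, mapping, total = set(), set(), [], 0.0
    for score, t1, t2 in pairs:
        if t1 not in used1 and t2 not in used2:
            used1.add(t1); used2.add(t2)
            mapping.append((t1, t2)); total += score
    return total / max(len(inst1), len(inst2)), mapping

gold   = {("ann", "sales"), ("bob", "hr")}
repair = {("ann", "sales"), ("bob", "_d1")}
print(similarity(gold, repair))  # score 0.875 plus the explaining mapping
```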

    What is the IQ of your data transformation system?

    Mapping and translating data across different representations is a crucial problem in information systems. Many formalisms and tools are currently used for this purpose, to the point that developers typically face a difficult question: "what is the right tool for my translation task?" In this paper, we introduce several techniques that contribute to answering this question. Among these are a fairly general definition of a data transformation system, a new and very efficient similarity measure to evaluate the outputs produced by such a system, and a metric to estimate user efforts. Based on these techniques, we are able to compare a wide range of systems on many translation tasks, to gain interesting insights about their effectiveness, and, ultimately, about their "intelligence".
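
    As a back-of-the-envelope stand-in for such an output similarity measure, one can score a system's produced target instance against the expected one with tuple-level precision, recall, and F-measure, as sketched below; the paper's measure is considerably more refined and also handles invented values.

```python
def f_score(expected, produced):
    """Tuple-level F-measure between expected and produced instances."""
    tp = len(expected & produced)
    if tp == 0:
        return 0.0
    precision = tp / len(produced)
    recall = tp / len(expected)
    return 2 * precision * recall / (precision + recall)

expected = {("ann", "sales"), ("bob", "hr"), ("eve", "it")}
produced = {("ann", "sales"), ("bob", "hr"), ("zoe", "pr")}
print(f_score(expected, produced))  # 0.666...
```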