18 research outputs found
Algorithms for Core Computation in Data Exchange
We describe the state of the art in the area of core computation for data exchange. Two main approaches are considered: post-processing core computation, which is applied to a canonical universal solution constructed by chasing a given schema mapping, and direct core computation, where the mapping is first rewritten so that chasing it produces core universal solutions directly.
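As a toy illustration of the post-processing approach (a sketch, not the surveyed algorithms), the core of a small instance with labeled nulls can be found by brute force: repeatedly search for an endomorphism that maps the instance into a strictly smaller subset of itself. Real core-computation systems are far more efficient; the naming convention for nulls (`"N…"`) is an assumption of this example.

```python
from itertools import product

def is_null(term):
    # Assumption for this sketch: labeled nulls are strings starting with "N".
    return isinstance(term, str) and term.startswith("N")

def apply(h, inst):
    # Apply a substitution h (on nulls) to every tuple of the instance.
    return {tuple(h.get(t, t) for t in tup) for tup in inst}

def core(inst):
    """Shrink a canonical universal solution to its core by brute force:
    keep looking for an endomorphism whose image is a proper subset."""
    inst = set(inst)
    changed = True
    while changed:
        changed = False
        nulls = sorted({t for tup in inst for t in tup if is_null(t)})
        terms = sorted({t for tup in inst for t in tup})
        # Try every assignment of nulls to terms (constants stay fixed).
        for h_vals in product(terms, repeat=len(nulls)):
            h = dict(zip(nulls, h_vals))
            image = apply(h, inst)
            if image <= inst and len(image) < len(inst):
                inst, changed = image, True
                break
    return inst

# R(a, N1) is subsumed by R(a, b): the core drops the redundant tuple.
solution = {("a", "N1"), ("a", "b")}
print(core(solution))  # {('a', 'b')}
```

The exponential search over null assignments is only viable for tiny instances; it serves here to make the definition of a core (no proper retraction exists) concrete.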
Generating SPARQL Executable Mappings to Integrate Ontologies
Data translation is an integration task that aims at populating a target model with data of a source model by means of mappings. Generating them automatically is appealing insofar as it may reduce integration costs. Matching techniques automatically generate uninterpreted mappings, a.k.a. correspondences, that must be interpreted to perform the data translation task. Other techniques automatically generate executable mappings, which encode an interpretation of these correspondences in a given query language. Unfortunately, current techniques to automatically generate executable mappings are based on instance examples of the target model, which usually contains no data, or on nested relational models, which cannot be straightforwardly applied to semantic-web ontologies. In this paper, we present a technique to automatically generate SPARQL executable mappings between OWL ontologies. The original contributions of our technique are as follows: 1) it is not based on instance examples but on restrictions and correspondences, 2) we have devised an algorithm to make restrictions and correspondences explicit over a number of language-independent executable mappings, and 3) we have devised an algorithm to transform language-independent executable mappings into SPARQL executable mappings. Finally, we evaluate our technique over ten scenarios and check that the interpretation of correspondences that it assumes is coherent with the expected results.

Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-
Mapping RDF knowledge bases using exchange samples
Nowadays, the Web of Data is in its earliest stages; it is currently organised into a variety of linked knowledge bases that have been developed independently by different organisations. RDF is one of the most popular languages to represent data in this context, which motivates the need to perform complex integration tasks amongst RDF knowledge bases. These tasks are performed using schema mappings, which are declarative specifications of the relationships between a source and a target knowledge base. Generating schema mappings automatically is appealing because this relieves users from the burden of handcrafting them. In the literature, the vast majority of proposals are based on the data models of the knowledge bases to be integrated, that is, on classes, properties, and constraints. In the Web of Data, there exist many data models that comprise very few constraints or no constraints at all, which has motivated some researchers to work on an alternative paradigm that does not rely on constraints. Unfortunately, the current proposals that fit this paradigm are not completely automatic. In this article, we present our proposal to automatically generate schema mappings amongst RDF knowledge bases. Its salient features are as follows: it uses a single input exchange sample and a set of input correspondences, but does not require any constraints to be available or any user intervention; it has been validated and evaluated using many experiments that prove that it is effective and efficient in practice; and the schema mappings that it produces are GLAV. Other researchers can reproduce our experiments, since all of our implementations and repositories are publicly available.
Dataset Discovery in Data Lakes
Data analytics stands to benefit from the increasing availability of datasets that are held without their conceptual relationships being explicitly known. When collected, these datasets form a data lake from which, by processes like data wrangling, specific target datasets can be constructed that enable value-adding analytics. Given the potential vastness of such data lakes, the issue arises of how to pull out of the lake those datasets that might contribute to wrangling out a given target. We refer to this as the problem of dataset discovery in data lakes, and this paper contributes an effective and efficient solution to it. Our approach uses features of the values in a dataset to construct hash-based indexes that map those features into a uniform distance space. This makes it possible to define similarity distances between features and to take those distances as measurements of relatedness w.r.t. a target table. Given the latter (and exemplar tuples), our approach returns the most related tables in the lake. We provide a detailed description of the approach and report on empirical results for two forms of relatedness (unionability and joinability), comparing them with prior work, where pertinent, and showing significant improvements in precision, recall, target coverage, and indexing and discovery times.
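One common way (illustrative here, not necessarily the paper's exact technique) to map column values into a uniform space where distances estimate relatedness is MinHash: columns with high value overlap, and hence high unionability, get signatures that agree in many positions, approximating their Jaccard similarity.

```python
import hashlib

def minhash(values, k=64):
    """Build a k-row MinHash signature for a set of column values.
    Each row keeps the minimum of a distinct hash over all values."""
    sig = []
    for i in range(k):
        sig.append(min(int(hashlib.md5(f"{i}:{v}".encode()).hexdigest(), 16)
                       for v in values))
    return sig

def similarity(sig_a, sig_b):
    # Fraction of agreeing rows: an unbiased estimate of Jaccard overlap.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical columns from three tables in a lake.
cities_a = {"Paris", "London", "Berlin", "Madrid"}
cities_b = {"Paris", "London", "Berlin", "Rome"}
years = {"1990", "1991", "1992", "1993"}
sa, sb, sy = minhash(cities_a), minhash(cities_b), minhash(years)
print(similarity(sa, sb))  # high: the city columns are likely unionable
print(similarity(sa, sy))  # low: unrelated columns
```

Because signatures are small and fixed-size, they can be indexed (e.g. with locality-sensitive hashing) so that, given a target table's columns, the most related tables are retrieved without scanning the whole lake.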
MostoDE: A tool to exchange data amongst semantic-web ontologies
A semantic-web ontology, simply known as an ontology, comprises a data model and data that should comply with it. Due to their distributed nature, there exists a large number of heterogeneous ontologies, and a strong need for exchanging data amongst them, i.e., populating a target ontology using data that come from one or more source ontologies. Data exchange may be implemented using correspondences that are later transformed into executable mappings; however, exchanging data amongst ontologies is not a trivial task, so tools that help software engineers to exchange data amongst ontologies are a must. In the literature, there are a number of tools to automatically generate executable mappings; unfortunately, they have some drawbacks, namely: 1) they were designed to work with nested-relational data models, which prevents them from being applied to ontologies; 2) they require their users to handcraft and maintain their executable mappings, which is not appealing; or 3) they do not attempt to identify groups of correspondences, which may easily lead to incoherent target data. In this article, we present MostoDE, a tool that assists software engineers in generating SPARQL executable mappings and exchanging data amongst ontologies. The salient features of our tool are as follows: it automates the generation of executable mappings using correspondences and constraints; it integrates several systems that implement semantic-web technologies to exchange data; and it provides visual aids that help software engineers to exchange data amongst ontologies.

Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Ciencia e Innovación TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-E; Ministerio de Economía y Competitividad TIN2011-15497-
Similarity Measures For Incomplete Database Instances
The problem of comparing database instances with incompleteness is prevalent in applications such as analyzing how a dataset has evolved over time (e.g., data versioning), evaluating data cleaning solutions (e.g., comparing an instance produced by a data repair algorithm against a gold standard), or comparing solutions generated by data exchange systems (e.g., universal vs. core solutions). In this work, we propose a framework for computing the similarity of instances with labeled nulls, even of those without primary keys. As a side-effect, the similarity score computation returns a mapping between the instances’ tuples, which explains the score. We demonstrate that computing the similarity of two incomplete instances is NP-hard in the instance size in general. To be able to compare instances of realistic size, we present an approximate PTIME algorithm for instance comparison. Experimental results demonstrate that the approximate algorithm is up to three orders of magnitude faster than an exact algorithm for the computation of the similarity score, while the difference between approximate and exact scores is always smaller than 1%.
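A rough sketch of the idea (hypothetical, and much simpler than the paper's framework): score two instances with labeled nulls by matching their tuples, letting a null agree with any value, and normalizing by the larger instance. The greedy matching below is a PTIME approximation; finding the optimal tuple matching is where the NP-hardness lies. The `"N…"` null convention is an assumption of this example.

```python
def is_null(t):
    # Assumption for this sketch: labeled nulls are strings starting with "N".
    return isinstance(t, str) and t.startswith("N")

def tuple_score(t1, t2):
    """Fraction of positions that agree; a null matches anything."""
    if len(t1) != len(t2):
        return 0.0
    hits = sum(1 for a, b in zip(t1, t2)
               if a == b or is_null(a) or is_null(b))
    return hits / len(t1)

def instance_similarity(inst1, inst2):
    """Greedy PTIME approximation: pair each tuple of inst1 with its
    best remaining partner in inst2, then normalize. The chosen pairs
    form the mapping that explains the score."""
    remaining, total = list(inst2), 0.0
    for t1 in inst1:
        if not remaining:
            break
        best = max(remaining, key=lambda t2: tuple_score(t1, t2))
        total += tuple_score(t1, best)
        remaining.remove(best)
    return total / max(len(inst1), len(inst2))

a = [("alice", "N1"), ("bob", "30")]
b = [("alice", "25"), ("bob", "30")]
print(instance_similarity(a, b))  # 1.0: the null matches any value
```

Greedy matching can be arbitrarily worse than the optimal assignment on adversarial inputs, which is why a carefully designed approximation (as in the paper) is needed to keep the gap to the exact score small in practice.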
What is the IQ of your data transformation system?
Mapping and translating data across different representations is a crucial problem in information systems. Many formalisms and tools are currently used for this purpose, to the point that developers typically face a difficult question: “what is the right tool for my translation task?” In this paper, we introduce several techniques that contribute to answering this question. Among these, a fairly general definition of a data transformation system, a new and very efficient similarity measure to evaluate the outputs produced by such a system, and a metric to estimate user efforts. Based on these techniques, we are able to compare a wide range of systems on many translation tasks, to gain interesting insights about their effectiveness, and, ultimately, about their “intelligence”.