Generic Schema Matching with Cupid

Author: Bernstein Philip A.
Madhavan Jayant
Rahm Erhard
Publication date: 05/02/2019

Schema matching is a critical step in many applications, such as XML message mapping, data warehouse loading, and schema integration. In this paper, we investigate algorithms for generic schema matching, outside of any particular data model or application. We first present a taxonomy for past solutions, showing that a rich range of techniques is available. We then propose a new algorithm, Cupid, that discovers mappings between schema elements based on their names, data types, constraints, and schema structure, using a broader set of techniques than past approaches. Some of our innovations are the integrated use of linguistic and structural matching, context-dependent matching of shared types, and a bias toward leaf structure where much of the schema content resides. After describing our algorithm, we present experimental results that compare Cupid to two other schema matching systems.
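
The idea of combining linguistic and structural evidence can be illustrated with a minimal sketch. It is not the Cupid algorithm itself: it only blends a string similarity on element names with a data-type compatibility score, and it ignores constraints and schema structure. The function names, the type-compatibility table, the weight `w_name`, and the threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical data-type compatibility scores (an assumption, not taken from the paper).
TYPE_COMPAT = {
    ("int", "float"): 0.8,
    ("string", "text"): 0.9,
}


def name_similarity(a, b):
    """Linguistic evidence: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(a, b):
    """Type evidence: 1.0 for identical types, otherwise a table lookup."""
    if a == b:
        return 1.0
    return TYPE_COMPAT.get((a, b), TYPE_COMPAT.get((b, a), 0.0))


def combined_similarity(e1, e2, w_name=0.5):
    """Weighted blend of name and type similarity for (name, type) pairs."""
    return (w_name * name_similarity(e1[0], e2[0])
            + (1.0 - w_name) * type_similarity(e1[1], e2[1]))


def match_schemas(source, target, threshold=0.6):
    """Map each source element to its best target element above the threshold."""
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: combined_similarity(s, t))
        if combined_similarity(s, best) >= threshold:
            mapping[s[0]] = best[0]
    return mapping
```

Each schema is passed as a list of `(name, type)` pairs. A source element whose best candidate scores below the threshold is left unmapped rather than forced into a weak match.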

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Qucosa - Publikationsserver der Universität Leipzig

Survey: Models and Prototypes of Schema Matching

Author: Mustofa Khabib
Sutanta Edhy
Wardoyo Retantyo
Winarko Edi
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/06/2016

Schema matching is a critical problem in many applications that integrate data or information, pursue interoperability, or otherwise deal with schematic heterogeneity. Schema matching has evolved from manual, domain-specific approaches toward new models and methods that are semi-automatic and more general, so that they can more effectively guide the user in generating a mapping between the elements of two schemas or ontologies. This paper summarizes a literature review of schema matching models and prototypes from the last 25 years, describing the progress made and the research challenges and opportunities for new models, methods, and/or prototypes.

Institute of Advanced Engineering and Science

Integration of ontology data through learning instance matching

Author: Lu J
Wang C
Zhang G
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

Information integration with the aid of ontology can roughly be divided into two levels: schema level and data level. Most research has been focused on the schema level, i.e., mapping/matching concepts and properties in different ontologies with each other. However, the data level integration is equally important, especially in the decentralized Semantic Web environment. Noticing that ontology data (in the form of instances of concepts) from different sources often have different perspectives and may overlap with each other, we develop a matching method that utilizes the features of ontology and employs the machine learning approach to integrate those instances. By exploring ontology features, this method performs better than other general methods, which is revealed in our experiments. Through the process that implements the matching method, ontology data can be integrated together to offer more sophisticated services. © 2006 IEEE

OPUS - University of Technology Sydney

Visualization of heterogeneous data

Author: Alon Halevy
Bryan Chan
Jeff Klingner
Justin Talbot
Mike Cammarano
Pat Hanrahan
Xin (Luna) Dong
Publication venue: Student
Publication date: 01/01/2007

Both the Resource Description Framework (RDF), used in the semantic web, and Maya Viz u-forms represent data as a graph of objects connected by labeled edges. Existing systems for flexible visualization of this kind of data require manual specification of the possible visualization roles for each data attribute. When the schema is large and unfamiliar, this requirement inhibits exploratory visualization by requiring a costly up-front data integration step. To eliminate this step, we propose an automatic technique for mapping data attributes to visualization attributes. We formulate this as a schema matching problem, finding appropriate paths in the data model for each required visualization attribute in a visualization template. Index Terms: Data integration, RDF, attribute inference.

CiteSeerX

Explain3D: Explaining Disagreements in Disjoint Datasets

Author: Wang Xiaolan
Meliou Alexandra
Publication date: 24/02/1911

Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.

arXiv.org e-Print Archive

Trinity College

Dealing with Uncertainty in Lexical Annotation

Author: Bergamaschi Sonia
Corni Alberto
Po Laura
Sorrentino Serena
Publication venue: Instituto de Informática - Universidade Federal do Rio Grande do Sul
Publication date: 01/01/2009

We present ALA, a tool for the automatic lexical annotation (i.e., annotation with respect to a thesaurus or lexical resource) of structured and semi-structured data sources and for the discovery of probabilistic lexical relationships in a data integration environment. ALA performs automatic lexical annotation through probabilistic annotations, i.e., each annotation is associated with a probability value. By performing probabilistic lexical annotation, we discover probabilistic inter-source lexical relationships among schema elements. ALA extends the lexical annotation module of the MOMIS data integration system. However, it can be applied more generally to schema mapping discovery, ontology merging, and data integration systems, and it is particularly suitable for “on-the-fly” data integration or probabilistic ontology matching.
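
How probabilistic annotations might yield a probabilistic relationship between two schema elements can be sketched as follows. This is an illustrative assumption, not ALA's actual computation: each element carries a hypothetical dictionary of candidate senses with probabilities, and the two annotations are treated as independent, so the probability that both elements denote the same sense is the sum of the products over shared senses. The function name and the sense identifiers are invented for the example.

```python
def shared_sense_probability(annotation_a, annotation_b):
    """Probability that two elements share a sense, assuming independence.

    Each annotation maps a sense identifier to the probability that the
    element carries that sense (hypothetical representation).
    """
    shared = set(annotation_a) & set(annotation_b)
    return sum(annotation_a[s] * annotation_b[s] for s in shared)


# Hypothetical annotations for two schema elements from different sources.
address_a = {"address#location": 0.7, "address#speech": 0.3}
address_b = {"address#location": 0.6, "address#computer": 0.4}

# Only "address#location" is shared, so the result is 0.7 * 0.6.
probability = shared_sense_probability(address_a, address_b)
```

Elements with no sense in common get a probability of zero, which corresponds to discovering no lexical relationship between them.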

CiteSeerX

Em Questao

Archives of the Faculty of Veterinary Medicine UFRGS

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia