Search CORE

870 research outputs found

Data Warehousing Scenarios for Model Management

Author: Bernstein Philip A.
Rahm Erhard
Publication date: 07/11/2018

Model management is a framework for supporting metadata-related applications where models and mappings are manipulated as first-class objects using operations such as Match, Merge, ApplyFunction, and Compose. To demonstrate the approach, we show how to use model management in two scenarios related to loading data warehouses. The case study illustrates the value of model management as a methodology for approaching metadata-related problems. It also helps clarify the required semantics of key operations. These detailed scenarios provide evidence that generic model management is useful and, very likely, implementable.

Qucosa - Publikationsserver der Universität Leipzig
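To make the Compose operation named in the model management abstract concrete, here is a minimal Python sketch. It assumes a mapping is just a set of (source element, target element) correspondences, which is far simpler than the mappings model management actually manipulates; the element names and the `compose` function are hypothetical.

```python
from typing import Set, Tuple

# Simplifying assumption: a mapping is a binary relation between element names.
Mapping = Set[Tuple[str, str]]


def compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Relational composition: (a, c) is in the result if some b links
    a to b in m1 and b to c in m2."""
    return {(a, c) for (a, b1) in m1 for (b2, c) in m2 if b1 == b2}


# Hypothetical warehouse-loading chain: source -> staging area -> warehouse.
src_to_stage = {("Cust.Name", "Stage.CustName"), ("Cust.City", "Stage.City")}
stage_to_dw = {("Stage.CustName", "DimCustomer.Name"),
               ("Stage.City", "DimCustomer.City")}

print(sorted(compose(src_to_stage, stage_to_dw)))
# [('Cust.City', 'DimCustomer.City'), ('Cust.Name', 'DimCustomer.Name')]
```

Composing the two mappings yields a direct source-to-warehouse mapping without going through the staging schema; mappings that carry expressions or constraints would need a correspondingly richer Compose.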

An Online Bibliography on Schema Evolution

Author: Bernstein Philip A.
Rahm Erhard
Publication date: 19/10/2018

We briefly motivate and present a new online bibliography on schema evolution, an area that has recently gained much interest in both research and practice.

Qucosa - Publikationsserver der Universität Leipzig

Generic Schema Matching with Cupid

Author: Bernstein Philip A.
Madhavan Jayant
Rahm Erhard
Publication date: 05/02/2019

Schema matching is a critical step in many applications, such as XML message mapping, data warehouse loading, and schema integration. In this paper, we investigate algorithms for generic schema matching, outside of any particular data model or application. We first present a taxonomy of past solutions, showing that a rich range of techniques is available. We then propose a new algorithm, Cupid, that discovers mappings between schema elements based on their names, data types, constraints, and schema structure, using a broader set of techniques than past approaches. Some of our innovations are the integrated use of linguistic and structural matching, context-dependent matching of shared types, and a bias toward leaf structure where much of the schema content resides. After describing our algorithm, we present experimental results that compare Cupid to two other schema matching systems.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Qucosa - Publikationsserver der Universität Leipzig
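The Cupid abstract's idea of combining linguistic evidence (names) with data types can be illustrated by a deliberately small, element-level matcher. This is a hedged sketch, not Cupid: it uses Python's `difflib.SequenceMatcher` as a stand-in for linguistic matching, assumes a hypothetical type-compatibility table, name weight `w_name` and `threshold`, and ignores schema structure entirely.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Element = Tuple[str, str]  # (element name, data type)

# Hypothetical data-type compatibility scores (an assumption, not Cupid's table).
TYPE_COMPAT = {
    ("int", "int"): 1.0, ("int", "decimal"): 0.8, ("decimal", "int"): 0.8,
    ("decimal", "decimal"): 1.0, ("string", "string"): 1.0, ("date", "date"): 1.0,
}


def _norm(s: str) -> str:
    """Lowercase and drop underscores so 'Order_ID' and 'orderid' compare equal."""
    return s.lower().replace("_", "")


def name_sim(a: str, b: str) -> float:
    """Character-level name similarity in [0, 1]."""
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def element_sim(a: Element, b: Element, w_name: float = 0.7) -> float:
    """Weighted mix of name similarity and data-type compatibility."""
    type_score = TYPE_COMPAT.get((a[1], b[1]), 0.0)
    return w_name * name_sim(a[0], b[0]) + (1 - w_name) * type_score


def match(src: List[Element], tgt: List[Element],
          threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """For each source element, keep its best-scoring target if above threshold."""
    result = []
    for e in src:
        best = max(tgt, key=lambda t: element_sim(e, t))
        score = element_sim(e, best)
        if score >= threshold:
            result.append((e[0], best[0], round(score, 2)))
    return result


po = [("OrderID", "int"), ("Qty", "int"), ("ShipDate", "date")]
purchase = [("order_id", "int"), ("quantity", "decimal"), ("ship_date", "date")]
print(match(po, purchase))
```

The weights and threshold decide borderline pairs such as `Qty`/`quantity`; Cupid itself adds further linguistic techniques and structural matching over the schema tree, which this sketch omits.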

A survey of approaches to automatic schema matching

Author: Bernstein Philip A.
Rahm Erhard
Publication date: 19/10/2018

Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to partially automate the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification, we review some previous match implementations, thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.

Qucosa - Publikationsserver der Universität Leipzig
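To illustrate the survey's distinction between schema-level and instance-level matchers, the sketch below is an instance-level, element-level matcher: it compares columns by the overlap of their data values rather than by their names. Jaccard similarity and the 0.5 threshold are assumptions chosen for illustration; the column names, values and `instance_match` function are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def instance_match(cols_a: Dict[str, List[str]], cols_b: Dict[str, List[str]],
                   threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return all column pairs whose value sets overlap enough, best first."""
    pairs = []
    for name_a, vals_a in cols_a.items():
        for name_b, vals_b in cols_b.items():
            score = jaccard(set(vals_a), set(vals_b))
            if score >= threshold:
                pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda p: -p[2])


crm = {"cntry": ["DE", "US", "FR"], "c_id": ["1", "2", "3"]}
erp = {"country_code": ["US", "DE", "IT"], "customer": ["1", "2", "3", "5"]}
print(instance_match(crm, erp))
# [('c_id', 'customer', 0.75), ('cntry', 'country_code', 0.5)]
```

A name-based matcher would find little in common between `c_id` and `customer`, while value overlap links them; in exchange, this matcher needs instance data and ignores names and constraints, so it would typically be used alongside a schema-level matcher.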

SHACL constraint validation during SPARQL query processing

Author: Bernstein Philip A.
Rabl Tilmann
Rohde Philipp D.
Vidal Maria-Esther
Publication venue: Aachen, Germany : RWTH Aachen
Publication date: 01/01/2021

The importance of knowledge graphs is increasing. As they are applied in more and more real-world use cases, their data quality has to be addressed. The Shapes Constraint Language (SHACL) is the W3C-recommended language for defining integrity constraints over knowledge graphs expressed in the Resource Description Framework (RDF). Annotating SPARQL query results with metadata from the SHACL validation provides a better understanding of the knowledge graph and its data quality. We propose a query engine that efficiently evaluates which instances in the knowledge graph fulfill the requirements of the SHACL shape schema and annotates the SPARQL query result with this metadata, thereby adding the dimension of explainability to SPARQL query processing. Our preliminary analysis shows that the proposed optimizations for SHACL validation during SPARQL query processing improve performance compared to a naive approach. However, for some queries the naive approach outperforms the optimizations. This shows that more work is needed on this topic to fully understand all impacting factors and to identify the overhead added to query execution.

Institutionelles Repositorium der Leibniz Universität Hannover
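The idea in the SHACL abstract of annotating SPARQL results with validation metadata can be sketched without any RDF library. The sketch assumes a graph stored as a set of (subject, predicate, object) tuples, a simplified shape with one target class and one `sh:minCount`-style constraint, and query results already computed as a list of variable bindings. Validating only the nodes that appear in the result, and caching each verdict, is one plausible way to avoid validating the whole graph up front; it is not necessarily the optimization the authors propose. All identifiers (`ex:` terms, `conforms`, `annotate`, `_valid`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def conforms(graph: Set[Triple], node: str, shape: Dict) -> bool:
    """Check a simplified shape: target class plus a minimum count on one property."""
    if (node, "rdf:type", shape["target_class"]) not in graph:
        return True  # the shape does not target this node, so it cannot be violated
    count = sum(1 for s, p, _ in graph if s == node and p == shape["path"])
    return count >= shape["min_count"]


def annotate(graph: Set[Triple], bindings: List[Dict[str, str]],
             var: str, shape: Dict) -> List[Dict]:
    """Add a '_valid' flag to each result row, validating each node at most once."""
    cache: Dict[str, bool] = {}
    out = []
    for row in bindings:
        node = row[var]
        if node not in cache:
            cache[node] = conforms(graph, node, shape)
        out.append({**row, "_valid": cache[node]})
    return out


graph = {
    ("ex:alice", "rdf:type", "ex:Person"), ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
}
person_shape = {"target_class": "ex:Person", "path": "ex:name", "min_count": 1}
rows = [{"x": "ex:alice"}, {"x": "ex:bob"}]
print(annotate(graph, rows, "x", person_shape))
# [{'x': 'ex:alice', '_valid': True}, {'x': 'ex:bob', '_valid': False}]
```

The naive alternative would call `conforms` for every target node in the graph before the query runs; which strategy wins depends on how many targets the query actually touches, consistent with the abstract's observation that neither approach dominates.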

Concept Expansion Using Web Tables

Author: Chi Wang
Kaushik Chakrabarti
Kris Ganjam
Philip A. Bernstein
Yeye He
Zhimin Chen
Publication date: 11/04/2020

We study the following problem: given the name of an ad-hoc concept as well as a few seed entities belonging to the concept, output all entities belonging to it. Since producing the exact set of entities is hard, we focus on returning a ranked list of entities. Previous approaches either use seed entities as the only input, or inherently require negative examples. They suffer from input ambiguity and semantic drift, or are not viable options for ad-hoc tail concepts. In this paper, we propose to leverage the millions of tables on the web for this problem. The core technical challenge is to identify the "exclusive" tables for a concept to prevent semantic drift; existing holistic ranking techniques like personalized PageRank are inadequate for this purpose. We develop novel probabilistic ranking methods that can model a new type of table-entity relationship. Experiments with real-life concepts show that our proposed solution is significantly more effective than applying state-of-the-art set expansion or holistic ranking techniques.

CiteSeerX
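The problem setup in the concept expansion abstract (a few seed entities in, a ranked entity list out) can be illustrated with a naive seed-overlap voting baseline over web-table columns. This is explicitly not the authors' probabilistic ranking method: it ignores the concept name and does not model whether a table is "exclusive" to the concept, which is why it drifts. The seeds, columns and `expand` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def expand(seeds: Set[str], columns: List[Set[str]]) -> List[str]:
    """Rank non-seed entities by the seed coverage of the columns they appear in."""
    if not seeds:
        raise ValueError("at least one seed entity is required")
    scores: Dict[str, float] = defaultdict(float)
    for col in columns:
        weight = len(seeds & col) / len(seeds)  # fraction of seeds in this column
        if weight == 0:
            continue  # column shares no seeds; treat it as unrelated
        for entity in col - seeds:
            scores[entity] += weight
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda e: (-scores[e], e))


# Hypothetical concept "camera makers" with two seeds
seeds = {"Nikon", "Canon"}
columns = [
    {"Nikon", "Canon", "Sony", "Pentax"},  # camera makers
    {"Canon", "HP", "Epson"},              # printer makers: shares one seed
    {"Sony", "Samsung", "LG"},             # TV makers: shares no seed
]
print(expand(seeds, columns))
# ['Pentax', 'Sony', 'Epson', 'HP']
```

Epson and HP still receive scores because the printer table shares a seed with the concept; recognizing that such a table is not exclusive to "camera makers" is the semantic-drift problem the paper's ranking methods target.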

Concept Expansion Using Web Tables

Author: Chi Wang
Kaushik Chakrabarti
Kris Ganjam
Philip A. Bernstein
Yeye He
Zhimin Chen
Publication venue: Association for Computing Machinery (ACM)
Publication date: 21/11/2015

CiteSeerX

Crossref