Joining Entities Across Relation and Graph with a Unified Model
This paper introduces the RG (Relational Genetic) model, a revised relational
model that represents graph-structured data in an RDBMS while preserving its
topology, so that data in different formats can be extracted efficiently and
effectively from disparate sources. The model is accompanied by: (a) an SQL
dialect augmented with graph pattern queries and tuple-vertex joins, with
which one can extract graph properties via graph pattern matching and
"semantically" match entities across relations and graphs; (b) a logical
representation of graphs in an RDBMS, which introduces an exploration operator
for efficient pattern querying and also supports browsing and updating
graph-structured data; and (c) a strategy to uniformly evaluate SQL, pattern,
and hybrid queries that join tuples and vertices, all inside an RDBMS, by
leveraging its optimizer and without the performance degradation of switching
between execution engines. A lightweight system, WhiteDB, is developed as an
implementation to evaluate the benefits the model can bring on real-life data.
We empirically verified that the RG model enables graph pattern queries to be
answered as efficiently as in native graph engines; can consider accesses to
graphs and relations in any order when searching for an optimal plan; and
supports effective data enrichment.
Comment: 24 pages, 16 figures, 5 tables
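To make the idea of joining tuples with vertices concrete, the following is a minimal sketch only. It assumes a plain vertex/edge table encoding in SQLite and ordinary SQL joins; it is not the RG model, its SQL dialect, or its exploration operator, and every table, column, and function name here is hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding one relation and one small graph.

    Assumption: the graph is stored as plain `vertex` and `edge` tables.
    This is a generic encoding chosen for illustration, not the paper's.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE vertex (vid INTEGER PRIMARY KEY, label TEXT, name TEXT);
        CREATE TABLE edge (src INTEGER, dst INTEGER, label TEXT);

        INSERT INTO customer VALUES (1, 'Ada', 'London'), (2, 'Bob', 'Paris');
        INSERT INTO vertex VALUES
            (10, 'person', 'Ada'),
            (11, 'person', 'Cy'),
            (12, 'product', 'Laptop');
        INSERT INTO edge VALUES (10, 11, 'knows'), (11, 12, 'bought');
        """
    )
    return conn


def friends_purchases(conn):
    """Match the pattern (p)-[knows]->(f)-[bought]->(item) and join it to tuples.

    The first JOIN pairs each customer tuple with a person vertex of the same
    name (a simple stand-in for a tuple-vertex join); the remaining JOINs walk
    the two edges of the pattern.
    """
    sql = """
        SELECT c.cid, c.name, f.name, item.name
        FROM customer AS c
        JOIN vertex AS p ON p.label = 'person' AND p.name = c.name
        JOIN edge AS e1 ON e1.src = p.vid AND e1.label = 'knows'
        JOIN vertex AS f ON f.vid = e1.dst
        JOIN edge AS e2 ON e2.src = f.vid AND e2.label = 'bought'
        JOIN vertex AS item ON item.vid = e2.dst AND item.label = 'product'
        ORDER BY c.cid
    """
    return conn.execute(sql).fetchall()
```

Because both the relation and the graph live in the same database, the query planner is free to choose the join order, which loosely mirrors the abstract's point about evaluating hybrid queries inside one engine.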
Cross-engine query execution in federated database systems
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 47-48).
Duggan et al. have created a reference implementation of the BigDAWG system: a new architecture for future Big Data applications, guided by the philosophy that "one size does not fit all." Such applications call not only for large-scale analytics, but also for real-time streaming support, smaller analytics at interactive speeds, data visualization, and cross-storage-system queries. The importance and effectiveness of such a system have been demonstrated in a hospital application using data from an intensive care unit (ICU). In this report, we implement and evaluate a concrete version of a cross-system Query Executor and its interface with a cross-system Query Planner. In particular, we focus on cross-engine shuffle joins within the BigDAWG system.
By Ankush M. Gupta. M. Eng.
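As a rough illustration of what a shuffle join does in general, the sketch below hash-partitions both inputs on the join key and then joins each pair of partitions locally. It is a single-process toy under stated assumptions: it is not BigDAWG's Query Executor, and the function names and the in-memory lists standing in for separate engines are hypothetical.

```python
from collections import defaultdict


def shuffle(rows, key_index, num_partitions):
    """Hash-partition rows (tuples) by the value at key_index.

    Rows with equal keys always land in the same partition, so each
    partition can later be joined independently of the others.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


def shuffle_join(left, right, left_key, right_key, num_partitions=4):
    """Equi-join two row lists by shuffling both sides, then joining per partition.

    Assumption: each partition pair would be handled by a separate node or
    engine in a real system; here they are simply processed in a loop.
    """
    left_parts = shuffle(left, left_key, num_partitions)
    right_parts = shuffle(right, right_key, num_partitions)

    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        # Build a hash table on the left partition, then probe with the right.
        table = defaultdict(list)
        for row in left_part:
            table[row[left_key]].append(row)
        for row in right_part:
            for match in table.get(row[right_key], []):
                joined.append(match + row)
    return joined
```

Calling `shuffle_join(patients, readings, 0, 0)` on two lists of tuples keyed by their first field returns the concatenated matching rows; the output order depends on the partitioning, so callers that need a stable order should sort the result.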