17,045 research outputs found

Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

Author: Begoli Edmon
Hyde Julian
Lemire Daniel
Mior Michael J.
Rodríguez Jesús Camacho
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/02/2018

Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization. Comment: SIGMOD'1

arXiv.org e-Print Archive

R-libre

Crossref

Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.

Author: Bichutskiy Vadim Y
Brachmann Rainer K
Colman Richard
Lathrop Richard H
Publication venue: eScholarship, University of California
Publication date: 01/01/2006

Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)

Directory of Open Access Journals

eScholarship - University of California

What factors influence the design of a linked data generation algorithm?

Author: De Meester Ben
Dimou Anastasia
Heyvaert Pieter
Verborgh Ruben
Publication date: 01/01/2018

Ghent University Academic Bibliography