2 research outputs found

Integration of Skyline Queries into Spark SQL

Authors: Grasmann Lukas, Pichler Reinhard, Selzer Alexander
Publication date: 07/10/2022

Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing big, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy-to-use syntax to input skyline queries. Moreover, our empirical results show that this integrated solution of skyline queries by far outperforms a solution based on rewriting into standard SQL.
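
For readers unfamiliar with the operator, the following is a minimal Python sketch of what a skyline query computes. It is not the authors' Spark SQL integration: the names `dominates`, `skyline`, and `hotels` are hypothetical, the data is made up for illustration, and it assumes smaller values are better in every dimension. A point belongs to the skyline if no other point dominates it.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative data: (price, distance to the beach in km), both minimized.
hotels = [(50.0, 3.0), (80.0, 1.0), (60.0, 4.0), (120.0, 0.5), (90.0, 2.0)]
print(skyline(hotels))
```

The pairwise comparison in `skyline` mirrors the self-comparison that a rewriting into plain SQL has to spell out by hand, which gives a sense of why a dedicated operator is more convenient.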

arXiv.org e-Print Archive

Efficient Skyline Computation in High-Dimensionality Domains

Authors: Li Dominique H., Liu Rui
Publication venue: HAL CCSD
Publication date: 30/03/2020

We present a dimension-indexing-based algorithm for skyline computation. We first show that the dominance tests required to determine a skyline tuple can be sufficiently bounded to a subset of the current skyline, and then propose the algorithm SDI, whose time complexity is better than that of the best known algorithm in high-dimensionality domains with reasonably low cardinality. Our performance evaluation on synthetic and real datasets shows that SDI outperforms the state-of-the-art skyline algorithm in both low-dimensionality and high-dimensionality domains.

HAL Université de Tours