Integration of Skyline Queries into Spark SQL

Author: Grasmann Lukas
Pichler Reinhard
Selzer Alexander
Publication date: 07/10/2022

Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing big, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy-to-use syntax to input skyline queries. Moreover, our empirical results show that this integrated solution of skyline queries by far outperforms a solution based on rewriting into standard SQL.
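
To illustrate the kind of rewriting into standard SQL that the abstract describes as tedious, here is a minimal sketch using a correlated NOT EXISTS subquery. It is not the authors' rewriting and does not use Spark or their skyline syntax: it runs on Python's built-in sqlite3 module, and the table `hotels`, its columns, and the sample rows are hypothetical. Both criteria are assumed to be minimized.

```python
import sqlite3


def skyline_via_sql(rows):
    """Return names of rows not dominated by any other row.

    Assumption for illustration: lower price and lower distance are both better.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (name TEXT, price REAL, distance REAL)")
    conn.executemany("INSERT INTO hotels VALUES (?, ?, ?)", rows)
    # A row h is in the skyline if no row d is at least as good in every
    # criterion and strictly better in at least one.
    query = """
        SELECT h.name FROM hotels AS h
        WHERE NOT EXISTS (
            SELECT 1 FROM hotels AS d
            WHERE d.price <= h.price AND d.distance <= h.distance
              AND (d.price < h.price OR d.distance < h.distance)
        )
        ORDER BY h.name
    """
    result = [name for (name,) in conn.execute(query)]
    conn.close()
    return result


# Hypothetical sample data: (name, price, distance to the beach)
HOTELS = [("A", 50, 8), ("B", 80, 2), ("C", 90, 9), ("D", 60, 5), ("E", 80, 6)]
print(skyline_via_sql(HOTELS))  # ['A', 'B', 'D']
```

Note that the dominance condition has to be spelled out per criterion, so the query grows with every added dimension. That is the verbosity a dedicated skyline operator would avoid.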

arXiv.org e-Print Archive

A model for computing skyline data items in cloud incomplete databases

Author: Abualkishik Abedallah Zaid
Aljuboori Ali A.Alwan
Gulzar Yonis
Mehmood Abid
Publication venue: Elsevier BV
Publication date: 06/04/2020

Skyline queries retrieve the data items in a database that best fit the user’s given preferences. However, processing skyline queries is expensive and difficult when applied to large distributed databases such as cloud databases. Processing becomes even more complicated if these distributed databases have missing values in certain dimensions. The effect of data incompleteness on skyline processing is severe because missing values cause the transitivity property of the skyline technique to no longer hold, which leads to the problem of cyclic dominance. This paper proposes an efficient model for computing skyline data items in incomplete cloud databases. The model aims to reduce the number of domination tests between data items, the processing time, and the amount of data transferred among the involved datacenters. Various sets of experiments are conducted over two different types of datasets, and the results demonstrate that the proposed solution outperforms the previous approaches in terms of domination tests, processing time, and amount of data transferred.
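
The loss of transitivity and the resulting cyclic dominance can be seen in a small sketch. It assumes one common definition of dominance over incomplete data (compare only the dimensions present in both items, smaller values preferred), which may differ from the paper's exact model. The function name and the three sample items are hypothetical.

```python
def dominates(a, b):
    """Return True if a dominates b, comparing only dimensions known in both.

    Assumptions for illustration: None marks a missing value, smaller is better.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return False  # no common dimension, so the items are incomparable
    return all(x <= y for x, y in shared) and any(x < y for x, y in shared)


# Three hypothetical items, each missing a different dimension.
a = (1, None, 3)
b = (2, 1, None)
c = (None, 2, 2)

print(dominates(a, b))  # True:  compared on dimension 0 only (1 < 2)
print(dominates(b, c))  # True:  compared on dimension 1 only (1 < 2)
print(dominates(a, c))  # False: transitivity fails (3 > 2 on dimension 2)
print(dominates(c, a))  # True:  the cycle a -> b -> c -> a closes
```

Because every item here is dominated by another one, a naive "discard whatever is dominated" pass would return an empty skyline, which is why incomplete data needs special handling.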

The International Islamic University Malaysia Repository

I/O-Efficient Dynamic Planar Range Skyline Queries

Author: Kejlberg-Rasmussen Casper
Tsakalidis Konstantinos
Tsichlas Kostas
Publication date: 10/07/2012

We present the first fully dynamic worst case I/O-efficient data structures that support planar orthogonal 3-sided range skyline reporting queries in $O(\log_{2B^\epsilon} n + \frac{t}{B^{1-\epsilon}})$ I/Os and updates in $O(\log_{2B^\epsilon} n)$ I/Os, using $O(\frac{n}{B^{1-\epsilon}})$ blocks of space, for $n$ input planar points, $t$ reported points, and parameter $0 \leq \epsilon \leq 1$. We obtain the result by extending Sundar's priority queues with attrition to support the operations DeleteMin and CatenateAndAttrite in $O(1)$ worst case I/Os, and in $O(1/B)$ amortized I/Os given that a constant number of blocks is already loaded in main memory. Finally, we show that any pointer-based static data structure that supports dominated maxima reporting queries, namely the difficult special case of 4-sided skyline queries, in $O(\log^{O(1)} n + t)$ worst case time must occupy $\Omega(n \frac{\log n}{\log \log n})$ space, by adapting a similar lower bounding argument for planar 4-sided range reporting queries. Comment: Submitted to SODA 201

arXiv.org e-Print Archive

University of Liverpool Repository

ORTHOGONAL RANGE SKYLINE QUERIES

Author: Murembya Saano
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2022

Given a set of points P, we often need to report the ones that lie within a certain query range Q. This is referred to as orthogonal range reporting. We can also go further, reporting only the dominant points within that query. In 2 dimensions, a point p1 = (x1, y1) dominates a point p2 = (x2, y2) iff x1 ≥ x2 and y1 > y2, or x1 > x2 and y1 ≥ y2. The set of all dominant points within a query range is called the skyline of that query. There are several different variants of skyline queries. For example, we can consider each point in P to be colored. Given a query range Q, can we efficiently count the number of points of each color in the skyline? In this thesis, we will present a new O(log n log log n + D log log n) method for doing so. The method is possible thanks to a new reduction from skyline queries to orthogonal range queries. We will also explore novel algorithms for answering skyline query variants in the I/O model of computation, making use of techniques such as Ganguly et al.’s [2] double-chaining method and Alstrup et al.’s [14] grid approach. By applying these existing techniques in new ways, we can not only derive our own efficient algorithms for skyline queries, but also explore potential avenues for future research.
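
The dominance definition above can be made concrete with a brute-force sketch that filters the points to a rectangular query range and then sweeps them in decreasing x order. This is a plain in-memory baseline for illustration, not the thesis's method or its reduction to orthogonal range queries. The function name, the inclusive range bounds, and the sample points are assumptions.

```python
def range_skyline(points, x_lo, x_hi, y_lo, y_hi):
    """Return the skyline of the points inside [x_lo, x_hi] x [y_lo, y_hi].

    Larger x and larger y are better, matching the dominance definition
    in the abstract. Duplicates are removed first.
    """
    in_range = {
        (x, y) for x, y in points if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    }
    # Sweep from the largest x to the smallest; ties in x go largest y first.
    ordered = sorted(in_range, key=lambda p: (-p[0], -p[1]))
    skyline = []
    best_y = float("-inf")
    for x, y in ordered:
        # A point survives only if its y beats every point already seen,
        # all of which have an x that is at least as large.
        if y > best_y:
            skyline.append((x, y))
            best_y = y
    return sorted(skyline)


# Hypothetical sample points and query range.
POINTS = [(1, 9), (2, 7), (3, 8), (4, 3), (5, 5), (6, 1), (7, 4)]
print(range_skyline(POINTS, 2, 6, 0, 10))  # [(3, 8), (5, 5), (6, 1)]
```

The sketch costs O(n log n) per query because it scans and sorts all points each time. The data structures studied in this line of work aim to answer such queries without touching points outside the result.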

Michigan Technological University

Skyline Query Processing in Sensor Network Based on Data Centric Storage

Author: Kwak Yunsik
Lee Seokhee
Song Seokil
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2011

Data centric storages for sensor networks have been proposed to efficiently process multi-dimensional range queries as well as exact matches. Usually, a sensor network does not process only one type of query, but various types such as range queries, exact matches, and skyline queries. Therefore, a sensor network based on a data centric storage for range queries and exact matches should also process skyline queries efficiently. However, existing algorithms for skyline queries have not considered the features of data centric storages. Some data centric storages store similar data in sensor nodes that are placed at geographically similar locations. Consequently, all data are ordered in a sensor network. In this paper, we propose a new skyline query processing algorithm that exploits the above features of data centric storages.

CiteSeerX

Directory of Open Access Journals

PubMed Central

Reverse Skyline Computation over Sliding Windows

Author: Guoren Wang
Junchang Xin
Mei Bai
Zhiqiong Wang
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Reverse skyline queries have been used in many real-world applications such as business planning, market analysis, and environmental monitoring. In this paper, we investigated how to efficiently evaluate continuous reverse skyline queries over sliding windows. We first theoretically analyzed the inherent properties of reverse skylines on data streams and proposed a novel pruning technique to reduce the number of data points preserved for processing continuous reverse skyline queries. Then, an efficient approach, called Semidominance Based Reverse Skyline (SDRS), was proposed to process continuous reverse skyline queries. Moreover, an extension was also proposed to handle n-of-N and (n1,n2)-of-N reverse skyline queries. Our extensive experimental studies have demonstrated the efficiency and effectiveness of the proposed approach under various experimental settings.

Crossref

Directory of Open Access Journals