1,044 research outputs found

Efficient skyline processing algorithm over dynamic and incomplete database

Author: Alwan Ali Amer
Babanejad Ghazaleh
Ibrahim Hamidah
Sidi Fatimah
Udzir Nur Izura
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/11/2018

Skyline processing discovers the data items that are not dominated by any other data item. It is a well-known technique for determining the best results that meet the user’s preferences. However, the rapid growth of data and its frequent changes mean that identifying skyline points is no longer a trivial task. Most existing skyline approaches assume that the database is complete and static. In real-world scenarios this assumption does not hold, especially in multidimensional databases where some dimensions have missing values and the data are dynamic because of continual modifications. Blindly re-examining the whole database after each change to identify the skyline points is inappropriate, as not all data items are affected by the changes. Hence, in this study we propose a skyline algorithm, DyIn-Skyline, which identifies skyline points over dynamic and incomplete databases by examining only those data items that are affected by the changes. Several experiments have been conducted, and the results show that the proposed algorithm outperforms previous work by reducing the number of pairwise comparisons by 50% to 73%.

The International Islamic University Malaysia Repository
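
As a rough illustration of the dominance notion used in the DyIn-Skyline abstract above, the following sketch compares two items only on the dimensions where both have a value and then computes a skyline by brute force. It is not the DyIn-Skyline algorithm itself: the function names, the use of `None` for a missing value, and the "smaller is better" convention are all assumptions made for this example.

```python
from typing import Optional, Sequence

Item = Sequence[Optional[float]]  # None marks a missing value (assumed encoding)


def dominates(a: Item, b: Item) -> bool:
    """True if a dominates b on the dimensions both items have (smaller is better)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return False  # no shared dimension, so neither item dominates
    no_worse = all(x <= y for x, y in common)
    strictly_better = any(x < y for x, y in common)
    return no_worse and strictly_better


def skyline(items: list[Item]) -> list[Item]:
    """Brute-force skyline: keep every item that no other item dominates."""
    return [p for p in items
            if not any(dominates(q, p) for q in items if q is not p)]


data = [(1, None, 4), (2, 3, None), (None, 5, 6), (3, 4, 5)]
print(skyline(data))
```

The brute-force version performs a pairwise comparison for every pair of items, which is the cost that the abstract says an update-aware algorithm tries to avoid.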

Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

Author: Ezatpoor Payam
Publication venue: Digital Scholarship@UNLV
Publication date: 01/05/2017

Incomplete data is a major kind of multi-dimensional dataset that has randomly distributed missing nodes in its dimensions. Retrieving information from such a dataset becomes very difficult as it grows large, and finding top-k dominant values in it is challenging. Some algorithms exist to improve this process, but most are efficient only on small incomplete datasets. One algorithm that makes the top-k dominance (TKD) query possible is the Bitmap Index Guided (BIG) algorithm. It strongly improves performance on incomplete data, but it was not designed for, and is not capable of, finding top-k dominant values in incomplete big data. Several other algorithms have been proposed to answer the TKD query, such as the Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. These earlier algorithms were among the first attempts to apply the TKD query to incomplete data; however, all of them either performed weakly or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) to deal with these issues. MRBIG uses the MapReduce framework to improve the performance of top-k dominance queries on huge incomplete datasets. The framework divides the tasks among several computing nodes that work independently and simultaneously to find the result. This method is up to two times faster at finding the TKD query result than previously presented algorithms.

University of Nevada, Las Vegas Repository
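
To make the top-k dominance (TKD) query in the abstract above more concrete, here is a minimal sketch that scores each item by how many other items it dominates and splits the scoring work into partitions, loosely in the spirit of a map step followed by a merge. It does not use a bitmap index and is not MRBIG or BIG; `map_scores`, `top_k_dominating`, the `None` encoding of missing values, and the "smaller is better" convention are assumptions for illustration only.

```python
from collections import Counter
from typing import Optional, Sequence

Item = Sequence[Optional[float]]  # None marks a missing value (assumed encoding)


def dominates(a: Item, b: Item) -> bool:
    """True if a dominates b on their shared dimensions (smaller is better)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return (bool(common)
            and all(x <= y for x, y in common)
            and any(x < y for x, y in common))


def map_scores(partition: list[int], dataset: list[Item]) -> Counter:
    """Map step: for each index in one partition, count the items it dominates."""
    return Counter({
        i: sum(dominates(dataset[i], other)
               for j, other in enumerate(dataset) if j != i)
        for i in partition
    })


def top_k_dominating(dataset: list[Item], k: int, n_partitions: int = 2):
    """Merge the per-partition scores and return the k highest-scoring items."""
    indices = list(range(len(dataset)))
    partitions = [indices[p::n_partitions] for p in range(n_partitions)]
    merged: Counter = Counter()
    for part in partitions:  # each call could run on a separate worker
        merged.update(map_scores(part, dataset))
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(dataset[i], score) for i, score in ranked[:k]]


data = [(1, None, 4), (2, 3, None), (None, 5, 6), (3, 4, 5)]
print(top_k_dominating(data, k=2))
```

Because each partition is scored independently of the others, the loop over partitions is the part that a framework such as MapReduce could distribute across computing nodes.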

ANSWERING WHY-NOT QUESTIONS ON REVERSE SKYLINE QUERIES OVER INCOMPLETE DATA

Author: Connery Tosca Yoel
Santoso Bagus Jati
Publication venue: 'Lembaga Penelitian dan Pengabdian kepada Masyarakat ITS'
Publication date: 12/03/2019

Recently, preference-based queries have received considerable attention from researchers and data users. One of the most popular preference-based queries is the skyline query, which returns the subset of superior records that are not dominated by any other record. The reverse skyline query emerged as an extension of the skyline query. It aims to find the query points whose own skyline results include a given record. Furthermore, data-oriented IT development requires scientists to be able to process data in all conditions. In the real world, multidimensional data can be incomplete because of damage, loss, or privacy constraints. To increase the usability of such data sets, this study discusses one of the problems in processing reverse skyline queries over incomplete data, namely the "why-not" problem. The solution considered for the "why-not" problem consists of advice and steps by which a query point that initially does not include an incomplete record in its result can later include that record. The study further discusses the dominance relationship between incomplete data, along with the solution to the problem. Moreover, performance evaluations are conducted to measure efficiency and effectiveness.

JUTI: Jurnal Ilmiah Teknologi Informasi
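
The reverse skyline and "why-not" ideas in the abstract above can be sketched for the simpler case of complete data. The code below uses the common definition in which a point p belongs to the reverse skyline of a query q when no other point is at least as close to p as q is in every dimension and strictly closer in one. It does not handle missing values and is not the method of the paper; `reverse_skyline` and `why_not` are hypothetical helper names, and `why_not` only lists the blocking points rather than proposing modifications.

```python
from typing import Sequence

Point = Sequence[float]


def dynamically_dominates(a: Point, b: Point, center: Point) -> bool:
    """True if a is no farther from center than b in every dimension, closer in one."""
    da = [abs(x - c) for x, c in zip(a, center)]
    db = [abs(y - c) for y, c in zip(b, center)]
    return (all(x <= y for x, y in zip(da, db))
            and any(x < y for x, y in zip(da, db)))


def why_not(points: list[Point], query: Point, p: Point) -> list[Point]:
    """Points that keep the query out of p's dynamic skyline (empty if none do)."""
    return [o for o in points
            if o is not p and dynamically_dominates(o, query, p)]


def reverse_skyline(points: list[Point], query: Point) -> list[Point]:
    """Points whose dynamic skyline contains the query point."""
    return [p for p in points if not why_not(points, query, p)]


points = [(1, 1), (2, 2), (5, 5)]
query = (3, 3)
print(reverse_skyline(points, query))
print(why_not(points, query, points[0]))
```

In this toy data set the point `(1, 1)` is missing from the reverse skyline of the query because `(2, 2)` lies closer to it than the query does in both dimensions, which is the kind of explanation a why-not answer starts from.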

Skyline queries computation on crowdsourced-enabled incomplete database

Author: Aljuboori Ali A.Alwan
Gulzar Yonis
Ibrahim Hamidah
Swidan Marwa
Turaev Sherzod
Zaid Abualkishik Abedallah
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Data incompleteness has become a frequent phenomenon in many contemporary database applications, such as web autonomous databases, big data, and crowd-sourced databases. Processing skyline queries over incomplete databases poses a number of challenges. Most importantly, the skylines derived from incomplete databases are themselves incomplete, with some values missing. Retrieving skylines with missing values is undesirable, particularly for recommendation and decision-making systems. Furthermore, running skyline queries on a database with incomplete data raises issues that affect query processing, such as the loss of the transitivity property of the skyline technique and cyclic dominance between tuples. The issue of estimating the missing values of skylines has been discussed and examined in the database literature. Most recently, several studies have suggested exploiting crowd-sourced databases to estimate the missing values by generating plausible values using the crowd. Crowd-sourced databases have proved to be a powerful way to perform user-given tasks by integrating human intelligence and experience. However, task processing using crowdsourcing incurs additional monetary cost and increases latency. Also, it is not always possible to produce a satisfactory result that meets the user's preferences. This paper proposes an approach that estimates the missing values of the skylines by first exploiting the available data and using the implicit relationships between the attributes to impute the missing values. This process aims to reduce the number of values that must be estimated using the crowd when local estimation is inappropriate. Intensive experiments on both synthetic and real datasets have been carried out. The experimental results show that the proposed approach for estimating the missing values of the skylines over crowd-sourced enabled incomplete databases is scalable and outperforms the other existing approaches.

Universiti Putra Malaysia Institutional Repository

The International Islamic University Malaysia Repository
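
The loss of transitivity and the cyclic dominance mentioned in the abstract above can be reproduced with three tuples. The sketch below assumes that dominance is checked only on the dimensions two tuples share, that `None` marks a missing value, and that smaller values are better; the tuples and the helper `dominance_edges` are made up for this illustration.

```python
from typing import Optional, Sequence

Item = Sequence[Optional[float]]  # None marks a missing value (assumed encoding)


def dominates(a: Item, b: Item) -> bool:
    """True if a dominates b on their shared dimensions (smaller is better)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return (bool(common)
            and all(x <= y for x, y in common)
            and any(x < y for x, y in common))


def dominance_edges(items: list[Item]) -> list[tuple[int, int]]:
    """All index pairs (i, j) such that items[i] dominates items[j]."""
    return [(i, j)
            for i in range(len(items))
            for j in range(len(items))
            if i != j and dominates(items[i], items[j])]


a = (1, None, 10)
b = (3, 2, None)
c = (None, 5, 3)
items = [a, b, c]

# a beats b on dimension 0, b beats c on dimension 1, c beats a on dimension 2.
print(dominance_edges(items))

# Every tuple is dominated by some other tuple, so a naive skyline is empty.
dominated = {j for _, j in dominance_edges(items)}
print([t for i, t in enumerate(items) if i not in dominated])
```

Here a dominates b and b dominates c, yet a does not dominate c, so pruning strategies that rely on transitivity are no longer safe on such data.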

On Processing Reverse k-skyband and Ranked Reverse Skyline Queries

Author: CHEN Gang
GAO Yunjun
LI Mou
LI Qing
LIU Qing
ZHENG Baihua
Publication venue: 'Elsevier BV'
Publication date: 01/02/2015

Crossref

Institutional Knowledge at Singapore Management University

Integration of Skyline Queries into Spark SQL

Author: Grasmann Lukas
Pichler Reinhard
Selzer Alexander
Publication date: 07/10/2022

Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing big, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy-to-use syntax for writing skyline queries. Moreover, our empirical results show that this integrated solution far outperforms a solution based on rewriting into standard SQL.

arXiv.org e-Print Archive
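
The "rewriting into standard SQL" that the abstract above describes as tedious typically takes the form of a `NOT EXISTS` subquery. The sketch below shows one such rewriting and runs it with Python's built-in `sqlite3` module rather than Spark, purely so that it is self-contained. The `hotels` table, its columns, and the preference (minimise `price`, maximise `rating`) are assumptions for this example, and the integrated skyline syntax proposed by the paper is not shown.

```python
import sqlite3

# A row survives if no other row is at least as good on both criteria
# and strictly better on at least one of them.
SKYLINE_SQL = """
SELECT o.name, o.price, o.rating
FROM hotels AS o
WHERE NOT EXISTS (
    SELECT 1
    FROM hotels AS i
    WHERE i.price <= o.price
      AND i.rating >= o.rating
      AND (i.price < o.price OR i.rating > o.rating)
)
ORDER BY o.name
"""


def skyline_rows(rows):
    """Load rows into an in-memory table and return the skyline rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (name TEXT, price REAL, rating REAL)")
    conn.executemany("INSERT INTO hotels VALUES (?, ?, ?)", rows)
    result = conn.execute(SKYLINE_SQL).fetchall()
    conn.close()
    return result


rows = [("A", 50, 3), ("B", 80, 5), ("C", 90, 4), ("D", 50, 2)]
print(skyline_rows(rows))
```

Each additional skyline dimension adds one condition to the conjunction and one to the disjunction, which is why the hand-written form grows unwieldy as the number of criteria increases.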