
982 research outputs found

Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

Author: Ezatpoor Payam
Publication venue: Digital Scholarship@UNLV
Publication date: 01/05/2017

Incomplete data is a major kind of multi-dimensional dataset in which values are missing at random positions across its dimensions. Retrieving information from such a dataset becomes very difficult as it grows, and finding the top-k dominant values in it is challenging. Some algorithms exist to improve this process, but most are efficient only on small incomplete datasets. One algorithm that makes the top-k dominance (TKD) query practical is the Bitmap Index Guided (BIG) algorithm. It strongly improves performance on incomplete data, but it was not designed for, and cannot handle, finding top-k dominant values in incomplete big data. Several other algorithms have been proposed to answer the TKD query, such as the Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. These earlier algorithms were among the first attempts to apply the TKD query to incomplete data; however, all of them either performed poorly or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) to address these issues. MRBIG uses the MapReduce framework to speed up top-k dominance queries on huge incomplete datasets: the framework divides the work among several computing nodes that operate independently and simultaneously to find the result. The method processes TKD queries up to two times faster than previously presented algorithms.
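The TKD query described in this abstract can be illustrated with a naive Python sketch. It is not the BIG or MRBIG algorithm. It assumes a commonly used convention that the abstract does not spell out: one item dominates another if it is no worse on every dimension both items have observed and strictly better on at least one of them (smaller values are taken as better here). The names `dominates`, `map_scores`, and `top_k_dominating` are hypothetical, and the "map" and "reduce" steps run sequentially in one process purely for illustration.

```python
from heapq import nlargest
from typing import Optional, Sequence

Point = Sequence[Optional[float]]  # None marks a missing value


def dominates(a: Point, b: Point) -> bool:
    """True if a dominates b on their commonly observed dimensions.

    Assumption: smaller values are better.
    """
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # ignore dimensions missing in either item
        if x > y:
            return False
        if x < y:
            strictly_better = True
    return strictly_better


def map_scores(chunk, data):
    """Map step: score each (index, item) of one chunk against the full dataset."""
    return [(sum(dominates(p, q) for q in data), i) for i, p in chunk]


def top_k_dominating(data, k, n_chunks=2):
    indexed = list(enumerate(data))
    chunks = [indexed[c::n_chunks] for c in range(n_chunks)]
    # Each chunk could be scored on a separate node; here they run in turn.
    partial = [map_scores(chunk, data) for chunk in chunks]
    # Reduce step: merge the partial score lists and keep the k best.
    merged = [pair for part in partial for pair in part]
    best = nlargest(k, merged, key=lambda t: (t[0], -t[1]))
    return [(i, score) for score, i in best]


data = [
    (1, 2, None),
    (2, None, 3),
    (None, 3, 4),
    (3, 4, 5),
]
print(top_k_dominating(data, k=2))  # [(0, 3), (1, 2)]
```

Each result pair is (item index, number of items it dominates). Scoring every item against the whole dataset is quadratic, which is the cost that index-based and parallel approaches such as those named in the abstract aim to reduce.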

University of Nevada, Las Vegas Repository

A model for computing skyline data items in cloud incomplete databases

Author: Abualkishik Abedallah Zaid
Aljuboori Ali A.Alwan
Gulzar Yonis
Mehmood Abid
Publication venue: 'Elsevier BV'
Publication date: 06/04/2020

Skyline queries retrieve the most superior data items in a database, those that best fit the user’s given preferences. However, processing skyline queries is expensive and difficult on large distributed databases such as cloud databases. It becomes more complicated still if these distributed databases have missing values in certain dimensions. Data incompleteness affects the skyline process severely because missing values cause the transitivity property of the skyline technique to no longer hold, which leads to the problem of cyclic dominance. This paper proposes an efficient model for computing skyline data items in cloud incomplete databases. The model aims to reduce the number of domination tests between data items, the processing time, and the amount of data transferred among the involved datacenters. Various sets of experiments are conducted over two different types of datasets, and the results demonstrate that the proposed solution outperforms the previous approaches in terms of domination tests, processing time, and amount of data transferred.
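The loss of transitivity and the resulting cyclic dominance can be seen on three hand-picked items. This is a small Python sketch under stated assumptions, not the model proposed in the paper: dominance is tested only on dimensions that both items have observed, smaller values are taken as better, and the function name `dominates` is hypothetical.

```python
from typing import Optional, Sequence

Point = Sequence[Optional[float]]  # None marks a missing value


def dominates(a: Point, b: Point) -> bool:
    """True if a dominates b on their commonly observed dimensions.

    Assumption: smaller values are better.
    """
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # ignore dimensions missing in either item
        if x > y:
            return False
        if x < y:
            strictly_better = True
    return strictly_better


a = (1, None, 10)
b = (2, 1, None)
c = (None, 2, 5)
items = (a, b, c)

# a beats b on dimension 0, b beats c on dimension 1, c beats a on dimension 2.
cycle = dominates(a, b) and dominates(b, c) and dominates(c, a)
# Transitivity would require a to dominate c, but it does not.
transitive = dominates(a, c)
# Every item is dominated by some other item, so the naive skyline is empty.
skyline = [p for p in items if not any(dominates(q, p) for q in items)]

print(cycle, transitive, skyline)  # True False []
```

Because each pair shares a different single dimension, every item is dominated by another one, and a plain "keep what nothing dominates" rule returns nothing.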

The International Islamic University Malaysia Repository

Skyline queries over incomplete multidimensional database

Author: Alwan Ali A.
Ibrahim Hamidah
Sidi Fatimah
Udzir Nur Izura
Publication date: 01/01/2011

In recent years, there has been much focus on skyline queries, which provide more flexible query operators that return the data items dominating other data items in all attributes (dimensions). Several skyline techniques have been proposed in the literature. Most of them aim to find the skyline query results by assuming that the dimension values are always present for every data item. In this paper we aim to evaluate skyline preference queries in which some dimension values are missing. We propose an approach for answering preference queries in a database by utilizing the concept of the skyline technique. The skyline set selected for a given query operation is then optimized so that the missing values are replaced with approximate values that provide a skyline answer with complete data. This significantly reduces the number of comparisons between data items. In addition, the number of retrieved skyline data items is reduced, which guides users in selecting the most appropriate data items from the several alternative complete skyline data items.

UUM Repository

Universiti Putra Malaysia Institutional Repository

Missing values estimation for skylines in incomplete database

Author: Aljuboori Ali A.Alwan
Ibrahim Hamidah
Sidi Fatimah
Udzir Nur Izura
Publication venue: Zarqa University, Jordan
Publication date: 01/01/2018

Incompleteness of data is a common problem in many databases, including web heterogeneous databases, multi-relational databases, spatial and temporal databases, and data integration. It introduces challenges in query processing, as providing accurate results that best meet the query conditions over an incomplete database is not a trivial task. Several techniques have been proposed to process queries in incomplete databases. Some of these techniques retrieve the query results based on the existing values rather than estimating the missing values. Such techniques are undesirable in many cases, as the dimensions with missing values might be the important dimensions of the user’s query. Moreover, the output is incomplete and might not satisfy the user’s preferences. In this paper we propose an approach that estimates missing values in skylines to guide users in selecting the most appropriate skylines from the several candidate skylines. The approach mines attribute correlations to generate Approximate Functional Dependencies (AFDs) that capture the relationships between the dimensions. It also identifies the strength of probability correlations to estimate the values. Then, the skylines with estimated values are ranked. By doing so, we ensure that the retrieved skylines are in the order of their estimated precision.

The International Islamic University Malaysia Repository

Skyline queries computation on crowdsourced-enabled incomplete database

Author: Aljuboori Ali A.Alwan
Gulzar Yonis
Ibrahim Hamidah
Swidan Marwa
Turaev Sherzod
Zaid Abualkishik Abedallah
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Data incompleteness has become a frequent phenomenon in a large number of contemporary database applications such as web autonomous databases, big data, and crowd-sourced databases. Processing skyline queries over incomplete databases imposes a number of challenges that negatively affect query processing. Most importantly, the skylines derived from incomplete databases are themselves incomplete, with some values missing. Retrieving skylines with missing values is undesirable, particularly for recommendation and decision-making systems. Furthermore, running skyline queries on a database with incomplete data raises a number of issues that affect query processing, such as losing the transitivity property of the skyline technique and cyclic dominance between the tuples. The issue of estimating the missing values of skylines has been discussed and examined in the database literature. Most recently, several studies have suggested exploiting crowd-sourced databases to estimate the missing values by generating plausible values using the crowd. Crowd-sourced databases have proved to be a powerful solution for performing user-given tasks by integrating human intelligence and experience to process the tasks. However, task processing using crowd-sourcing incurs additional monetary cost and increases latency. Also, it is not always possible to produce a satisfactory result that meets the user's preferences. This paper proposes an approach for estimating the missing values of the skylines by first exploiting the available data and utilizing the implicit relationships between the attributes in order to impute the missing values of the skylines. This process aims at reducing the number of values to be estimated using the crowd when local estimation is inappropriate. Intensive experiments on both synthetic and real datasets have been conducted. The experimental results show that the proposed approach for estimating the missing values of the skylines over crowd-sourced enabled incomplete databases is scalable and outperforms the other existing approaches.

Universiti Putra Malaysia Institutional Repository

The International Islamic University Malaysia Repository

Policy-Aware Unbiased Learning to Rank for Top-k Rankings

Author: de Rijke Maarten
Oosterhuis Harrie
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking. There is currently no existing counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability to appear in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation. Comment: SIGIR 2020 full conference paper
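One way to picture how an estimator can "account for the effect of a stochastic logging policy" is to average each item's examination probability over the rankings the policy may show. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the logging policy is given as an explicit distribution over rankings, examination depends only on rank (a position-based model), and items outside the top-k are never examined. The names `policy_aware_propensities`, `ips_estimate`, `exam_prob_at_rank`, and `gain` are hypothetical.

```python
from collections import defaultdict


def policy_aware_propensities(policy, exam_prob_at_rank):
    """Marginal probability that each item is examined under the logging policy.

    policy: dict mapping a ranking (tuple of item ids) to its probability.
    exam_prob_at_rank: examination probability for ranks 1..k; its length is k.
    """
    k = len(exam_prob_at_rank)
    rho = defaultdict(float)
    for ranking, prob in policy.items():
        # Only the first k positions are displayed, so only they contribute.
        for rank, item in enumerate(ranking[:k]):
            rho[item] += prob * exam_prob_at_rank[rank]
    return dict(rho)


def ips_estimate(clicked, gain, rho):
    """Inverse-propensity-weighted sum of metric gains over clicked items."""
    return sum(gain[item] / rho[item] for item in clicked)


# A toy stochastic policy over three items with a top-2 cutoff.
policy = {("a", "b", "c"): 0.5, ("c", "a", "b"): 0.5}
exam_prob_at_rank = [1.0, 0.5]

rho = policy_aware_propensities(policy, exam_prob_at_rank)
print(rho)  # {'a': 0.75, 'b': 0.25, 'c': 0.5}
print(ips_estimate(["b"], {"b": 1.0}, rho))  # 4.0
```

Every item here reaches the top-2 in at least one ranking, so each gets a non-zero propensity, which mirrors the condition stated in the abstract. An item that never appeared in any displayed top-k would have no entry in `rho` and could not be corrected for.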

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Deriving skyline points over dynamic and incomplete databases

Author: Alwan Ali Amer
Babanejad Ghazaleh
Ibrahim Hamidah
Sidi Fatimah
Udzir Nur Izura
Publication date: 01/01/2017

The rapid growth of data is inevitable, and retrieving the best results that meet the user’s preferences is essential. To achieve this, skylines were introduced, in which the data items that are not dominated by other data items in the database are retrieved as results (skylines). Most existing skyline approaches assume that databases are static and complete. However, in real-world scenarios databases are often incomplete, especially multidimensional databases in which some dimensions may have missing values. Databases may also be dynamic: new data items are inserted while existing data items are deleted or updated. Blindly performing pairwise comparisons over all data items after each change is inappropriate, as not all data items need to be compared to identify the skylines. Thus, this study proposes a novel skyline algorithm, DInSkyline, which finds the most relevant data items in dynamic and incomplete databases. Several experiments have been conducted, and the results show that DInSkyline outperforms previous works by reducing the number of pairwise comparisons by 52% to 73%.

UUM Repository

Universiti Putra Malaysia Institutional Repository

The International Islamic University Malaysia Repository