10 research outputs found

    A model for skyline query processing in a partially complete database

    In recent years, skyline queries have become one of the predominant and most frequently used preference queries in database systems. Their main goal is to identify and return those data items that are not dominated by any other data item in the database. In the past decade, a tremendous amount of research has been conducted on skyline queries, proposing many variations of skyline techniques for different types of databases. Most of these techniques assume that the database is complete and that values are always present when skyline queries are processed. However, this is not always the case, particularly for large databases with a high number of dimensions, where some values may be missing. Thus, existing techniques cannot be easily tailored to derive skylines in a database with missing values. Two significant issues arise: the loss of the transitivity property, which in turn leads to cyclic dominance. Finding skylines in a database with partially complete data has not received enough attention. This paper proposes an efficient model to identify skylines over a database with partially complete data. Experimental results on various types of datasets demonstrate that the proposed approach outperforms the previous approach in terms of the number of pairwise comparisons
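
The loss of transitivity and the resulting cyclic dominance can be illustrated with a minimal sketch. It assumes a commonly used definition of dominance over incomplete data (compare two items only on the dimensions where both have values, smaller is better); the abstract does not state the paper's exact definition, and the names `dominates`, `naive_skyline`, and the three sample items are hypothetical.

```python
from typing import List, Optional, Sequence

Item = Sequence[Optional[float]]  # None marks a missing value


def dominates(a: Item, b: Item) -> bool:
    """Assumed definition: a dominates b if, on the dimensions where both
    values are present, a is no worse everywhere and strictly better
    somewhere (smaller is better)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return False  # no shared dimension, so the items are incomparable
    return all(x <= y for x, y in common) and any(x < y for x, y in common)


def naive_skyline(items: List[Item]) -> List[Item]:
    """Keep every item that no other item dominates (pairwise comparison)."""
    return [p for p in items
            if not any(dominates(q, p) for q in items if q is not p)]


# Three hypothetical items over three dimensions.
p1 = (1, None, 10)
p2 = (3, 2, None)
p3 = (None, 5, 3)

print(dominates(p1, p2), dominates(p2, p3), dominates(p1, p3))
print(dominates(p3, p1))
print(naive_skyline([p1, p2, p3]))
```

Here p1 dominates p2 and p2 dominates p3, yet p1 does not dominate p3, so transitivity fails. Because p3 in turn dominates p1, the three items dominate each other in a cycle and the naive pairwise skyline is empty.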

    Missing values estimation for skylines in incomplete database

    Incompleteness of data is a common problem in many databases, including heterogeneous web databases, multi-relational databases, spatial and temporal databases, and data integration systems. Incomplete data introduce challenges in query processing, as providing accurate results that best meet the query conditions over an incomplete database is not a trivial task. Several techniques have been proposed to process queries in incomplete databases. Some of these techniques retrieve the query results based on the existing values rather than estimating the missing values. Such techniques are undesirable in many cases, as the dimensions with missing values might be the important dimensions of the user’s query. Moreover, the output is incomplete and might not satisfy the user's preferences. In this paper we propose an approach that estimates missing values in skylines to guide users in selecting the most appropriate skylines from several candidate skylines. The approach mines attribute correlations to generate Approximate Functional Dependencies (AFDs) that capture the relationships between the dimensions. It also identifies the strength of probability correlations to estimate the values. Then, the skylines with estimated values are ranked. By doing so, we ensure that the retrieved skylines are in the order of their estimated precision
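
A rough sketch of how an approximate functional dependency can drive estimation is given below. It is not the paper's method: it assumes a simple confidence measure (the fraction of complete rows that agree with the most frequent Y value for their X value), and the function names, attribute names, and sample rows are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]  # None marks a missing value


def afd_confidence(rows: List[Row], x: str, y: str) -> Tuple[float, Dict]:
    """Confidence of the approximate dependency x -> y, plus the most
    frequent y value observed for each x value."""
    groups = defaultdict(Counter)
    for r in rows:
        if r.get(x) is not None and r.get(y) is not None:
            groups[r[x]][r[y]] += 1
    total = sum(sum(c.values()) for c in groups.values())
    if total == 0:
        return 0.0, {}
    # Rows that agree with the majority y value of their x group.
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    mapping = {xv: c.most_common(1)[0][0] for xv, c in groups.items()}
    return kept / total, mapping


def impute(rows: List[Row], x: str, y: str, mapping: Dict) -> List[Row]:
    """Fill missing y values from the x -> y mapping; unknown x stays None."""
    return [{**r, y: mapping.get(r.get(x))} if r.get(y) is None else dict(r)
            for r in rows]


rows = [
    {"brand": "A", "segment": "lux"},
    {"brand": "A", "segment": "lux"},
    {"brand": "A", "segment": "eco"},
    {"brand": "B", "segment": "eco"},
    {"brand": "A", "segment": None},  # value to be estimated
]

conf, mapping = afd_confidence(rows, "brand", "segment")
print(conf, mapping)
print(impute(rows, "brand", "segment", mapping)[4])
```

The confidence value could serve as a ranking key for skylines with estimated values, in the spirit of ordering results by estimated precision, though the abstract does not specify how the ranking is computed.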

    ANSWERING WHY-NOT QUESTIONS ON REVERSE SKYLINE QUERIES OVER INCOMPLETE DATA

    Recently, the development of preference-based queries has received considerable attention from researchers and data users. One of the most popular preference-based queries is the skyline query, which returns a subset of superior records that are not dominated by any other records. The reverse skyline query has emerged as an extension of the skyline query. It aims to find the query points whose skyline query results include a given data record. Furthermore, data-oriented IT development requires scientists to be able to process data in all conditions. In the real world, multidimensional data may be incomplete because of damage, loss, or privacy. In order to increase the usability of a data set, this study discusses one of the problems in processing reverse skyline queries over incomplete data, namely the "why-not" problem. The solution considered for this "why-not" problem consists of advice and steps by which a query point that does not initially include an incomplete record in its result can later make that record part of the result. This study further discusses the dominance relationship between incomplete data along with the solution to the problem. Moreover, some performance evaluations are conducted to measure the level of efficiency and effectiveness

    Optimizing skyline query processing in incomplete data

    Given their significance, skyline queries are incorporated in various modern applications, including personalized recommendation systems as well as decision-making and decision-support systems. Skyline queries are used to identify superior data items in the database. Most of the previously proposed skyline algorithms work on a complete database where the data are always present (non-missing). However, in many contemporary real-world databases, particularly those with large cardinality and high dimensionality, such an assumption is not necessarily valid. Hence, missing data pose new challenges, because skyline query processing cannot easily apply methods that are designed for complete data. This is because incomplete data cause the loss of the transitivity property of the skyline method and lead to cyclic dominance. This paper presents a framework called Optimized Incomplete Skyline (OIS), which utilizes a technique that simplifies the skyline process on a database with missing data and helps prune the data items before performing the skyline process. The proposed strategy ensures that the number of domination tests is significantly reduced. A set of experiments has been conducted using both real and synthetic datasets to validate the performance of the framework. The experimental results confirm that the OIS framework consistently outperforms the current approaches in terms of the number of domination tests required to retrieve the skylines

    A model for computing skyline data items in cloud incomplete databases

    Skyline queries retrieve the most superior data items in the database that best fit the user’s given preferences. However, processing skyline queries is expensive and difficult when applied to large distributed databases such as cloud databases. Moreover, processing skyline queries becomes even more complicated if these distributed databases have missing values in certain dimensions. The effect of data incompleteness on the skyline process is severe, because missing values cause the transitivity property of the skyline technique to no longer hold, which leads to the problem of cyclic dominance. This paper proposes an efficient model for computing skyline data items in cloud incomplete databases. The model focuses on processing skyline queries in cloud incomplete databases, aiming to reduce the number of domination tests between data items, the processing time, and the amount of data transferred among the involved datacenters. Various sets of experiments are conducted over two different types of datasets, and the results demonstrate that the proposed solution outperforms the previous approaches in terms of domination tests, processing time, and amount of data transferred

    Deriving skyline points over dynamic and incomplete databases

    The rapid growth of data is inevitable, and retrieving the best results that meet the user’s preferences is essential. To achieve this, skylines were introduced, in which data items that are not dominated by other data items in the database are retrieved as results (skylines). In most of the existing skyline approaches, the databases are assumed to be static and complete. However, in real-world scenarios, databases are often incomplete, especially multidimensional databases in which some dimensions may have missing values. The databases might also be dynamic, with new data items inserted while existing data items are deleted or updated. Blindly performing pairwise comparisons over all data items after the changes are made is inappropriate, as not all data items need to be compared to identify the skylines. Thus, a novel skyline algorithm, DInSkyline, is proposed in this study, which finds the most relevant data items in dynamic and incomplete databases. Several experiments have been conducted, and the results show that DInSkyline outperforms previous works by reducing the number of pairwise comparisons in the range of 52% to 73%

    Answering skyline queries over incomplete data with crowdsourcing (Extended Abstract)

    Identifying skylines in cloud databases with incomplete data

    Skyline queries are a rich area of research in the database community. Due to their great benefits, they have been integrated into many database applications, including but not limited to personalized recommendation, multi-objective, decision support, and decision-making systems. Many variations of the skyline technique have been proposed in the literature to address the issue of handling skyline queries in incomplete databases. Nevertheless, these solutions are designed for centralized incomplete databases (single access). In many real-world database systems this might not be the case, particularly for a database with a large amount of incomplete data distributed over various remote locations, such as cloud databases. It is inadequate to directly apply skyline solutions designed for centralized incomplete databases to the cloud due to the prohibitive cost. Thus, this paper introduces a new approach called Incomplete-data Cloud Skylines (ICS), aimed at processing skyline queries in cloud databases with incomplete data. This approach emphasizes reducing the amount of data transfer and the number of domination tests during the skyline process. It incorporates a sorting technique that arranges the data items so that dominating data items are placed at the top of the list, which helps eliminate dominated data items. In addition, ICS employs a filtering technique to prune the dominated data items before applying the skyline technique. It also comprises a technique named local skyline joiner that helps reduce the amount of data transfer between datacenters when deriving the final skylines, by limiting the data items to be transferred to only the local skylines of each relation. Comprehensive experiments have been performed on both synthetic and real-life datasets, which demonstrate the effectiveness and versatility of our approach in comparison to the existing approaches. 
We argue that our approach is practical and can be adopted in many contemporary cloud database systems with incomplete data to process skyline queries

    Skyline queries computation on crowdsourced-enabled incomplete database

    Data incompleteness has become a frequent phenomenon in a large number of contemporary database applications such as autonomous web databases, big data, and crowd-sourced databases. Processing skyline queries over incomplete databases imposes a number of challenges that negatively affect query processing. Most importantly, the skylines derived from incomplete databases are also incomplete, in that some values are missing. Retrieving skylines with missing values is undesirable, particularly for recommendation and decision-making systems. Furthermore, running skyline queries on a database with incomplete data raises a number of issues that affect skyline query processing, such as the loss of the transitivity property of the skyline technique and cyclic dominance between the tuples. The issue of estimating the missing values of skylines has been discussed and examined in the database literature. Most recently, several studies have suggested exploiting crowd-sourced databases to estimate the missing values by generating plausible values using the crowd. Crowd-sourced databases have proved to be a powerful solution for performing user-given tasks by integrating human intelligence and experience to process the tasks. However, task processing using crowd-sourcing incurs additional monetary cost and increases latency. Also, it is not always possible to produce a satisfactory result that meets the user's preferences. This paper proposes an approach for estimating the missing values of the skylines by first exploiting the available data and utilizing the implicit relationships between the attributes in order to impute the missing values of the skylines. This process aims at reducing the number of values to be estimated using the crowd, which is used only when local estimation is inappropriate. Intensive experiments on both synthetic and real datasets have been conducted. 
The experimental results have proven that the proposed approach for estimating the missing values of the skylines over crowd-sourced enabled incomplete databases is scalable and outperforms the other existing approaches
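
The idea of estimating locally first and falling back to the crowd can be sketched as a simple routing step. This is an illustration under stated assumptions, not the paper's algorithm: it assumes that a local estimator returns a value together with a confidence score and that a fixed threshold decides when local estimation is "inappropriate". The names `route_missing_cells`, `min_conf`, and the sample cells are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Estimate = Tuple[object, float]  # (estimated value, confidence in [0, 1])


def route_missing_cells(
    cells: List[Hashable],
    local_estimator: Callable[[Hashable], Optional[Estimate]],
    min_conf: float = 0.8,  # assumed threshold, not taken from the paper
) -> Tuple[Dict[Hashable, object], List[Hashable]]:
    """Split missing cells into locally imputed values and crowd tasks."""
    local: Dict[Hashable, object] = {}
    crowd: List[Hashable] = []
    for cell in cells:
        est = local_estimator(cell)
        if est is not None and est[1] >= min_conf:
            local[cell] = est[0]   # confident enough: impute locally
        else:
            crowd.append(cell)     # otherwise ask the crowd (costly)
    return local, crowd


# Hypothetical local estimates keyed by (tuple id, attribute).
known = {("t1", "price"): (120.0, 0.9), ("t2", "rating"): (4.0, 0.5)}
cells = [("t1", "price"), ("t2", "rating"), ("t3", "price")]

local, crowd = route_missing_cells(cells, known.get)
print(local)
print(crowd)
```

Only the cells that cannot be estimated with sufficient confidence become crowd tasks, which is how such a scheme would reduce the monetary cost and latency that the abstract attributes to crowd-sourcing.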