2,253 research outputs found

    Privacy Aware Parallel Computation of Skyline Sets Queries from Distributed Databases

    Get PDF
    A skyline query finds objects that are not dominated by another object from a given set of objects. Skyline queries help us to filter unnecessary information efficiently and provide us clues for various decision making tasks. However, we cannot use skyline queries in privacy aware environment, since we have to hide individual's records values even though there is no ID information. Therefore, we considered skyline sets queries. The skyline set query returns skyline sets from all possible sets, each of which is composed of some objects in a database. With the growth of network infrastructure data are stored in distributed databases. In this paper, we expand the idea to compute skyline sets queries in parallel fashion from distributed databases without disclosing individual records to others. The proposed method utilizes an agent-based parallel computing framework that can efficiently compute skyline sets queries and can solve the privacy problems of skyline queries in distributed environment. The computation of skyline sets is performed simultaneously in all databases which increases parallelism and reduces the computation time

    Query Profiler Versus Cache for Skyline Computation

    Get PDF
    A skyline query is multi preference user query which generates the best objects from a multi attributed dataset. Skyline computation in an optimum time becomes a real challenge when the number of user preference are large and size of the dataset is also huge. When such a big data gets queried at large, response time optimization is possible through maintenance of the metadata about the pre-executed skyline queries. We have earlier proposed, a novel structure namely �Query Profiler� which preserves such metadata about the historical queries, raised against a dataset. Also as the dataset gets queried at large, the dimensions of user queries often overlap and queries get correlated. Such correlations in user queries and the availability of metadata about the earlier queries, combined together speed up the computation time and the optimization of the response time of the further skyline computation becomes possible. In this paper, we assert the efficacy of the Query Profiler by comparing its performance with the parallel techniques which utilize cache mechanism for optimization of the response time. We also present the experimental results which assert the efficacy of the proposed technique

    Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

    Full text link
    Incomplete data is one major kind of multi-dimensional dataset that has random-distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes huge. Finding top-k dominant values in this type of dataset is a challenging procedure. Some algorithms are present to enhance this process but are mostly efficient only when dealing with a small-size incomplete data. One of the algorithms that make the application of TKD query possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves the performance for incomplete data, but it is not originally capable of finding top-k dominant values in incomplete big data, nor is it designed to do so. Several other algorithms have been proposed to find the TKD query, such as Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Algorithms developed previously were among the first attempts to apply TKD query on incomplete data; however, all these had weak performances or were not compatible with the incomplete data. This thesis proposes MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance queries on huge incomplete datasets. The proposed approach uses the MapReduce parallel computing approach using multiple computing nodes. The framework separates the tasks between several computing nodes that independently and simultaneously work to find the result. This method has achieved up to two times faster processing time in finding the TKD query result in comparison to previously presented algorithms

    I/O-Efficient Planar Range Skyline and Attrition Priority Queues

    Full text link
    In the planar range skyline reporting problem, we store a set P of n 2D points in a structure such that, given a query rectangle Q = [a_1, a_2] x [b_1, b_2], the maxima (a.k.a. skyline) of P \cap Q can be reported efficiently. The query is 3-sided if an edge of Q is grounded, giving rise to two variants: top-open (b_2 = \infty) and left-open (a_1 = -\infty) queries. All our results are in external memory under the O(n/B) space budget, for both the static and dynamic settings: * For static P, we give structures that answer top-open queries in O(log_B n + k/B), O(loglog_B U + k/B), and O(1 + k/B) I/Os when the universe is R^2, a U x U grid, and a rank space grid [O(n)]^2, respectively (where k is the number of reported points). The query complexity is optimal in all cases. * We show that the left-open case is harder, such that any linear-size structure must incur \Omega((n/B)^e + k/B) I/Os for a query. We show that this case is as difficult as the general 4-sided queries, for which we give a static structure with the optimal query cost O((n/B)^e + k/B). * We give a dynamic structure that supports top-open queries in O(log_2B^e (n/B) + k/B^1-e) I/Os, and updates in O(log_2B^e (n/B)) I/Os, for any e satisfying 0 \le e \le 1. This leads to a dynamic structure for 4-sided queries with optimal query cost O((n/B)^e + k/B), and amortized update cost O(log (n/B)). As a contribution of independent interest, we propose an I/O-efficient version of the fundamental structure priority queue with attrition (PQA). Our PQA supports FindMin, DeleteMin, and InsertAndAttrite all in O(1) worst case I/Os, and O(1/B) amortized I/Os per operation. We also add the new CatenateAndAttrite operation that catenates two PQAs in O(1) worst case and O(1/B) amortized I/Os. This operation is a non-trivial extension to the classic PQA of Sundar, even in internal memory.Comment: Appeared at PODS 2013, New York, 19 pages, 10 figures. arXiv admin note: text overlap with arXiv:1208.4511, arXiv:1207.234
    corecore