236 research outputs found

    Efficient Computation of Group Skyline Queries on MapReduce

    Skyline query is one of the important issues in database research and has been applied in diverse applications including multi-criteria decision support systems and so on. The response of a skyline query eliminates unnecessary tuples and returns only the user-interested result. Traditional skyline query picks out the outstanding tuples, based on one-to-one record comparisons. Some modern applications request, beyond the singular ones, for superior combinations of records. For example, fantasy basketball is composed of 5 players, fantasy baseball of 9 players, and a hackathon of several programmers. Group skyline aims at considering all the groups comprising several records, and finding out the non-dominated ones. Because of the high complexity, few studies have been conducted and none has been presented in either distributed or parallel computing. This paper is the first study that solves the group skyline in the distributed MapReduce framework. We propose the MRGS algorithm to generate all the combinations, compute the winners at each local node, and find out the answer globally. We further propose the MRIGS algorithm to release the bottleneck of MRGS on unbalanced computing load of nodes. Finally, we propose the MRIGS-P algorithm to prune the impossible combinations and produce indexed and balanced MapReduce computation. Extensive experiments with NBA datasets show that MRIGS-P is 6 times faster than the MRGS algorithm.

    Privacy Aware Parallel Computation of Skyline Sets Queries from Distributed Databases

    A skyline query finds objects that are not dominated by another object from a given set of objects. Skyline queries help us to filter unnecessary information efficiently and provide us clues for various decision making tasks. However, we cannot use skyline queries in a privacy-aware environment, since we have to hide individuals' record values even though there is no ID information. Therefore, we considered skyline sets queries. The skyline set query returns skyline sets from all possible sets, each of which is composed of some objects in a database. With the growth of network infrastructure, data are stored in distributed databases. In this paper, we expand the idea to compute skyline sets queries in a parallel fashion from distributed databases without disclosing individual records to others. The proposed method utilizes an agent-based parallel computing framework that can efficiently compute skyline sets queries and can solve the privacy problems of skyline queries in a distributed environment. The computation of skyline sets is performed simultaneously in all databases, which increases parallelism and reduces the computation time.

    RRR: Rank-Regret Representative

    Selecting the best items in a dataset is a common task in data exploration. However, the concept of "best" lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove "dominated" items and create a "representative" subset of the data set, comprising the "best items" in it. A Pareto-optimal representative is guaranteed to contain the best item of each possible ranking, but it can be almost as big as the full data. A smaller representative can be found if we relax the requirement to include the best item for every possible user, and instead just limit the users' "regret". Existing work defines regret as the loss in score by limiting consideration to the representative instead of the full data set, for any chosen ranking function. However, the score is often not a meaningful number and users may not understand its absolute value. Sometimes small ranges in score can include large fractions of the data set. In contrast, users do understand the notion of rank ordering. Therefore, alternatively, we consider the position of the items in the ranked list for defining the regret and propose the rank-regret representative as the minimal subset of the data containing at least one of the top-k of any possible ranking function. This problem is NP-complete. We use the geometric interpretation of items to bound their ranks on ranges of functions and to utilize combinatorial geometry notions for developing effective and efficient approximation algorithms for the problem. Experiments on real datasets demonstrate that we can efficiently find small subsets with small rank-regrets.

    Dynamic Skyline Computation with the Skyline Breaker Algorithm

    Given a sequential data input, we tackle parallel dynamic skyline computation of the read data by means of a spatial tree structure for indexing fine-grained feature vectors. For this purpose, we modified the Skyline Breaker algorithm that solves skyline computation with multiple local split decision trees concurrently. With this approach, we propose an algorithm for dynamic skyline computation that inherits the robustness against the dimension curse and different data distributions.

    Efficient Algorithms for k-Regret Minimizing Sets

    A regret minimizing set Q is a small size representation of a much larger database P so that user queries executed on Q return answers whose scores are not much worse than those on the full dataset. In particular, a k-regret minimizing set has the property that the regret ratio between the score of the top-1 item in Q and the score of the top-k item in P is minimized, where the score of an item is the inner product of the item's attributes with a user's weight (preference) vector. The problem is challenging because we want to find a single representative set Q whose regret ratio is small with respect to all possible user weight vectors. We show that k-regret minimization is NP-Complete for all dimensions d>=3, settling an open problem from Chester et al. [VLDB 2014]. Our main algorithmic contributions are two approximation algorithms, both with provable guarantees, one based on coresets and another based on hitting sets. We perform extensive experimental evaluation of our algorithms, using both real-world and synthetic data, and compare their performance against the solution proposed in [VLDB 14]. The results show that our algorithms are significantly faster and scalable to much larger sets than the greedy algorithm of Chester et al. for comparable quality answers.
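
The regret ratio described above can be sketched for a single user weight vector. The abstract does not spell out the formula, so the one used here is an assumption: the gap between the k-th best score in P and the best score in Q, clipped at zero and divided by the k-th best score in P. The name `k_regret_ratio` and the example data are hypothetical, and the sketch assumes the k-th best score in P is positive.

```python
def k_regret_ratio(P, Q, w, k):
    """Regret ratio of subset Q against database P for one weight vector w.
    Scores are inner products; assumes the k-th best score in P is positive."""
    def score(item):
        return sum(wi * xi for wi, xi in zip(w, item))
    kth_in_P = sorted((score(p) for p in P), reverse=True)[k - 1]
    top_in_Q = max(score(q) for q in Q)
    # No regret when Q's best item scores at least as well as P's k-th item.
    return max(0.0, kth_in_P - top_in_Q) / kth_in_P

# Hypothetical database and a one-item representative.
P = [(10, 2), (8, 4), (4, 8), (2, 10)]
Q = [(8, 4)]
ratio_k1 = k_regret_ratio(P, Q, (1, 0), 1)
ratio_k2 = k_regret_ratio(P, Q, (1, 0), 2)
```

For the weight vector `(1, 0)`, Q misses the top score 10 by 2, so the 1-regret ratio is 0.2, while the 2-regret ratio is 0 because Q contains the second-best item. The hard part the abstract addresses is keeping this ratio small for all weight vectors at once, which this per-vector sketch does not attempt.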

    Efficient All Top-k Computation - A Unified Solution for All Top-k, Reverse Top-k and Top-m Influential Queries


    ParetoPrep: Fast computation of Path Skylines Queries

    Computing cost optimal paths in network data is a very important task in many application areas like transportation networks, computer networks or social graphs. In many cases, the cost of an edge can be described by various cost criteria. For example, in a road network possible cost criteria are distance, time, ascent, energy consumption or toll fees. In such a multicriteria network, a route or path skyline query computes the set of all paths having pareto optimal costs, i.e. each result path is optimal for different user preferences. In this paper, we propose a new method for computing route skylines which significantly decreases processing time and memory consumption. Furthermore, our method does not rely on any precomputation or indexing method and thus, it is suitable for dynamically changing edge costs. Our experiments demonstrate that our method outperforms state of the art approaches and allows highly efficient path skyline computation without any preprocessing. Comment: 12 pages, 9 figures, technical report

    Spatial skyline query problem in Euclidean and road-network spaces

    With the growth of data-intensive applications, along with the increase of both size and dimensionality of data, queries with advanced semantics have recently drawn researchers’ attention. Skyline query problem is one of them, which produces optimal results based on user preferences. In this thesis, we study the problem of spatial skyline query in the Euclidean and road network spaces. For a given data set P, we are required to compute the spatial skyline points of P with respect to an arbitrary query set Q. A point p ∈ P is a spatial skyline point if and only if, for any other data point r ∈ P, p is closer to at least one query point q ∈ Q as compared to r and has in the best case the same distance as r to the rest of the query points. We propose several efficient algorithms that outperform the existing algorithms.
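
A naive quadratic sketch of a spatial skyline in Euclidean space follows. It uses the common dominance reading, stated here as an assumption: r spatially dominates p when r is no farther than p from every query point and strictly closer to at least one, and the spatial skyline is the set of points that no other point dominates. The function names are hypothetical, and this is not one of the thesis's algorithms, which also cover road-network distances.

```python
from math import dist

def spatially_dominates(p, r, Q):
    """True if p is at least as close as r to every query point in Q
    and strictly closer to at least one (Euclidean distance)."""
    dp = [dist(p, q) for q in Q]
    dr = [dist(r, q) for q in Q]
    return all(a <= b for a, b in zip(dp, dr)) and any(a < b for a, b in zip(dp, dr))

def spatial_skyline(P, Q):
    """Points of P that no other point of P spatially dominates."""
    return [p for p in P
            if not any(spatially_dominates(r, p, Q) for r in P if r != p)]

# Hypothetical data points and query points.
P = [(1, 1), (4, 1), (2, 5), (9, 9)]
Q = [(0, 0), (5, 0)]
sky = spatial_skyline(P, Q)
```

In this example `(1, 1)` is nearest to the first query point and `(4, 1)` to the second, so neither dominates the other, while `(2, 5)` and `(9, 9)` are farther from both query points than `(1, 1)` and drop out.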