200 research outputs found

    Distributed Indexing Schemes for k-Dominant Skyline Analytics on Uncertain Edge-IoT Data

    Full text link
    Skyline queries typically search a Pareto-optimal set from a given data set to solve the corresponding multiobjective optimization problem. As the number of criteria increases, the skyline presumes excessive data items, which yield a meaningless result. To address this curse of dimensionality, we proposed a k-dominant skyline in which the number of skyline members was reduced by relaxing the restriction on the number of dimensions, considering the uncertainty of data. Specifically, each data item was associated with a probability of appearance, which represented the probability of becoming a member of the k-dominant skyline. As data items appear continuously in data streams, the corresponding k-dominant skyline may vary with time. Therefore, an effective and rapid mechanism of updating the k-dominant skyline becomes crucial. Herein, we proposed two time-efficient schemes, Middle Indexing (MI) and All Indexing (AI), for k-dominant skyline in distributed edge-computing environments, where irrelevant data items can be effectively excluded from the compute to reduce the processing duration. Furthermore, the proposed schemes were validated with extensive experimental simulations. The experimental results demonstrated that the proposed MI and AI schemes reduced the computation time by approximately 13% and 56%, respectively, compared with the existing method.Comment: 13 pages, 8 figures, 12 tables, to appear in IEEE Transactions on Emerging Topics in Computin

    Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

    Full text link
    Incomplete data is one major kind of multi-dimensional dataset that has random-distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes huge. Finding top-k dominant values in this type of dataset is a challenging procedure. Some algorithms are present to enhance this process but are mostly efficient only when dealing with a small-size incomplete data. One of the algorithms that make the application of TKD query possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves the performance for incomplete data, but it is not originally capable of finding top-k dominant values in incomplete big data, nor is it designed to do so. Several other algorithms have been proposed to find the TKD query, such as Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Algorithms developed previously were among the first attempts to apply TKD query on incomplete data; however, all these had weak performances or were not compatible with the incomplete data. This thesis proposes MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance queries on huge incomplete datasets. The proposed approach uses the MapReduce parallel computing approach using multiple computing nodes. The framework separates the tasks between several computing nodes that independently and simultaneously work to find the result. This method has achieved up to two times faster processing time in finding the TKD query result in comparison to previously presented algorithms

    Integrating OLAP and Ranking: The Ranking-Cube Methodology

    Get PDF
    Recent years have witnessed an enormous growth of data in business, industry, and Web applications. Database search often returns a large collection of results, which poses challenges to both efficient query processing and effective digest of the query results. To address this problem, ranked search has been introduced to database systems. We study the problem of On-Line Analytical Processing (OLAP) of ranked queries, where ranked queries are conducted in the arbitrary subset of data defined by multi-dimensional selections. While pre-computation and multi-dimensional aggregation is the standard solution for OLAP, materializing dynamic ranking results is unrealistic because the ranking criteria are not known until the query time. To overcome such difficulty, we develop a new ranking cube method that performs semi on-line materialization and semi online computation in this thesis. Its complete life cycle, including cube construction, incremental maintenance, and query processing, is also discussed. We further extend the ranking cube in three dimensions. First, how to answer queries in high-dimensional data. Second, how to answer queries which involves joins over multiple relations. Third, how to answer general preference queries (besides ranked queries, such as skyline queries). Our performance studies show that ranking-cube is orders of magnitude faster than previous approaches

    SkyLens: Visual analysis of skyline on multi-dimensional data

    Get PDF
    Skyline queries have wide-ranging applications in fields that involve multi-criteria decision making, including tourism, retail industry, and human resources. By automatically removing incompetent candidates, skyline queries allow users to focus on a subset of superior data items (i.e., the skyline), thus reducing the decision-making overhead. However, users are still required to interpret and compare these superior items manually before making a successful choice. This task is challenging because of two issues. First, people usually have fuzzy, unstable, and inconsistent preferences when presented with multiple candidates. Second, skyline queries do not reveal the reasons for the superiority of certain skyline points in a multi-dimensional space. To address these issues, we propose SkyLens, a visual analytic system aiming at revealing the superiority of skyline points from different perspectives and at different scales to aid users in their decision making. Two scenarios demonstrate the usefulness of SkyLens on two datasets with a dozen of attributes. A qualitative study is also conducted to show that users can efficiently accomplish skyline understanding and comparison tasks with SkyLens.Comment: 10 pages. Accepted for publication at IEEE VIS 2017 (in proceedings of VAST

    Probabilistic Skyline Queries over Uncertain Moving Objects

    Get PDF
    Data uncertainty inherently exists in a large number of applications due to factors such as limitations of measuring equipments, update delay, and network bandwidth. Recently, modeling and querying uncertain data have attracted considerable attention from the database community. However, how to perform advanced analysis on uncertain data remains an interesting question. In this paper, we focus on the execution of skyline computation over uncertain moving objects. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline at a certain time point, therefore a p-t-skyline contains those moving objects whose skyline probabilities are at least p at time point t. Computing probabilistic skyline over a large number of uncertain moving objects is a daunting task in practice. In order to efficiently compute the probabilistic skyline query, we propose a discrete-and-conquer strategy, which follows the sampling-bounding-pruning-refining procedure. To further reduce the skyline computation cost, we propose an enhanced framework that is based on a multi-dimensional indexing structure combined with the discrete-and-conquer strategy. Through extensive experiments with synthetic datasets, we show that the framework can efficiently support skyline queries over uncertain moving object and is scalable on large data sets

    スカイライン問合わせを利用した大規模データベースの情報選別

    Get PDF
    Conventional SQL queries take exact input and produce complete result set. However, with massive increase in data volume in different applications, the large result sets returned by traditional SQL queries are not well suited for the users to take effective decisions. Therefore, there is an increasing interest in queries like top-k queries and skyline queries those produce a more concise result set. Top-k queries rely on the scores of the objects to evaluate the usefulness of the objects. In this type of queries, users require to define their own scoring function by combining their interests. Based on the user defined scoring function, the system sorts the objects by their scores and outputs the top-k objects in the ranking list as the result. However, defining a scoring function by the users is a major draw of the top-k queries as in the large data sets where there are many conflicting criteria exist, it is very difficult for the users to define the scoring functions by themselves.……広島大学(Hiroshima University)博士(工学)Engineeringdoctora

    Continuous Top-k Dominating Queries in Subspaces

    Full text link
    corecore