544 research outputs found

    An efficient approach for processing skyline queries in incomplete multidimensional database

    In recent years, skyline queries have received considerable attention because they provide a flexible query operator that returns the data items (skylines) that are not dominated by any other data item across all dimensions (attributes) of the database. Many variations of skyline techniques have been proposed in the literature. However, most of them determine skylines by assuming that the values of all dimensions are available (complete) for every data item. This assumption does not always hold, particularly for large multidimensional databases, where some values may be missing (not applicable during the computation). In this paper, we propose an efficient approach for processing skyline queries in incomplete databases. The experimental results show that the proposed approach significantly reduces the number of pairwise comparisons and the processing time needed to determine the skylines compared with previous approaches.
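
    To make the dominance relation in this abstract concrete, here is a minimal sketch of a naive skyline over complete data. It assumes smaller values are better in every dimension; the function names and the sample data are illustrative and do not come from the paper.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, distance) pairs; lower is better for both.
hotels = [(50, 8), (60, 3), (80, 2), (70, 5), (90, 9)]
print(skyline(hotels))
```

    The quadratic number of pairwise comparisons in this baseline is the cost that the approaches listed here try to reduce.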

    Incremental Discovery of Prominent Situational Facts

    We study the novel problem of finding new, prominent situational facts, which are emerging statements about objects that stand out within certain contexts. Many such facts are newsworthy---e.g., an athlete's outstanding performance in a game, or a viral video's impressive popularity. Effective and efficient identification of these facts assists journalists in reporting, one of the main goals of computational journalism. Technically, we consider an ever-growing table of objects with dimension and measure attributes. A situational fact is a "contextual" skyline tuple that stands out against historical tuples in a context, specified by a conjunctive constraint involving dimension attributes, when a set of measure attributes are compared. New tuples are constantly added to the table, reflecting events happening in the real world. Our goal is to discover constraint-measure pairs that qualify a new tuple as a contextual skyline tuple, and discover them quickly before the event becomes yesterday's news. A brute-force approach requires exhaustive comparison with every tuple, under every constraint, and in every measure subspace. We design algorithms in response to these challenges using three corresponding ideas---tuple reduction, constraint pruning, and sharing computation across measure subspaces. We also adopt a simple prominence measure to rank the discovered facts when they are numerous. Experiments over two real datasets validate the effectiveness and efficiency of our techniques
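
    The brute-force approach mentioned in this abstract can be sketched as follows. This is only the exhaustive baseline, not the authors' algorithms. It assumes larger measure values are better and that a constraint binds a subset of dimension attributes to the new tuple's values; the attribute names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def dominates(p: Row, q: Row, measures: Sequence[str]) -> bool:
    """p dominates q in the given measure subspace (assumption: larger is better)."""
    return all(p[m] >= q[m] for m in measures) and any(p[m] > q[m] for m in measures)


def situational_facts(
    history: List[Row], new: Row, dim_attrs: Sequence[str], measure_attrs: Sequence[str]
) -> List[Tuple[Dict[str, object], Tuple[str, ...]]]:
    """Return every (constraint, measure subspace) pair under which `new`
    is a contextual skyline tuple, by exhaustive enumeration."""
    facts = []
    for r in range(len(dim_attrs) + 1):
        for bound in combinations(dim_attrs, r):
            # Context: historical tuples that agree with `new` on the bound attributes.
            context = [t for t in history if all(t[a] == new[a] for a in bound)]
            for s in range(1, len(measure_attrs) + 1):
                for measures in combinations(measure_attrs, s):
                    if not any(dominates(t, new, measures) for t in context):
                        facts.append(({a: new[a] for a in bound}, measures))
    return facts


history = [
    {"player": "A", "team": "X", "points": 30, "assists": 5},
    {"player": "B", "team": "Y", "points": 20, "assists": 12},
]
new = {"player": "A", "team": "X", "points": 25, "assists": 10}
facts = situational_facts(history, new, ("player", "team"), ("points", "assists"))
for constraint, measures in facts:
    print(constraint, measures)
```

    The three nested enumerations (tuples, constraints, measure subspaces) correspond to the three sources of cost that the abstract says are targeted by tuple reduction, constraint pruning, and shared computation.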

    Multi-Source Spatial Entity Linkage

    Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities, describes them with different attributes, and sometimes provides contradicting information. Hence, we introduce the spatial entity linkage problem, which identifies the pairs of spatial entities that belong to the same physical spatial entity. Our proposed solution (QuadSky) starts with a time-efficient spatial blocking technique (QuadFlex), compares the spatial entities in the same block pairwise, ranks the pairs using Pareto optimality with the SkyRank algorithm, and finally classifies the pairs with our novel SkyEx-* family of algorithms, which yield 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of 777,452 pairs. Moreover, we provide a theoretical guarantee and formalize the SkyEx-FES algorithm, which explores only 27% of the skylines without any loss in F-measure. Furthermore, our fully unsupervised algorithm SkyEx-D approximates the optimal result with an F-measure loss of just 0.01. Finally, QuadSky provides the best trade-off between precision and recall and the best F-measure compared to the existing baselines and clustering techniques, and approximates the results of supervised learning solutions.

    K-Dominance in Multidimensional Data: Theory and Applications

    We study the problem of k-dominance in a set of d-dimensional vectors, prove bounds on the number of maxima (skyline vectors), under both worst-case and average-case models, perform experimental evaluation using synthetic and real-world data, and explore an application of k-dominant skyline for extracting a small set of top-ranked vectors in high dimensions where the full skylines can be unmanageably large
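
    As a rough illustration of k-dominance, the sketch below uses the common definition in which p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them. Smaller-is-better and the sample vectors are assumptions made for the example, not details from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def k_dominates(p: Vector, q: Vector, k: int) -> bool:
    """p k-dominates q: no worse in at least k dimensions and strictly
    better in at least one dimension (assumption: smaller is better)."""
    no_worse = sum(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Vector], k: int) -> List[Vector]:
    """Vectors that are not k-dominated by any other vector."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


vectors = [(1, 1, 9), (2, 2, 1), (3, 3, 3), (1, 9, 1)]
full = k_dominant_skyline(vectors, k=3)     # k = d gives the ordinary skyline
relaxed = k_dominant_skyline(vectors, k=2)  # smaller k prunes more vectors
print(full)
print(relaxed)
```

    Lowering k can only shrink the result, which is why the k-dominant skyline is attractive in high dimensions. Because k-dominance is not transitive, it can also become cyclic and leave the result empty.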

    A model for processing skyline queries over a database with missing data

    Skyline queries provide a flexible query operator that returns the data items (skylines) that are not dominated by any other data item across all dimensions (attributes) of the database. Most existing skyline techniques determine the skylines by assuming that the values of all dimensions are available (complete) for every data item. However, this assumption does not always hold, particularly for multidimensional databases, where some values may be missing. Incomplete data causes the skyline technique to lose the transitivity property and makes dominance tests fail, because some data items are incomparable to each other. Furthermore, incompleteness negatively affects the process of finding skylines, leading to high overhead due to exhaustive pairwise comparisons between data items. This paper proposes a model for processing skyline queries over incomplete data that aims to avoid the issue of cyclic dominance when deriving skylines. The proposed model consists of four components: Data Clustering Builder, Group Constructor and Local Skylines Identifier, k-dom Skyline Generator, and Incomplete Skylines Identifier. Together, these components optimize the identification of skylines in an incomplete database by reducing the number of pairwise comparisons needed, eliminating dominated data items as early as possible before the skyline technique is applied.
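
    The loss of transitivity and the cyclic dominance described above can be reproduced with a small sketch. It assumes the commonly used rule that two incomplete items are compared only on the dimensions where both have values, with smaller values being better. This is an illustrative assumption and not necessarily the exact definition used in the paper.

```python
from typing import Optional, Sequence

Item = Sequence[Optional[float]]


def dominates_incomplete(p: Item, q: Item) -> bool:
    """Compare p and q only on dimensions where both values are present
    (None marks a missing value; assumption: smaller is better)."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not common:
        return False  # no shared dimension: the items are incomparable
    return all(a <= b for a, b in common) and any(a < b for a, b in common)


a = (1, None, 10)
b = (3, 2, None)
c = (None, 5, 3)

print(dominates_incomplete(a, b))  # compared on dimension 0 only
print(dominates_incomplete(b, c))  # compared on dimension 1 only
print(dominates_incomplete(c, a))  # compared on dimension 2 only
```

    Here a dominates b and b dominates c, yet a does not dominate c; instead c dominates a. Every item is dominated by some other item, so a naive skyline over these three items would be empty.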

    Treillis des concepts skylines : Analyse multidimensionnelle des skylines fondée sur les ensembles en accord

    The skyline concept has been introduced in order to exhibit the best objects according to all combinations of criteria, and it makes it possible to analyse the relationships between skyline objects. Like the data cube, the skycube is so voluminous that reduction approaches are really necessary. In this paper, we define an approach which partially materializes the skycube. The underlying idea is to discard from the representation the skycuboids that can be recomputed most easily. To meet this reduction objective, we characterize a formal framework: the agree concept lattice. From this structure, we derive the skyline concept lattice, which is one of its constrained instances. The strong points of our approach are: (i) it is attribute oriented; (ii) it provides a bound on the number of lattice nodes; (iii) it facilitates navigation within the skycuboids.

    Contributions à l’Optimisation de Requêtes Multidimensionnelles

    Analyzing data consists of choosing a subset of the dimensions that describe it in order to extract useful information. However, the "interesting" dimensions are rarely known a priori. The analysis then becomes an exploratory activity in which each pass translates into a query. It therefore becomes essential to propose query optimization solutions that take a global view of the process, rather than trying to optimize each query independently of the others. We present our contributions within this exploratory approach, focusing on three types of queries: (i) border computation, (ii) so-called OLAP (On Line Analytical Processing) queries over data cubes, and (iii) skyline-type preference queries.

    Computing Subspace Skylines without Dominance Tests Using Set Interaction Approaches

    Nowadays, preference answering plays a major role in many crucial applications. Finding the top-k objects in a set of high-dimensional data according to a monotonic function requires heavy computation. One promising method for computing the preference set is skyline computation, which returns the set of objects that are not dominated by any other object in a multidimensional space. When the data is high-dimensional, different users request skyline sets over different dimensions, which requires subspace skyline computation. For d-dimensional objects, skyline sets must be computed in 2^d different subspaces; this is called skyline cube (skycube) computation and incurs a high computation cost. In this paper, we address the problem of computing subspace skylines with minimum effort by using simple set interaction methods. This decreases the number of subspace skylines that need to be searched to find the full skycube. We develop an algorithm that uses Boolean algebra rules and the skyline lattice to reduce the dominance tests needed to prepare subspace skylines.
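
    To show why skycube computation is expensive, here is a sketch of the brute-force baseline that runs a separate skyline computation in every non-empty subspace. It is not the set-interaction or lattice-based algorithm of the paper. Smaller-is-better, the function names, and the sample objects are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def dominates_in(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """p dominates q when both are projected onto `dims` (smaller is better)."""
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def subspace_skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Naive skyline restricted to the dimensions in `dims`."""
    return [p for p in points if not any(dominates_in(q, p, dims) for q in points)]


def skycube(points: List[Point]) -> Dict[Tuple[int, ...], List[Point]]:
    """Skyline of every non-empty subspace: 2^d - 1 separate computations."""
    d = len(points[0])
    return {
        dims: subspace_skyline(points, dims)
        for r in range(1, d + 1)
        for dims in combinations(range(d), r)
    }


objects = [(1, 4, 3), (2, 2, 2), (3, 1, 5)]
cube = skycube(objects)
for dims, sky in cube.items():
    print(dims, sky)
```

    Every subspace repeats the full set of pairwise dominance tests, so the total work grows exponentially with d. That repeated testing is the cost that sharing results across the skyline lattice aims to avoid.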