An efficient approach for processing skyline queries in incomplete multidimensional database
In recent years, great attention has been given to skyline queries, which provide a flexible query operator that returns data items (skylines) that are not dominated by other data items in all dimensions (attributes) of the database. Many variations of skyline techniques have been proposed in the literature. However, most of these techniques determine skylines by assuming that the values of all dimensions for every data item are available (complete). This assumption does not always hold, particularly for large multidimensional databases, as some values may be missing (not applicable during the computation). In this paper, we propose an efficient approach for processing skyline queries in an incomplete database. The experimental results show that our proposed approach significantly reduces the number of pairwise comparisons and the processing time needed to determine the skylines compared to previous approaches.
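The dominance relation that this abstract relies on can be illustrated with a minimal sketch for complete data. It is not the approach proposed in the paper. The function names, the hotel data, and the "smaller is better" convention are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates.
    This performs O(n^2) pairwise comparisons."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical hotels described by (price, distance to the beach).
hotels = [(50, 8), (60, 5), (70, 9), (40, 10), (65, 5)]
result = skyline(hotels)
print(result)  # [(50, 8), (60, 5), (40, 10)]
```

The quadratic number of pairwise comparisons in this naive version is the cost that skyline algorithms typically try to reduce.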
Incremental Discovery of Prominent Situational Facts
We study the novel problem of finding new, prominent situational facts, which
are emerging statements about objects that stand out within certain contexts.
Many such facts are newsworthy---e.g., an athlete's outstanding performance in
a game, or a viral video's impressive popularity. Effective and efficient
identification of these facts assists journalists in reporting, one of the main
goals of computational journalism. Technically, we consider an ever-growing
table of objects with dimension and measure attributes. A situational fact is a
"contextual" skyline tuple that stands out against historical tuples in a
context, specified by a conjunctive constraint involving dimension attributes,
when a set of measure attributes are compared. New tuples are constantly added
to the table, reflecting events happening in the real world. Our goal is to
discover constraint-measure pairs that qualify a new tuple as a contextual
skyline tuple, and discover them quickly before the event becomes yesterday's
news. A brute-force approach requires exhaustive comparison with every tuple,
under every constraint, and in every measure subspace. We design algorithms in
response to these challenges using three corresponding ideas---tuple reduction,
constraint pruning, and sharing computation across measure subspaces. We also
adopt a simple prominence measure to rank the discovered facts when they are
numerous. Experiments over two real datasets validate the effectiveness and
efficiency of our techniques.
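The brute-force baseline described in this abstract can be sketched as follows. This is only an illustration of the exhaustive search, not the authors' algorithms (tuple reduction, constraint pruning, and shared computation are omitted). The attribute names, the sample data, and the "larger is better" convention for measures are assumptions.

```python
from itertools import combinations

def dominates(p, q, measures):
    """p dominates q on the given measures (assumption: larger is better)."""
    return (all(p[m] >= q[m] for m in measures)
            and any(p[m] > q[m] for m in measures))

def subsets(items, min_size):
    """All subsets of items with at least min_size elements, as tuples."""
    for r in range(min_size, len(items) + 1):
        yield from combinations(items, r)

def situational_facts(history, new, dims, measures):
    """Brute force: test the new tuple under every conjunctive constraint
    built from its own dimension values and in every measure subspace."""
    facts = []
    for constraint in subsets(dims, 0):
        # Historical tuples that share the new tuple's values on the constraint.
        context = [t for t in history if all(t[d] == new[d] for d in constraint)]
        for subspace in subsets(measures, 1):
            if not any(dominates(t, new, subspace) for t in context):
                facts.append(({d: new[d] for d in constraint}, subspace))
    return facts

# Hypothetical game statistics.
history = [
    {"team": "X", "points": 30, "assists": 5},
    {"team": "Y", "points": 25, "assists": 12},
]
new = {"team": "X", "points": 28, "assists": 9}
facts = situational_facts(history, new, ["team"], ["points", "assists"])
for constraint, subspace in facts:
    print(constraint, subspace)
```

With these sample values, the new tuple stands out on assists only within the context team = X, which is the kind of constraint-measure pair the abstract refers to.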
Multi-Source Spatial Entity Linkage
Besides the traditional cartographic data sources, spatial information can
also be derived from location-based sources. However, even though different
location-based sources refer to the same physical world, each one has only
partial coverage of the spatial entities, describes them with different
attributes, and sometimes provides contradicting information. Hence, we
introduce the spatial entity linkage problem, which finds which pairs of
spatial entities belong to the same physical spatial entity. Our proposed
solution (QuadSky) starts with a time-efficient spatial blocking technique
(QuadFlex), compares pairwise the spatial entities in the same block, ranks the
pairs using Pareto optimality with the SkyRank algorithm, and finally,
classifies the pairs with our novel SkyEx-* family of algorithms that yield
0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs
and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of
777,452 pairs. Moreover, we provide a theoretical guarantee and formalize the
SkyEx-FES algorithm that explores only 27% of the skylines without any loss in
F-measure. Furthermore, our fully unsupervised algorithm SkyEx-D approximates
the optimal result with an F-measure loss of just 0.01. Finally, QuadSky
provides the best trade-off between precision and recall, and the best
F-measure compared to the existing baselines and clustering techniques, and
approximates the results of supervised learning solutions.
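Ranking candidate pairs by Pareto optimality can be sketched by peeling off successive skyline layers. This is a generic illustration and should not be read as the SkyRank or SkyEx-* algorithms themselves. The pair identifiers, the two similarity scores, and the "larger is better" convention are assumptions.

```python
def dominates(p, q):
    """p dominates q (assumption: larger similarity scores are better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def pareto_layers(scores):
    """Repeatedly extract the non-dominated items (one skyline layer),
    remove them, and continue until nothing is left."""
    remaining = dict(scores)
    layers = []
    while remaining:
        front = [key for key, vec in remaining.items()
                 if not any(dominates(other, vec) for other in remaining.values())]
        layers.append(front)
        for key in front:
            del remaining[key]
    return layers

# Hypothetical candidate pairs with (name similarity, address similarity).
pairs = {
    "p1": (0.9, 0.8),
    "p2": (0.7, 0.9),
    "p3": (0.6, 0.6),
    "p4": (0.2, 0.1),
}
layers = pareto_layers(pairs)
print(layers)  # [['p1', 'p2'], ['p3'], ['p4']]
```

A classifier in this style could then label the pairs in the first few layers as matches; where to cut is the decision the abstract attributes to the SkyEx-* family.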
K-Dominance in Multidimensional Data: Theory and Applications
We study the problem of k-dominance in a set of d-dimensional vectors, prove bounds on the number of maxima (skyline vectors), under both worst-case and average-case models, perform experimental evaluation using synthetic and real-world data, and explore an application of k-dominant skyline for extracting a small set of top-ranked vectors in high dimensions where the full skylines can be unmanageably large.
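A minimal sketch of k-dominance, assuming the commonly used definition: a vector p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them. The sample vectors and the "smaller is better" convention are assumptions; the abstract itself speaks of maxima.

```python
def k_dominates(p, q, k):
    """p k-dominates q: p is no worse in at least k dimensions and
    strictly better in at least one (assumption: smaller is better)."""
    no_worse = sum(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better

def k_dominant_skyline(points, k):
    """Points that are not k-dominated by any other point."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]

# Hypothetical 3-dimensional vectors.
points = [(1, 5, 9), (2, 2, 2), (9, 1, 5), (5, 9, 1)]

full = k_dominant_skyline(points, 3)     # k = d gives the ordinary skyline
reduced = k_dominant_skyline(points, 2)  # relaxed dominance prunes more
print(full)     # all four vectors
print(reduced)  # [(2, 2, 2)]
```

With k = d the relation reduces to ordinary dominance and all four vectors survive, whereas k = 2 leaves a single vector, which illustrates why the k-dominant skyline is useful when the full skyline is too large.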
A model for processing skyline queries over a database with missing data
Skyline queries provide a flexible query operator that returns data items (skylines) that are not dominated by other data items in all dimensions (attributes) of the database. Most existing skyline techniques determine the skylines by assuming that the values of all dimensions for every data item are available (complete). However, this assumption does not always hold, particularly for multidimensional databases, as some values may be missing. Incomplete data causes the dominance relation to lose its transitivity property and makes dominance tests fail, because some data items are incomparable to each other. Furthermore, incompleteness negatively affects the process of finding skylines, leading to high overhead due to exhaustive pairwise comparisons between the data items. This paper proposes a model for processing skyline queries over incomplete data with the aim of avoiding cyclic dominance when deriving skylines. The proposed model consists of four components: Data Clustering Builder, Group Constructor and Local Skylines Identifier, k-dom Skyline Generator, and Incomplete Skylines Identifier. Including these components in the model optimizes the identification of skylines in an incomplete database by reducing the number of pairwise comparisons required, eliminating dominated data items as early as possible before applying the skyline technique.
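The loss of transitivity and the resulting cyclic dominance can be reproduced with a small sketch. It assumes a common convention for incomplete data, namely that two items are compared only on the dimensions where both have a value. Missing values are represented by None, smaller values are taken to be better, and the three items are made up for illustration.

```python
def dominates(p, q):
    """Dominance restricted to the dimensions known in both items
    (assumption: None marks a missing value, smaller is better)."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    return all(a <= b for a, b in common) and any(a < b for a, b in common)

def naive_skyline(items):
    """Keep items that no other item dominates."""
    return [p for p in items if not any(dominates(q, p) for q in items)]

a = (1, None, 2)
b = (2, 1, None)
c = (None, 2, 1)

cycle = (dominates(a, b), dominates(b, c), dominates(c, a))
print(cycle)                     # (True, True, True)
print(naive_skyline([a, b, c]))  # []
```

Each item dominates the next on the single dimension they share, so the relation forms a cycle and the naive filter returns an empty skyline. This is the situation the proposed model is designed to avoid.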
Treillis des concepts skylines : Analyse multidimensionnelle des skylines fondée sur les ensembles en accord
The skyline concept has been introduced in order to exhibit the best objects
according to all combinations of criteria and makes it possible to analyse
the relationships between skyline objects. Like the data cube, the skycube is
so voluminous that reduction approaches are really necessary. In this paper, we
define an approach which partially materializes the skycube. The underlying
idea is to discard from the representation the skycuboids that can be
recomputed most easily. To meet this reduction objective, we characterize a
formal framework: the agree concept lattice. From this structure, we derive the
skyline concept lattice which is one of its constrained instances. The strong
points of our approach are: (i) it is attribute oriented; (ii) it provides a
bound on the number of lattice nodes; (iii) it facilitates navigation
within the skycuboids.
Contributions à l’Optimisation de Requêtes Multidimensionnelles
Analyzing data consists of choosing a subset of the dimensions that describe them in order to extract useful information. However, the "interesting" dimensions are rarely known a priori. The analysis then becomes an exploratory activity in which each pass is expressed as a query. It therefore becomes essential to propose query optimization solutions that take a global view of the process rather than seeking to optimize each query independently of the others. We present our contributions within this exploratory approach, focusing on three types of queries: (i) the computation of borders, (ii) so-called OLAP (On Line Analytical Processing) queries in data cubes, and (iii) skyline-type preference queries.
Computing Subspace Skylines without Dominance Tests Using Set Interaction Approaches
Nowadays, preference answering plays a major role in many crucial applications. Finding the top-k objects in a set of high-dimensional data based on a monotonic function requires heavy computation. One promising method for computing the preference set is the skyline technique. Skyline computation returns the set of objects that are not dominated by any other object in a multidimensional space. When data is high-dimensional, different users request skyline sets over different dimensions, which requires subspace skyline computation. If objects are d-dimensional, skyline sets must be computed in 2^d different subspaces, a task called skyline cube (skycube) computation, which incurs a high computation cost. In this paper, we address the problem of computing subspace skylines with minimum effort by using simple set interaction methods, which decreases the number of subspace skylines that need to be searched to find the full skycube. We developed an algorithm that uses Boolean algebra rules and the skyline lattice to reduce the dominance tests needed to prepare subspace skylines.
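To make the cost of skycube computation concrete, here is a naive baseline that computes a skyline in every non-empty subspace using explicit dominance tests. It is the kind of exhaustive computation the paper aims to avoid, not the set-based algorithm it proposes. The sample points and the "smaller is better" convention are assumptions.

```python
from itertools import combinations

def dominates(p, q, dims):
    """p dominates q on the chosen dimensions (assumption: smaller is better)."""
    return (all(p[i] <= q[i] for i in dims)
            and any(p[i] < q[i] for i in dims))

def skycube(points, d):
    """Naive skycube: one skyline per non-empty subset of the d dimensions,
    that is, 2^d - 1 subspace skylines in total."""
    cube = {}
    for size in range(1, d + 1):
        for dims in combinations(range(d), size):
            cube[dims] = [p for p in points
                          if not any(dominates(q, p, dims) for q in points)]
    return cube

# Hypothetical 3-dimensional objects.
points = [(1, 4, 3), (2, 2, 2), (3, 1, 4)]
cube = skycube(points, 3)
for dims, sky in cube.items():
    print(dims, sky)
```

Even for d = 3 this already evaluates 7 subspaces, each with a quadratic number of dominance tests, so the work grows exponentially with d.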