382 research outputs found

    Multi-Source Spatial Entity Linkage

    Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities, describes them with different attributes, and sometimes provides contradicting information. Hence, we introduce the spatial entity linkage problem, which determines which pairs of spatial entities refer to the same physical spatial entity. Our proposed solution (QuadSky) starts with a time-efficient spatial blocking technique (QuadFlex), compares pairwise the spatial entities in the same block, ranks the pairs using Pareto optimality with the SkyRank algorithm, and finally classifies the pairs with our novel SkyEx-* family of algorithms, which yield 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of 777,452 pairs. Moreover, we provide a theoretical guarantee and formalize the SkyEx-FES algorithm, which explores only 27% of the skylines without any loss in F-measure. Furthermore, our fully unsupervised algorithm SkyEx-D approximates the optimal result with an F-measure loss of just 0.01. Finally, QuadSky provides the best trade-off between precision and recall and the best F-measure compared to the existing baselines and clustering techniques, and approximates the results of supervised learning solutions.
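
The ranking step described above (ordering candidate pairs by Pareto optimality) can be illustrated with a minimal sketch. This is not the authors' SkyRank implementation; it assumes each candidate pair is summarized by a tuple of similarity scores where higher is better, and the names `dominates` and `skyline_layers` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every score and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points: List[Sequence[float]]) -> List[List[int]]:
    """Peel successive skylines (Pareto fronts) and return them as lists of indices.

    Layer 0 holds the non-dominated points, layer 1 the points that become
    non-dominated once layer 0 is removed, and so on.
    """
    remaining = list(range(len(points)))
    layers: List[List[int]] = []
    while remaining:
        layer = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        layers.append(layer)
        in_layer = set(layer)
        remaining = [i for i in remaining if i not in in_layer]
    return layers


# Hypothetical similarity scores (e.g. name similarity, address similarity) per pair.
pairs = [(0.9, 0.8), (0.8, 0.9), (0.5, 0.5), (0.1, 0.2)]
layers = skyline_layers(pairs)
```

A classifier in this spirit would then choose a cut-off layer: pairs in earlier layers are treated as matches and the rest as non-matches. How the cut-off is chosen is exactly what the SkyEx-* variants differ on, and that part is not sketched here.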

    Selection Lemmas for various geometric objects

    Selection lemmas are classical results in discrete geometry that have been well studied and have applications in many geometric problems like weak epsilon nets and slimming Delaunay triangulations. Selection lemma type results typically show that there exists a point that is contained in many objects that are induced (spanned) by an underlying point set. In the first selection lemma, we consider the set of all the objects induced (spanned) by a point set $P$. This question has been widely explored for simplices in $\mathbb{R}^d$, with tight bounds in $\mathbb{R}^2$. In our paper, we prove the first selection lemma for other classes of geometric objects. We also consider the strong variant of this problem, where we add the constraint that the piercing point comes from $P$. We prove an exact result on the strong and the weak variant of the first selection lemma for axis-parallel rectangles, special subclasses of axis-parallel rectangles like quadrants and slabs, and disks (for centrally symmetric point sets). We also show non-trivial bounds on the first selection lemma for axis-parallel boxes and hyperspheres in $\mathbb{R}^d$. In the second selection lemma, we consider an arbitrary $m$-sized subset of the set of all objects induced by $P$. We study this problem for axis-parallel rectangles and show that there exists a point in the plane that is contained in $\frac{m^3}{24n^4}$ rectangles. This is an improvement over the previous bound by Smorodinsky and Sharir when $m$ is almost quadratic.
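
To get a feel for the second selection lemma bound, the following worked calculation plugs in the largest possible $m$. It assumes, which the abstract does not state explicitly, that $n = |P|$ and that each pair of points induces one axis-parallel rectangle, so that at most $\binom{n}{2}$ rectangles exist.

```latex
m = \binom{n}{2} \approx \frac{n^2}{2}
\quad\Longrightarrow\quad
\frac{m^3}{24\,n^4} \approx \frac{(n^2/2)^3}{24\,n^4}
= \frac{n^6}{192\,n^4}
= \frac{n^2}{192}.
```

Under these assumptions, a fully quadratic family of rectangles has a point contained in a constant fraction (on the order of $n^2$) of them.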

    Skyline queries computation on crowdsourced-enabled incomplete database

    Data incompleteness has become a frequent phenomenon in a large number of contemporary database applications such as web autonomous databases, big data, and crowd-sourced databases. Processing skyline queries over incomplete databases imposes a number of challenges. Most importantly, the skylines derived from incomplete databases are themselves incomplete, with some values missing. Retrieving skylines with missing values is undesirable, particularly for recommendation and decision-making systems. Furthermore, running skyline queries on a database with incomplete data raises a number of issues that affect skyline processing, such as the loss of the transitivity property of the skyline technique and cyclic dominance between the tuples. The issue of estimating the missing values of skylines has been discussed and examined in the database literature. Most recently, several studies have suggested exploiting crowd-sourced databases to estimate the missing values by generating plausible values using the crowd. Crowd-sourced databases have proved to be a powerful solution for performing user-given tasks by integrating human intelligence and experience to process the tasks. However, task processing using crowd-sourcing incurs additional monetary cost and increases latency. Also, it is not always possible to produce a satisfactory result that meets the user's preferences. This paper proposes an approach for estimating the missing values of the skylines by first exploiting the available data and utilizing the implicit relationships between the attributes in order to impute the missing values of the skylines. This process aims at reducing the number of values to be estimated using the crowd when local estimation is inappropriate. Intensive experiments on both synthetic and real datasets have been conducted. The experimental results show that the proposed approach for estimating the missing values of the skylines over crowd-sourced enabled incomplete databases is scalable and outperforms the other existing approaches.
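
The loss of transitivity and the cyclic dominance mentioned above can be made concrete with a small sketch. It assumes one common convention for incomplete data, namely that two tuples are compared only on the dimensions where both have a value, with lower values preferred; the function name and the three tuples are hypothetical and not taken from the paper.

```python
from typing import Optional, Sequence

Value = Optional[float]  # None marks a missing value


def dominates_incomplete(a: Sequence[Value], b: Sequence[Value]) -> bool:
    """Dominance restricted to dimensions observed in both tuples (lower is better)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return False  # no shared dimension, so the tuples are incomparable
    return all(x <= y for x, y in common) and any(x < y for x, y in common)


# Three tuples, each missing a different attribute.
a = (1, None, 10)
b = (3, 2, None)
c = (None, 5, 3)

tuples = [a, b, c]
# A tuple is in the skyline if no other tuple dominates it.
skyline = [
    t for t in tuples
    if not any(dominates_incomplete(o, t) for o in tuples if o is not t)
]
```

Here `a` dominates `b` (on the first attribute), `b` dominates `c` (on the second), and `c` dominates `a` (on the third). Dominance is therefore cyclic rather than transitive, and the naive skyline comes out empty, which is one reason imputing the missing values first is attractive.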

    Probabilistic Skyline Queries over Uncertain Moving Objects

    Data uncertainty inherently exists in a large number of applications due to factors such as limitations of measuring equipment, update delay, and network bandwidth. Recently, modeling and querying uncertain data have attracted considerable attention from the database community. However, how to perform advanced analysis on uncertain data remains an interesting question. In this paper, we focus on skyline computation over uncertain moving objects. We propose a novel probabilistic skyline model in which an uncertain object has some probability of being in the skyline at a given time point; a p-t-skyline therefore contains those moving objects whose skyline probabilities are at least p at time point t. Computing the probabilistic skyline over a large number of uncertain moving objects is a daunting task in practice. In order to efficiently compute the probabilistic skyline query, we propose a discrete-and-conquer strategy, which follows the sampling-bounding-pruning-refining procedure. To further reduce the skyline computation cost, we propose an enhanced framework that is based on a multi-dimensional indexing structure combined with the discrete-and-conquer strategy. Through extensive experiments with synthetic datasets, we show that the framework can efficiently support skyline queries over uncertain moving objects and is scalable on large data sets.
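
The p-t-skyline definition above can be sketched for a single fixed time point t. The sketch is a brute-force illustration, not the paper's sampling-bounding-pruning-refining procedure. It assumes that each uncertain object is represented at time t by a set of equally likely sample positions, that objects are independent, and that lower values are preferred; all names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(obj: List[Sample], others: List[List[Sample]]) -> float:
    """Probability that obj is not dominated by any other object at this time point."""
    total = 0.0
    for s in obj:
        p_not_dominated = 1.0
        for other in others:
            # Fraction of the other object's samples that dominate this sample.
            frac = sum(dominates(o, s) for o in other) / len(other)
            p_not_dominated *= 1.0 - frac
        total += p_not_dominated
    return total / len(obj)


def p_skyline(objects: Dict[str, List[Sample]], p: float) -> List[str]:
    """Names of objects whose skyline probability is at least p."""
    result = []
    for name, samples in objects.items():
        others = [s for n, s in objects.items() if n != name]
        if skyline_probability(samples, others) >= p:
            result.append(name)
    return result


# Hypothetical sampled positions of three moving objects at one time point t.
objects_at_t = {
    "A": [(1.0, 1.0), (2.0, 2.0)],
    "B": [(3.0, 3.0), (0.0, 5.0)],
    "C": [(4.0, 4.0)],
}
```

With this data, `p_skyline(objects_at_t, 0.5)` keeps A and B, while raising the threshold to 0.6 keeps only A. The quadratic comparison over all samples is the cost that bounding, pruning, and indexing techniques are meant to avoid.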

    Reporting Skyline on Uncertain Dimension with Query Interval

    Naturally, users sometimes specify their preferences imprecisely (i.e., as a query with an interval or range). Reporting results that both satisfy the imprecise query and are interesting would be easy on a dataset with atomic values. The challenge arises when the dataset being queried consists of both atomic values and continuous ranges of values. For a set of objects with uncertain dimension and given a query interval