Multi-Source Spatial Entity Linkage
Besides the traditional cartographic data sources, spatial information can
also be derived from location-based sources. However, even though different
location-based sources refer to the same physical world, each one has only
partial coverage of the spatial entities, describes them with different
attributes, and sometimes provides contradicting information. Hence, we
introduce the spatial entity linkage problem: the task of finding which pairs of
spatial entities belong to the same physical spatial entity. Our proposed
solution (QuadSky) starts with a time-efficient spatial blocking technique
(QuadFlex), compares pairwise the spatial entities in the same block, ranks the
pairs using Pareto optimality with the SkyRank algorithm, and finally,
classifies the pairs with our novel SkyEx-* family of algorithms that yield
0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs
and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of
777,452 pairs. Moreover, we provide a theoretical guarantee and formalize the
SkyEx-FES algorithm that explores only 27% of the skylines without any loss in
F-measure. Furthermore, our fully unsupervised algorithm SkyEx-D approximates
the optimal result with an F-measure loss of just 0.01. Finally, QuadSky
provides the best trade-off between precision and recall, and the best
F-measure compared to the existing baselines and clustering techniques, and
approximates the results of supervised learning solutions.
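
The Pareto-optimal ranking step mentioned in the abstract can be illustrated with a generic skyline-layering sketch. This is not the authors' SkyRank implementation; it only assumes that each candidate pair carries a tuple of similarity scores (higher is better), and the names `dominates`, `skyline_layers`, and the sample pairs are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]  # assumed: one similarity score per attribute, higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(pairs: Dict[str, Scores]) -> List[List[str]]:
    """Peel off successive skylines: layer 0 is the Pareto front, layer 1 the next, and so on."""
    remaining = dict(pairs)
    layers: List[List[str]] = []
    while remaining:
        # A pair belongs to the current skyline if no remaining pair dominates it.
        layer = sorted(
            key
            for key, scores in remaining.items()
            if not any(dominates(other, scores) for other in remaining.values())
        )
        layers.append(layer)
        for key in layer:
            del remaining[key]
    return layers


# Hypothetical candidate pairs with (name similarity, address similarity).
example = {
    "p1": (0.9, 0.8),
    "p2": (0.7, 0.95),
    "p3": (0.6, 0.6),
    "p4": (0.2, 0.1),
}
layers = skyline_layers(example)
print(layers)
```

In this sketch, pairs in earlier layers are stronger match candidates. A classifier in the spirit of the SkyEx-* family would then choose a cut-off layer, which is not modeled here.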
Ranking spatial data by quality preferences
A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example, using a real estate agency database of flats for lease, a customer may want to rank the flats with respect to the appropriateness of their location, defined after aggregating the qualities of other features (e.g., restaurants, cafes, hospitals, markets, etc.) within their spatial neighborhood. Such a neighborhood concept can be specified by the user via different functions. It can be an explicit circular region within a given distance from the flat. Another intuitive definition is to assign higher weights to the features based on their proximity to the flat. In this paper, we formally define spatial preference queries and propose appropriate indexing techniques and search algorithms for them. Extensive evaluation of our methods on both real and synthetic data reveals that an optimized branch-and-bound solution is efficient and robust with respect to different parameters. © 2006 IEEE.
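
The circular-neighborhood variant described in the abstract can be sketched with a brute-force scan. This is not the paper's indexed branch-and-bound method. The aggregation used here (sum, over feature categories, of the best quality found within the radius) is an assumption made for illustration, and the names `range_score`, `rank_flats`, and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Feature = Tuple[Point, float]  # (location, quality); quality scale is assumed to be [0, 1]


def range_score(flat: Point, feature_sets: Dict[str, List[Feature]], radius: float) -> float:
    """Sum, over feature categories, the best quality found within `radius` of the flat."""
    total = 0.0
    for features in feature_sets.values():
        in_range = [q for loc, q in features if math.dist(flat, loc) <= radius]
        total += max(in_range, default=0.0)  # a category with no nearby feature contributes 0
    return total


def rank_flats(
    flats: Dict[str, Point], feature_sets: Dict[str, List[Feature]], radius: float
) -> List[Tuple[str, float]]:
    """Return (flat, score) tuples sorted from best to worst score."""
    scored = [(name, range_score(loc, feature_sets, radius)) for name, loc in flats.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: two flats and two feature categories.
flats = {"A": (0.0, 0.0), "B": (10.0, 10.0)}
feature_sets = {
    "restaurants": [((1.0, 0.0), 0.75), ((9.0, 10.0), 0.5)],
    "cafes": [((0.0, 1.0), 0.5), ((10.0, 12.0), 1.0)],
}
ranking = rank_flats(flats, feature_sets, radius=1.5)
print(ranking)
```

The proximity-weighted definition could be approximated by replacing the radius test with a distance-dependent weight on each quality. The exact weighting function is not given in the abstract, so it is left out of this sketch.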