4,143 research outputs found

    A Density-Based Approach to the Retrieval of Top-K Spatial Textual Clusters

    Full text link
    Keyword-based web queries with local intent retrieve web content that is relevant to supplied keywords and that represent points of interest that are near the query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, namely the top-k spatial textual clusters (k-STC) query that returns the top-k clusters that (i) are located the closest to a given query location, (ii) contain the most relevant objects with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a basic algorithm that relies on on-line density-based clustering and exploits an early stop condition. To improve the response time, we design an advanced approach that includes three techniques: (i) an object skipping rule, (ii) spatially gridded posting lists, and (iii) a fast range query algorithm. An empirical study on real data demonstrates that the paper's proposals offer scalability and are capable of excellent performance

    On trip planning queries in spatial databases

    Full text link
    In this paper we discuss a new type of query in Spatial Databases, called Trip Planning Query (TPQ). Given a set of points P in space, where each point belongs to a category, and given two points s and e, TPQ asks for the best trip that starts at s, passes through exactly one point from each category, and ends at e. An example of a TPQ is when a user wants to visit a set of different places and at the same time minimize the total travelling cost, e.g. what is the shortest travelling plan for me to visit an automobile shop, a CVS pharmacy outlet, and a Best Buy shop along my trip from A to B? The trip planning query is an extension of the well-known TSP problem and therefore is NP-hard. The difficulty of this query lies in the existence of multiple choices for each category. In this paper, we first study fast approximation algorithms for the trip planning query in a metric space, assuming that the data set fits in main memory, and give the theory analysis of their approximation bounds. Then, the trip planning query is examined for data sets that do not fit in main memory and must be stored on disk. For the disk-resident data, we consider two cases. In one case, we assume that the points are located in Euclidean space and indexed with an Rtree. In the other case, we consider the problem of points that lie on the edges of a spatial network (e.g. road network) and the distance between two points is defined using the shortest distance over the network. Finally, we give an experimental evaluation of the proposed algorithms using synthetic data sets generated on real road networks

    Shortest Path Computation with No Information Leakage

    Get PDF
    Shortest path computation is one of the most common queries in location-based services (LBSs). Although particularly useful, such queries raise serious privacy concerns. Exposing to a (potentially untrusted) LBS the client's position and her destination may reveal personal information, such as social habits, health condition, shopping preferences, lifestyle choices, etc. The only existing method for privacy-preserving shortest path computation follows the obfuscation paradigm; it prevents the LBS from inferring the source and destination of the query with a probability higher than a threshold. This implies, however, that the LBS still deduces some information (albeit not exact) about the client's location and her destination. In this paper we aim at strong privacy, where the adversary learns nothing about the shortest path query. We achieve this via established private information retrieval techniques, which we treat as black-box building blocks. Experiments on real, large-scale road networks assess the practicality of our schemes.Comment: VLDB201

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Full text link
    Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works assuming that edges in an uncertain graph are independent of each other, we study the uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete, thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase,we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can sort out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments.Comment: VLDB201
    • …
    corecore