1,705 research outputs found
A Practically Efficient Algorithm for Generating Answers to Keyword Search over Data Graphs
In keyword search over a data graph, an answer is a non-redundant subtree
that contains all the keywords of the query. A naive approach to producing all
the answers by increasing height is to generalize Dijkstra's algorithm to
enumerating all acyclic paths by increasing weight. The idea of freezing is
introduced so that (most) non-shortest paths are generated only if they are
actually needed for producing answers. The resulting algorithm for generating
subtrees, called GTF, is subtle and its proof of correctness is intricate.
Extensive experiments show that GTF outperforms existing systems, even ones
that for efficiency's sake are incomplete (i.e., cannot produce all the
answers). In particular, GTF is scalable and performs well even on large data
graphs and when many answers are needed.Comment: Full version of ICDT'16 pape
Keyword search in graphs, relational databases and social networks
Keyword search, a well known mechanism for retrieving relevant information from a set of documents, has recently been studied for extracting information from structured data (e.g., relational databases and XML documents). It offers an alternative way to query languages (e.g., SQL) to explore databases, which is effective for lay users who may not be familiar with the database schema or the query language. This dissertation addresses some issues in keyword search in structured data. Namely, novel solutions to existing problems in keyword search in graphs or relational databases are proposed. In addition, a problem related to graph keyword search, team formation in social networks, is studied. The dissertation consists of four parts.
The first part addresses keyword search over a graph which finds a substructure of the graph containing all or some of the query keywords. Current methods for keyword search over graphs may produce answers in which some content nodes (i.e., nodes that contain input keywords) are not very close to each other. In addition, current methods explore both content and non-content nodes while searching for the result and are thus both time and memory consuming for large graphs. To address the above problems, we propose algorithms for finding r-cliques in graphs. An r-clique is a group of content nodes that cover all the input keywords and the distance between each pair of nodes is less than or equal to r. Two approximation algorithms that produce r-cliques with a bounded approximation ratio in polynomial delay are proposed.
In the second part, the problem of duplication-free and minimal keyword search in graphs is studied. Current methods for keyword search in graphs may produce duplicate answers that contain the same set of content nodes. In addition, an answer found by these methods may not be minimal in the sense that some of the nodes in the answer may contain query keywords that are all covered by other nodes in the answer. Removing these nodes does not change the coverage of the answer but can make the answer more compact. We define the problem of finding duplication-free and minimal answers, and propose algorithms for finding such answers efficiently.
Meaningful keyword search in relational databases is the subject of the third part of this dissertation. Keyword search over relational databases returns a join tree spanning tuples containing the query keywords. As many answers of varying quality can be found, and the user is often only interested in seeing the·top-k answers, how to gauge the relevance of answers to rank them is of paramount importance. This becomes more pertinent for databases with large and complex schemas. We focus on the relevance of join trees as the fundamental means to rank the answers. We devise means to measure relevance of relations and foreign keys in the schema over the information content of the database.
The problem of keyword search over graph data is similar to the problem of team formation in social networks. In this setting, keywords represent skills and the nodes in a graph represent the experts that possess skills. Given an expert network, in which a node represents an expert that has a cost for using the expert service and an edge represents the communication cost between the two corresponding experts, we tackle the problem of finding a team of experts that covers a set of required skills and also minimizes the communication cost as well as the personnel cost of the team. We propose two types of approximation algorithms to solve this bi-criteria problem in the fourth part of this dissertation
Diversifying Top-K Results
Top-k query processing finds a list of k results that have largest scores
w.r.t the user given query, with the assumption that all the k results are
independent to each other. In practice, some of the top-k results returned can
be very similar to each other. As a result some of the top-k results returned
are redundant. In the literature, diversified top-k search has been studied to
return k results that take both score and diversity into consideration. Most
existing solutions on diversified top-k search assume that scores of all the
search results are given, and some works solve the diversity problem on a
specific problem and can hardly be extended to general cases. In this paper, we
study the diversified top-k search problem. We define a general diversified
top-k search problem that only considers the similarity of the search results
themselves. We propose a framework, such that most existing solutions for top-k
query processing can be extended easily to handle diversified top-k search, by
simply applying three new functions, a sufficient stop condition sufficient(),
a necessary stop condition necessary(), and an algorithm for diversified top-k
search on the current set of generated results, div-search-current(). We
propose three new algorithms, namely, div-astar, div-dp, and div-cut to solve
the div-search-current() problem. div-astar is an A* based algorithm, div-dp is
an algorithm that decomposes the results into components which are searched
using div-astar independently and combined using dynamic programming. div-cut
further decomposes the current set of generated results using cut points and
combines the results using sophisticated operations. We conducted extensive
performance studies using two real datasets, enwiki and reuters. Our div-cut
algorithm finds the optimal solution for diversified top-k search problem in
seconds even for k as large as 2,000.Comment: VLDB201
Keyword-aware Optimal Route Search
Identifying a preferable route is an important problem that finds
applications in map services. When a user plans a trip within a city, the user
may want to find "a most popular route such that it passes by shopping mall,
restaurant, and pub, and the travel time to and from his hotel is within 4
hours." However, none of the algorithms in the existing work on route planning
can be used to answer such queries. Motivated by this, we define the problem of
keyword-aware optimal route query, denoted by KOR, which is to find an optimal
route such that it covers a set of user-specified keywords, a specified budget
constraint is satisfied, and an objective score of the route is optimal. The
problem of answering KOR queries is NP-hard. We devise an approximation
algorithm OSScaling with provable approximation bounds. Based on this
algorithm, another more efficient approximation algorithm BucketBound is
proposed. We also design a greedy approximation algorithm. Results of empirical
studies show that all the proposed algorithms are capable of answering KOR
queries efficiently, while the BucketBound and Greedy algorithms run faster.
The empirical studies also offer insight into the accuracy of the proposed
algorithms.Comment: VLDB201
GraphSE: An Encrypted Graph Database for Privacy-Preserving Social Search
In this paper, we propose GraphSE, an encrypted graph database for online
social network services to address massive data breaches. GraphSE preserves
the functionality of social search, a key enabler for quality social network
services, where social search queries are conducted on a large-scale social
graph and meanwhile perform set and computational operations on user-generated
contents. To enable efficient privacy-preserving social search, GraphSE
provides an encrypted structural data model to facilitate parallel and
encrypted graph data access. It is also designed to decompose complex social
search queries into atomic operations and realise them via interchangeable
protocols in a fast and scalable manner. We build GraphSE with various
queries supported in the Facebook graph search engine and implement a
full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that
GraphSE is practical for querying a social graph with a million of users.Comment: This is the full version of our AsiaCCS paper "GraphSE: An
Encrypted Graph Database for Privacy-Preserving Social Search". It includes
the security proof of the proposed scheme. If you want to cite our work,
please cite the conference version of i
Linear-Delay Enumeration for Minimal Steiner Problems
Kimelfeld and Sagiv [Kimelfeld and Sagiv, PODS 2006], [Kimelfeld and Sagiv,
Inf. Syst. 2008] pointed out the problem of enumerating -fragments is of
great importance in a keyword search on data graphs. In a graph-theoretic term,
the problem corresponds to enumerating minimal Steiner trees in (directed)
graphs. In this paper, we propose a linear-delay and polynomial-space algorithm
for enumerating all minimal Steiner trees, improving on a previous result in
[Kimelfeld and Sagiv, Inf. Syst. 2008]. Our enumeration algorithm can be
extended to other Steiner problems, such as minimal Steiner forests, minimal
terminal Steiner trees, and minimal directed Steiner trees. As another variant
of the minimal Steiner tree enumeration problem, we study the problem of
enumerating minimal induced Steiner subgraphs. We propose a polynomial-delay
and exponential-space enumeration algorithm of minimal induced Steiner
subgraphs on claw-free graphs. Contrary to these tractable results, we show
that the problem of enumerating minimal group Steiner trees is at least as hard
as the minimal transversal enumeration problem on hypergraphs
Balancing Security, Performance and Deployability in Encrypted Search
Encryption is an important tool for protecting data, especially data stored in the cloud. However, standard encryption techniques prevent efficient search. Searchable encryption attempts to solve this issue, protecting the data while still providing search functionality. Retaining the ability to search comes at a cost of security, performance and/or utility.
An important practical aspect of utility is compatibility with legacy systems. Unfortunately, the efficient searchable encryption constructions that are compatible with these systems have been proven vulnerable to attack, even against weaker adversary models.
The goal of this work is to address this security problem inherent with efficient, legacy compatible constructions. First, we present attacks on previous constructions that are compatible with legacy systems, demonstrating their vulnerability. Then we present two new searchable encryption constructions. The first, weakly randomized encryption, provides superior security to prior easily deployable constructions, while providing similar ease of deployment and query performance nearly identical to unencrypted databases. The second construction, EDDiES, provides much stronger security at the expense of a slight regression on performance.
These constructions show that it is possible to achieve a better balance of security and performance with the utility constraints that come with deployment in legacy systems
Engineering a Conformant Probabilistic Planner
We present a partial-order, conformant, probabilistic planner, Probapop which
competed in the blind track of the Probabilistic Planning Competition in IPC-4.
We explain how we adapt distance based heuristics for use with probabilistic
domains. Probapop also incorporates heuristics based on probability of success.
We explain the successes and difficulties encountered during the design and
implementation of Probapop
- …