Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. This chapter identifies the main graph
summarization methods, with an emphasis on the most recent approaches
and novel research trends not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologies
TopCom: Index for Shortest Distance Query in Directed Graph
Finding the shortest distance between two vertices in a graph is an important
problem due to its numerous applications in diverse domains, including
geo-spatial databases, social network analysis, and information retrieval.
Classical algorithms (such as Dijkstra's) solve this problem in polynomial time,
but they cannot provide real-time responses for a large number of
bursty queries on a large graph. Indexing-based solutions, which pre-process
the graph to answer a large number of distance queries (exactly or
approximately) in real time, are therefore becoming increasingly popular. Existing
solutions have varying performance in terms of index size, index building time,
query time, and accuracy. In this work, we propose TopCom, a novel
indexing-based solution for exactly answering distance queries. Our experiments
with two of the existing state-of-the-art methods (IS-Label and TreeMap) show
the superiority of TopCom over these two methods in terms of scalability and
query time. In addition, TopCom's indexing exploits the DAG (directed acyclic
graph) structure in the graph, which makes it significantly faster than the
existing methods if the SCCs (strongly connected components) of the input graph
are relatively small.
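The DAG structure mentioned in this abstract is usually obtained by collapsing each strongly connected component into a single node. The sketch below shows only that standard condensation step, using Kosaraju's algorithm; it is not the TopCom index itself, and the function name `condense` and the integer vertex numbering are assumptions made for illustration.

```python
from collections import defaultdict


def condense(n, edges):
    """Collapse SCCs of a directed graph on vertices 0..n-1 (hypothetical helper).

    Returns (comp, dag_edges): comp[v] is the SCC id of vertex v, and
    dag_edges is the set of edges between distinct SCCs.
    """
    adj = defaultdict(list)
    radj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: record vertices in order of DFS finish time (iterative DFS).
    visited = [False] * n
    order = []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(adj[nxt])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish time;
    # each sweep from an unlabeled vertex discovers exactly one SCC.
    comp = [-1] * n
    c = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            node = stack.pop()
            for nxt in radj[node]:
                if comp[nxt] == -1:
                    comp[nxt] = c
                    stack.append(nxt)
        c += 1

    # Keep only edges that cross between different components.
    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag_edges
```

When the components are small, the condensed graph stays close to the original in size, which is the regime the abstract identifies as favorable.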
Making Queries Tractable on Big Data with Preprocessing
A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME algorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to provide a formal foundation for this approach in terms of computational complexity. (1) We propose a set of Π-tractable queries, denoted by ΠT0Q, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natural query classes are Π-tractable and are feasible on big data. (3) We also study a set ΠTQ of query classes that can be effectively converted to Π-tractable queries by re-factorizing their data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for ΠTQ. (5) We also show that ΠT0Q ⊂ P unless P = NC, i.e., the set ΠT0Q of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, ΠTQ = P, i.e., all PTIME query classes can be made Π-tractable via proper re-factorizations. This work is a step towards understanding the tractability of queries in the context of big data.
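The preprocess-then-query pattern described in this abstract can be illustrated with a deliberately simple case: sort the data once in polynomial time, then answer membership queries by binary search in logarithmic time. This is only a sketch of the general idea and is not taken from the paper; the names `preprocess` and `member` are hypothetical.

```python
from bisect import bisect_left


def preprocess(data):
    """One-off PTIME step: build a sorted copy of the data (O(n log n))."""
    return sorted(data)


def member(index, x):
    """Query step: binary search on the preprocessed index (O(log n))."""
    i = bisect_left(index, x)
    return i < len(index) and index[i] == x
```

For example, after `index = preprocess([7, 3, 9, 1])`, each call to `member(index, x)` inspects only a logarithmic number of entries rather than scanning the whole dataset.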
Privacy-Preserving Shortest Path Computation
Navigation is one of the most popular cloud computing services. But in
virtually all cloud-based navigation systems, the client must reveal her
location and destination to the cloud service provider in order to learn the
fastest route. In this work, we present a cryptographic protocol for navigation
on city streets that provides privacy for both the client's location and the
service provider's routing data. Our key ingredient is a novel method for
compressing the next-hop routing matrices in networks such as city street maps.
Applying our compression method to the map of Los Angeles, for example, we
achieve over tenfold reduction in the representation size. In conjunction with
other cryptographic techniques, this compressed representation results in an
efficient protocol suitable for fully-private real-time navigation on city
streets. We demonstrate the practicality of our protocol by benchmarking it on
real street map data for major cities such as San Francisco and Washington,
D.C. Comment: Extended version of NDSS 2016 paper
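For readers unfamiliar with next-hop routing matrices, the sketch below builds one for a small weighted directed graph using Floyd-Warshall with path reconstruction. Entry `nxt[s][t]` holds the first vertex to move to on a shortest path from `s` to `t`. It shows only the uncompressed matrix; the compression method and cryptographic protocol from the paper are not reproduced here, and the function names and the assumption of non-negative edge weights are illustrative.

```python
import math


def next_hop_matrix(n, weighted_edges):
    """Build an n x n next-hop matrix (hypothetical helper).

    weighted_edges is a list of (u, v, w) with non-negative weights w.
    nxt[s][t] is None when t is unreachable from s.
    """
    dist = [[math.inf] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
        nxt[v][v] = v
    for u, v, w in weighted_edges:
        if w < dist[u][v]:  # keep the cheapest parallel edge
            dist[u][v] = w
            nxt[u][v] = v
    # Floyd-Warshall: allow intermediate vertices 0..k in turn.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]  # first hop is the hop towards k
    return nxt


def route(nxt, s, t):
    """Follow next hops from s to t; returns [] if t is unreachable."""
    if nxt[s][t] is None:
        return []
    path = [s]
    while s != t:
        s = nxt[s][t]
        path.append(s)
    return path
```

A client navigating from `s` to `t` needs only one matrix entry per step, which is why the size of this matrix matters for the protocol's efficiency.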
Shortest Path Computation with No Information Leakage
Shortest path computation is one of the most common queries in location-based
services (LBSs). Although particularly useful, such queries raise serious
privacy concerns. Exposing to a (potentially untrusted) LBS the client's
position and her destination may reveal personal information, such as social
habits, health condition, shopping preferences, lifestyle choices, etc. The
only existing method for privacy-preserving shortest path computation follows
the obfuscation paradigm; it prevents the LBS from inferring the source and
destination of the query with a probability higher than a threshold. This
implies, however, that the LBS still deduces some information (albeit not
exact) about the client's location and her destination. In this paper we aim at
strong privacy, where the adversary learns nothing about the shortest path
query. We achieve this via established private information retrieval
techniques, which we treat as black-box building blocks. Experiments on real,
large-scale road networks assess the practicality of our schemes. Comment: VLDB201
Compressed materialised views of semi-structured data
Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However, preserving the conventional representation of such views remains a significant limiting factor, especially in the context of mobile devices, where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and value-based predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show the advantages of using such structures.