3,170 research outputs found
Efficient Subgraph Matching on Billion Node Graphs
The ability to handle large scale graph data is crucial to an increasing
number of applications. Much work has been dedicated to supporting basic graph
operations such as subgraph matching, reachability, regular expression
matching, etc. In many cases, graph indices are employed to speed up query
processing. Typically, most indices require either super-linear indexing time
or super-linear indexing space. Unfortunately, for very large graphs,
super-linear approaches are almost always infeasible. In this paper, we study
the problem of subgraph matching on billion-node graphs. We present a novel
algorithm that supports efficient subgraph matching for graphs deployed on a
distributed memory store. Instead of relying on super-linear indices, we use
efficient graph exploration and massive parallel computing for query
processing. Our experimental results demonstrate the feasibility of performing
subgraph matching on web-scale graph data.Comment: VLDB201
Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity
Consider two parties who want to compare their strings, e.g., genomes, but do
not want to reveal them to each other. We present a system for
privacy-preserving matching of strings, which differs from existing systems by
providing a deterministic approximation instead of an exact distance. It is
efficient (linear complexity), non-interactive and does not involve a third
party which makes it particularly suitable for cloud computing. We extend our
protocol, such that it mitigates iterated differential attacks proposed by
Goodrich. Further an implementation of the system is evaluated and compared
against current privacy-preserving string matching algorithms.Comment: 6 pages, 4 figure
Approximate sequence alignment
Given a collection of strings and a query string, the goal of the approximate string matching is to efficiently find the strings in the collection, which are similar to the query string. In this paper, we focus on edit distance as a measure to quantify the similarity between two strings. Existing q-gram based methods use inverted lists to index the q-grams of the given string collection. These methods begin with generating the q-grams of the query string, disjoint or overlapping, and then merge the inverted lists of these q-grams. Several filtering techniques have been proposed to segment inverted lists in order to obtain relatively shorter lists, thus reducing the merging cost. The filtering technique we propose in this thesis, which is called position restricted alignment, combines well known length filtering and position filtering to provide more aggressive pruning. We then provide an indexing scheme that integrates the inverted lists storage with the proposed filter. It enables us to auto-filter the inverted lists. We evaluate the effectiveness of the proposed approach by experiments
- …