447 research outputs found
Enumerating Subgraph Instances Using Map-Reduce
The theme of this paper is how to find all instances of a given "sample"
graph in a larger "data graph," using a single round of map-reduce. For the
simplest sample graph, the triangle, we improve upon the best known such
algorithm. We then examine the general case, considering both the communication
cost between mappers and reducers and the total computation cost at the
reducers. To minimize communication cost, we exploit the techniques of (Afrati
and Ullman, TKDE 2011)for computing multiway joins (evaluating conjunctive
queries) in a single map-reduce round. Several methods are shown for
translating sample graphs into a union of conjunctive queries with as few
queries as possible. We also address the matter of optimizing computation cost.
Many serial algorithms are shown to be "convertible," in the sense that it is
possible to partition the data graph, explore each partition in a separate
reducer, and have the total computation cost at the reducers be of the same
order as the computation cost of the serial algorithm.Comment: 37 page
A New Framework for Join Product Skew
Different types of data skew can result in load imbalance in the context of
parallel joins under the shared nothing architecture. We study one important
type of skew, join product skew (JPS). A static approach based on frequency
classes is proposed which takes for granted the data distribution of join
attribute values. It comes from the observation that the join selectivity can
be expressed as a sum of products of frequencies of the join attribute values.
As a consequence, an appropriate assignment of join sub-tasks, that takes into
consideration the magnitude of the frequency products can alleviate the join
product skew. Motivated by the aforementioned remark, we propose an algorithm,
called Handling Join Product Skew (HJPS), to handle join product skew
On relating CTL to Datalog
CTL is the dominant temporal specification language in practice mainly due to
the fact that it admits model checking in linear time. Logic programming and
the database query language Datalog are often used as an implementation
platform for logic languages. In this paper we present the exact relation
between CTL and Datalog and moreover we build on this relation and known
efficient algorithms for CTL to obtain efficient algorithms for fragments of
stratified Datalog. The contributions of this paper are: a) We embed CTL into
STD which is a proper fragment of stratified Datalog. Moreover we show that STD
expresses exactly CTL -- we prove that by embedding STD into CTL. Both
embeddings are linear. b) CTL can also be embedded to fragments of Datalog
without negation. We define a fragment of Datalog with the successor build-in
predicate that we call TDS and we embed CTL into TDS in linear time. We build
on the above relations to answer open problems of stratified Datalog. We prove
that query evaluation is linear and that containment and satisfiability
problems are both decidable. The results presented in this paper are the first
for fragments of stratified Datalog that are more general than those containing
only unary EDBs.Comment: 34 pages, 1 figure (file .eps
- …