Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs
Many problems in areas as diverse as recommendation systems, social network
analysis, semantic search, and distributed root cause analysis can be modeled
as pattern search on labeled graphs (also called "heterogeneous information
networks" or HINs). Given a large graph and a query pattern with node and edge
label constraints, a fundamental challenge is to find the top-k matches
according to a ranking function over edge and node weights. For users, it is
difficult to select a value of k. We therefore propose the novel notion of an
any-k ranking algorithm: for a given time budget, return as many of the
top-ranked results as possible. Then, given additional time, produce the next
lower-ranked results quickly as well. It can be stopped at any time, or it may
continue until all results are returned. This paper focuses on acyclic patterns over
arbitrary labeled graphs. We are interested in practical algorithms that
effectively exploit (1) properties of heterogeneous networks, in particular
selective constraints on labels, and (2) the fact that users often explore only a
fraction of the top-ranked results. Our solution, KARPET, carefully integrates
aggressive pruning that leverages the acyclic nature of the query with
incremental guided search. This combination enables us to prove strong,
non-trivial time and space guarantees, which are generally considered very hard
to obtain for this type of graph search problem. Through experimental studies, we
show that KARPET achieves running times on the order of milliseconds for tree
patterns on large networks with millions of nodes and edges.
Comment: To appear in WWW 201
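As a hedged illustration of the any-k interface (this is not KARPET), the sketch below handles only the simplest acyclic pattern, a path of labels, on a hypothetical directed weighted graph. It assumes lower total edge weight means a better rank and ignores node weights. A backward dynamic program first computes the cheapest way to complete the pattern from each node and discards nodes that cannot complete any match (a simple form of pruning). A best-first search then orders partial matches by cost so far plus exact cost to go. Because the cost to go is exact, complete matches leave the heap in nondecreasing weight order, so the caller can stop after any number of results. The function name `any_k_path_matches` and the adjacency-list and label-dictionary input format are assumptions made for this example.

```python
import heapq
import math
from typing import Dict, Iterator, List, Tuple

# Hypothetical input format: adjacency lists with edge weights, plus node labels.
Graph = Dict[str, List[Tuple[str, float]]]


def any_k_path_matches(
    graph: Graph, labels: Dict[str, str], pattern: List[str]
) -> Iterator[Tuple[float, Tuple[str, ...]]]:
    """Yield matches of a path pattern (label sequence) in nondecreasing total weight.

    Sketch only: handles path patterns, not general trees, and lets a match
    revisit a node if the pattern repeats a label (homomorphism semantics).
    """
    depth = len(pattern)
    # to_go[i][v]: minimum weight needed to match pattern[i:] starting at node v.
    # Nodes that cannot complete the pattern are never stored (pruning).
    to_go: List[Dict[str, float]] = [{} for _ in range(depth)]
    for v, lab in labels.items():
        if lab == pattern[-1]:
            to_go[depth - 1][v] = 0.0
    for i in range(depth - 2, -1, -1):
        for v, lab in labels.items():
            if lab != pattern[i]:
                continue
            best = math.inf
            for u, w in graph.get(v, []):
                if u in to_go[i + 1]:
                    best = min(best, w + to_go[i + 1][u])
            if best < math.inf:
                to_go[i][v] = best

    # Heap entries: (cost so far + exact cost to go, cost so far, partial match).
    heap = [(cost, 0.0, (v,)) for v, cost in to_go[0].items()]
    heapq.heapify(heap)
    while heap:
        _, cost, partial = heapq.heappop(heap)
        level = len(partial) - 1
        if level == depth - 1:
            yield cost, partial  # the next-best complete match
            continue
        for u, w in graph.get(partial[-1], []):
            if u in to_go[level + 1]:
                new_cost = cost + w
                heapq.heappush(
                    heap, (new_cost + to_go[level + 1][u], new_cost, partial + (u,))
                )
```

Calling `next()` on the generator returns the single best match, `itertools.islice(gen, k)` takes the next k matches, and the same generator can be resumed later, which mirrors the "stop anytime, continue if given more time" behavior described in the abstract.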
Quantum and approximation algorithms for maximum witnesses of Boolean matrix products
The problem of finding maximum (or minimum) witnesses of the Boolean product
of two Boolean matrices (MW for short) has a number of important applications,
in particular the all-pairs lowest common ancestor (LCA) problem in directed
acyclic graphs (dags). The best known upper time bound for the MW problem on
n\times n Boolean matrices, O(n^{2.575}), has not been substantially
improved since 2006. In order to obtain faster algorithms for this problem, we
study quantum algorithms for MW and approximation algorithms for MW (in the
standard computational model). Some of our quantum algorithms are input or
output sensitive. Our fastest quantum algorithm for the MW problem, and
consequently for the related problems, runs in time
\tilde{O}(n^{2+\lambda/2})=\tilde{O}(n^{2.434}), where \lambda satisfies the
equation \omega(1, \lambda, 1) = 1 + 1.5 \, \lambda and \omega(1, \lambda, 1)
is the exponent of the multiplication of an n \times n^{\lambda} matrix by an
n^{\lambda} \times n matrix. Next, we consider a relaxed version of the MW
problem (in the standard model) that asks us to report a witness of bounded rank
(the maximum witness has rank 1) for each non-zero entry of the matrix product.
First, by adapting the fastest known algorithm for maximum witnesses, we obtain
an algorithm for the relaxed problem that reports for each non-zero entry of
the product matrix a witness of rank at most \ell in time
\tilde{O}((n/\ell)n^{\omega(1,\log_n \ell,1)}). Then, by reducing the relaxed
problem to the so-called k-witness problem, we provide an algorithm that, with
high probability, reports for each non-zero entry C[i,j] of the product matrix C
a witness of rank O(\lceil W_C(i,j)/k \rceil), where W_C(i,j) is the number of
witnesses for C[i,j]. The algorithm runs in
\tilde{O}(n^{\omega}k^{0.4653} + n^2 k) time, where \omega=\omega(1,1,1).
Comment: 14 pages, 3 figures
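To make the definitions concrete, the following plain-Python sketch (not one of the paper's algorithms) stores each row of A and each column of B as an integer bitmask, so the witnesses of C[i,j] are exactly the set bits of their bitwise AND. The maximum witness is the highest set bit, and a witness's rank is its position among the witnesses sorted in decreasing order (the maximum has rank 1). The second function illustrates why splitting the index range into blocks of \ell consecutive indices yields a witness of rank at most \ell: any witness taken from the highest block that contains a witness is outranked only by other witnesses in that same block. In a fast algorithm each per-block test would be a rectangular matrix product; here a bitwise AND stands in for it. The function names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[int]]  # 0/1 entries, n x n
Witnesses = List[List[Optional[int]]]


def _row_masks(A: Matrix) -> List[int]:
    # Bit k of mask i is set iff A[i][k] == 1.
    return [sum(1 << k for k, a in enumerate(row) if a) for row in A]


def _col_masks(B: Matrix) -> List[int]:
    # Bit k of mask j is set iff B[k][j] == 1.
    n = len(B)
    return [sum(1 << k for k in range(n) if B[k][j]) for j in range(n)]


def max_witnesses(A: Matrix, B: Matrix) -> Witnesses:
    """W[i][j] = largest k with A[i][k] = B[k][j] = 1, or None if C[i][j] = 0."""
    rows, cols = _row_masks(A), _col_masks(B)
    n = len(A)
    W: Witnesses = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            common = rows[i] & cols[j]
            if common:
                W[i][j] = common.bit_length() - 1  # index of the highest set bit
    return W


def bounded_rank_witnesses(A: Matrix, B: Matrix, ell: int) -> Witnesses:
    """Report a witness of rank <= ell for every non-zero entry of C = AB (ell >= 1)."""
    rows, cols = _row_masks(A), _col_masks(B)
    n = len(A)
    W: Witnesses = [[None] * n for _ in range(n)]
    # Scan blocks [start, start + ell) from the highest indices downwards.
    for start in reversed(range(0, n, ell)):
        end = min(start + ell, n)
        block = ((1 << end) - 1) ^ ((1 << start) - 1)  # bits start .. end-1
        for i in range(n):
            for j in range(n):
                if W[i][j] is None:
                    common = rows[i] & cols[j] & block
                    if common:
                        # Any witness in this block works; take the lowest one
                        # to show the reported witness need not be the maximum.
                        W[i][j] = (common & -common).bit_length() - 1
    return W
```

For example, if row 0 of A is [1, 0, 1, 1] and every entry of B is 1, the witnesses of C[0,0] are {0, 2, 3}; with \ell = 2 the top block {2, 3} is non-empty, so index 2 (rank 2) may be reported instead of the maximum witness 3.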
Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation
The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were primarily designed for unweighted networks and cannot easily be extended to weighted networks. However, it is known that transforming weighted into unweighted networks by dichotomizing the edges of weighted networks with a threshold generally leads to information loss. We have developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparse constraints. We describe an effective approach to solving this problem by designing a multi-stage convex relaxation protocol and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating them against a large set of compiled biological knowledge bases. We also showed that the likelihood that a heavy subgraph is meaningful increases significantly with its recurrence across multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominantly under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks.
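The tensor formulation can be illustrated with a small sketch, under assumed names and layout: m weighted co-expression networks over the same n genes are stacked as a list of n x n weight matrices (an m x n x n tensor), a candidate pair (gene set, network set) selects a subtensor, and its heaviness is taken here to be the mean edge weight inside it. The greedy search is a hypothetical stand-in chosen for brevity; it is not the paper's multi-stage convex relaxation or its non-uniform edge sampling, and it carries no optimality guarantee.

```python
from itertools import combinations
from typing import List, Sequence

Tensor = List[List[List[float]]]  # tensor[k][i][j]: weight of gene pair (i, j) in network k


def heaviness(tensor: Tensor, genes: Sequence[int], nets: Sequence[int]) -> float:
    """Mean edge weight of the subgraph induced by `genes`, averaged over `nets`."""
    pairs = list(combinations(sorted(genes), 2))
    total = sum(tensor[k][i][j] for k in nets for i, j in pairs)
    return total / (len(pairs) * len(nets))


def recurrence(tensor: Tensor, genes: Sequence[int], threshold: float) -> List[int]:
    """Networks in which the gene set forms a heavy subgraph (mean weight >= threshold)."""
    return [k for k in range(len(tensor)) if heaviness(tensor, genes, [k]) >= threshold]


def greedy_heavy_genes(tensor: Tensor, nets: Sequence[int], size: int) -> List[int]:
    """Grow a gene set of the given size (>= 2) that is heavy across `nets` (greedy heuristic)."""
    n = len(tensor[0])
    # Seed with the heaviest gene pair, then repeatedly add the gene that keeps
    # the induced subtensor heaviest.
    chosen = list(max(combinations(range(n), 2),
                      key=lambda pair: heaviness(tensor, pair, nets)))
    while len(chosen) < size:
        rest = [g for g in range(n) if g not in chosen]
        chosen.append(max(rest, key=lambda g: heaviness(tensor, chosen + [g], nets)))
    return sorted(chosen)
```

Working directly on the weights, rather than on thresholded 0/1 edges, keeps the distinction between, say, a module with weights around 0.45 and one with weights near 0, which a single cut-off at 0.5 would erase.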
Fast Monotone Summation over Disjoint Sets
We study the problem of computing an ensemble of multiple sums where the
summands in each sum are indexed by subsets of size p of an n-element
ground set. More precisely, the task is to compute, for each subset of size q
of the ground set, the sum over the values of all subsets of size p that are
disjoint from the subset of size q. We present an arithmetic circuit that,
without subtraction, solves the problem using O((n^p+n^q)\log n) arithmetic
gates, all monotone; for constant p, q, this is within the factor \log n
of the optimal. The circuit design is based on viewing the summation as a "set
nucleation" task and using a tree-projection approach to implement the
nucleation. Applications include improved algorithms for counting heaviest
k-paths in a weighted graph, computing permanents of rectangular matrices,
and dynamic feature selection in machine learning.
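A brute-force reference makes the problem statement concrete, and a special case shows what "without subtraction" buys. Write n for the ground-set size, p for the size of the summand subsets, and q for the size of the query subsets (symbol names assumed for this sketch). For p = q = 1 the task is to compute, for every element y, the sum of f(x) over all x != y. Computing the total and then subtracting f(y) is not allowed in a monotone circuit, but prefix and suffix sums give every answer with roughly 3n applications of the operation. Because only the operation itself is used, the same code also works when it is max, which has no inverse. This is an illustration of the p = q = 1 case only, not the paper's tree-projection circuit; the function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, TypeVar

T = TypeVar("T")


def disjoint_sums_bruteforce(
    n: int, q: int, f: Dict[FrozenSet[int], float]
) -> Dict[FrozenSet[int], float]:
    """For each q-subset Y of {0..n-1}: sum of f(X) over the subsets X (keys of f) disjoint from Y.

    The keys of f are the p-subsets, so p is implied by f.
    """
    return {
        frozenset(Y): sum(v for X, v in f.items() if X.isdisjoint(Y))
        for Y in combinations(range(n), q)
    }


def disjoint_sums_p1q1(values: List[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """Case p = q = 1: out[y] combines values[x] over all x != y, using only `op`."""
    n = len(values)
    prefix = [identity] * (n + 1)  # prefix[i] = values[0] op ... op values[i-1]
    for i in range(n):
        prefix[i + 1] = op(prefix[i], values[i])
    suffix = [identity] * (n + 1)  # suffix[i] = values[i] op ... op values[n-1]
    for i in range(n - 1, -1, -1):
        suffix[i] = op(suffix[i + 1], values[i])
    # Everything left of y combined with everything right of y: no subtraction needed.
    return [op(prefix[y], suffix[y + 1]) for y in range(n)]
```

For instance, `disjoint_sums_p1q1([3, 1, 4, 1, 5], max, float("-inf"))` gives, for each position, the largest value found elsewhere in the list, a computation that "total minus own value" cannot express.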
Algebraic Methods in the Congested Clique
In this work, we use algebraic methods for studying distance computation and
subgraph detection tasks in the congested clique model. Specifically, we adapt
parallel matrix multiplication implementations to the congested clique,
obtaining an O(n^{1-2/\omega}) round matrix multiplication algorithm, where
\omega < 2.3728639 is the exponent of matrix multiplication. In conjunction
with known techniques from centralised algorithmics, this gives significant
improvements over previous best upper bounds in the congested clique model. The
highlight results include:
-- triangle and 4-cycle counting in O(n^{0.158}) rounds, improving upon the
O(n^{1/3}) triangle detection algorithm of Dolev et al. [DISC 2012],
-- a (1 + o(1))-approximation of all-pairs shortest paths in O(n^{0.158})
rounds, improving upon the \tilde{O}(n^{1/2})-round (2 + o(1))-approximation
algorithm of Nanongkai [STOC 2014], and
-- computing the girth in O(n^{0.158}) rounds, which is the first
non-trivial solution in this model.
In addition, we present a novel constant-round combinatorial algorithm for
detecting 4-cycles.
Comment: This work is a merger of arxiv:1412.2109 and arxiv:1412.266
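The centralized algebraic identities behind such results can be sketched sequentially. For the adjacency matrix A of a simple undirected graph with m edges and degrees d_v, the number of triangles is tr(A^3)/6, since each triangle gives 6 closed walks of length 3. The number of 4-cycles is (tr(A^4) - 2 * sum_v d_v^2 + 2m)/8: each 4-cycle gives 8 closed walks of length 4, and the remaining, back-and-forth closed walks of length 4 number 2 * sum_v d_v^2 - 2m. Distances come from repeated squaring over the (min, +) semiring. The code below is a plain sequential illustration with hypothetical function names; in the congested clique the paper computes such matrix products in a distributed fashion, and its all-pairs shortest paths result is approximate, unlike the exact (min, +) squaring shown here.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Ordinary (+, *) matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def min_plus(A: Matrix, B: Matrix) -> Matrix:
    """(min, +) product: entry (i, j) is the cheapest combination A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(M: Matrix) -> float:
    return sum(M[i][i] for i in range(len(M)))


def count_triangles(A: Matrix) -> int:
    # Each triangle yields 6 closed walks of length 3 (3 start vertices x 2 directions).
    return int(trace(mat_mul(mat_mul(A, A), A))) // 6


def count_4cycles(A: Matrix) -> int:
    A2 = mat_mul(A, A)
    closed_walks = trace(mat_mul(A2, A2))
    degrees = [sum(row) for row in A]
    m = sum(degrees) // 2
    # Remove the 2*sum(d^2) - 2m back-and-forth walks; 8 walks per 4-cycle remain.
    return int(closed_walks - 2 * sum(d * d for d in degrees) + 2 * m) // 8


def all_pairs_shortest_paths(W: Matrix) -> Matrix:
    """Exact APSP by repeated (min, +) squaring; W[i][i] = 0, float('inf') for non-edges."""
    n = len(W)
    D, hops = W, 1
    while hops < n - 1:  # after t squarings, D covers paths with up to 2^t edges
        D, hops = min_plus(D, D), 2 * hops
    return D
```

As a check, the complete graph K4 has 4 triangles and 3 four-cycles, and the 4-cycle C4 has no triangles and exactly one four-cycle.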