270 research outputs found

    Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs

    Full text link
    Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to nd the top-k matches ac- cording to a ranking function over edge and node weights. For users, it is di cult to select value k . We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, re- turn as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. It can be stopped anytime, but may have to continues until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) that the users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query, and incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times in the order of milliseconds for tree patterns on large networks with millions of nodes and edges.Comment: To appear in WWW 201

    Quantum and approximation algorithms for maximum witnesses of Boolean matrix products

    Full text link
    The problem of finding maximum (or minimum) witnesses of the Boolean product of two Boolean matrices (MW for short) has a number of important applications, in particular the all-pairs lowest common ancestor (LCA) problem in directed acyclic graphs (dags). The best known upper time-bound on the MW problem for n\times n Boolean matrices of the form O(n^{2.575}) has not been substantially improved since 2006. In order to obtain faster algorithms for this problem, we study quantum algorithms for MW and approximation algorithms for MW (in the standard computational model). Some of our quantum algorithms are input or output sensitive. Our fastest quantum algorithm for the MW problem, and consequently for the related problems, runs in time \tilde{O}(n^{2+\lambda/2})=\tilde{O}(n^{2.434}), where \lambda satisfies the equation \omega(1, \lambda, 1) = 1 + 1.5 \, \lambda and \omega(1, \lambda, 1) is the exponent of the multiplication of an n \times n^{\lambda}$ matrix by an n^{\lambda} \times n matrix. Next, we consider a relaxed version of the MW problem (in the standard model) asking for reporting a witness of bounded rank (the maximum witness has rank 1) for each non-zero entry of the matrix product. First, by adapting the fastest known algorithm for maximum witnesses, we obtain an algorithm for the relaxed problem that reports for each non-zero entry of the product matrix a witness of rank at most \ell in time \tilde{O}((n/\ell)n^{\omega(1,\log_n \ell,1)}). Then, by reducing the relaxed problem to the so called k-witness problem, we provide an algorithm that reports for each non-zero entry C[i,j] of the product matrix C a witness of rank O(\lceil W_C(i,j)/k\rceil ), where W_C(i,j) is the number of witnesses for C[i,j], with high probability. The algorithm runs in \tilde{O}(n^{\omega}k^{0.4653} +n^2k) time, where \omega=\omega(1,1,1).Comment: 14 pages, 3 figure

    Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation

    Get PDF
    The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were primarily designed for unweighted networks, and cannot easily be extended to weighted networks. However, it is known that transforming weighted into unweighted networks by dichotomizing the edges of weighted networks with a threshold generally leads to information loss. We have developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparse constraints. We describe an effective approach to solving this problem by designing a multi-stage, convex relaxation protocol, and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks, and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating against a large set of compiled biological knowledge bases. We also showed that the likelihood for a heavy subgraph to be meaningful increases significantly with its recurrence in multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominately under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks

    Fast Monotone Summation over Disjoint Sets

    Full text link
    We study the problem of computing an ensemble of multiple sums where the summands in each sum are indexed by subsets of size pp of an nn-element ground set. More precisely, the task is to compute, for each subset of size qq of the ground set, the sum over the values of all subsets of size pp that are disjoint from the subset of size qq. We present an arithmetic circuit that, without subtraction, solves the problem using O((np+nq)logn)O((n^p+n^q)\log n) arithmetic gates, all monotone; for constant pp, qq this is within the factor logn\log n of the optimal. The circuit design is based on viewing the summation as a "set nucleation" task and using a tree-projection approach to implement the nucleation. Applications include improved algorithms for counting heaviest kk-paths in a weighted graph, computing permanents of rectangular matrices, and dynamic feature selection in machine learning

    Algebraic Methods in the Congested Clique

    Full text link
    In this work, we use algebraic methods for studying distance computation and subgraph detection tasks in the congested clique model. Specifically, we adapt parallel matrix multiplication implementations to the congested clique, obtaining an O(n12/ω)O(n^{1-2/\omega}) round matrix multiplication algorithm, where ω<2.3728639\omega < 2.3728639 is the exponent of matrix multiplication. In conjunction with known techniques from centralised algorithmics, this gives significant improvements over previous best upper bounds in the congested clique model. The highlight results include: -- triangle and 4-cycle counting in O(n0.158)O(n^{0.158}) rounds, improving upon the O(n1/3)O(n^{1/3}) triangle detection algorithm of Dolev et al. [DISC 2012], -- a (1+o(1))(1 + o(1))-approximation of all-pairs shortest paths in O(n0.158)O(n^{0.158}) rounds, improving upon the O~(n1/2)\tilde{O} (n^{1/2})-round (2+o(1))(2 + o(1))-approximation algorithm of Nanongkai [STOC 2014], and -- computing the girth in O(n0.158)O(n^{0.158}) rounds, which is the first non-trivial solution in this model. In addition, we present a novel constant-round combinatorial algorithm for detecting 4-cycles.Comment: This is work is a merger of arxiv:1412.2109 and arxiv:1412.266
    corecore