Search CORE

270 research outputs found

Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs

Author: Ajwani Deepak
Gatterbauer Wolfgang
Nicholson Patrick K.
Riedewald Mirek
Sala Alessandra
Yang Xiaofeng
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a ranking function over edge and node weights. For users, it is difficult to select the value k. We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, return as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. It can be stopped anytime, but may have to continue until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) the fact that users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query with incremental guided search. It enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times on the order of milliseconds for tree patterns on large networks with millions of nodes and edges. Comment: To appear in WWW 201
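
To make the any-k idea concrete, here is a minimal Python sketch for the simplest tree pattern, a path, assuming label filtering has already produced one candidate list per query node and that a lower total weight ranks higher. It is not KARPET. It pairs a bottom-up pass that computes each candidate's exact cheapest completion (dropping dead-end candidates, loosely analogous to pruning) with a best-first search that uses those costs as an exact heuristic, so complete matches come out in rank order and the caller can stop after any number of them. The function name `any_k_paths` and the input layout are illustrative assumptions.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def any_k_paths(
    layers: List[List[str]], weights: Dict[Tuple[str, str], float]
) -> Iterator[Tuple[float, List[str]]]:
    """Yield full matches of a path pattern in increasing total-weight order.

    layers[i] holds the (already label-filtered) candidates for query node i;
    weights[(u, v)] is the weight of a data edge between consecutive layers.
    Node ids are assumed to be unique across layers.
    """
    m = len(layers)
    succ = defaultdict(list)
    for (u, v), w in weights.items():
        succ[u].append((v, w))

    # Bottom-up pass: h[u] = exact minimum cost from u to the last layer.
    inf = float("inf")
    h = {v: 0.0 for v in layers[-1]}
    for i in range(m - 2, -1, -1):
        nxt = set(layers[i + 1])
        for u in layers[i]:
            h[u] = min((w + h[v] for v, w in succ[u] if v in nxt), default=inf)

    # Best-first search: because h is exact, complete matches pop in rank order.
    tie = itertools.count()  # tie-breaker so the heap never compares lists
    heap = [(h[u], next(tie), 0.0, [u]) for u in layers[0] if h[u] < inf]
    heapq.heapify(heap)
    while heap:
        _, _, g, path = heapq.heappop(heap)
        depth = len(path) - 1
        if depth == m - 1:
            yield g, path  # the caller may stop consuming at any time
            continue
        nxt = set(layers[depth + 1])
        for v, w in succ[path[-1]]:
            if v in nxt and h[v] < inf:  # skip dead-end candidates
                heapq.heappush(heap, (g + w + h[v], next(tie), g + w, path + [v]))
```

Because the generator is lazy, a caller with a time budget simply stops iterating, and asking for more results resumes the same search. The frontier heap is unbounded here; the sketch makes no attempt at the time and space guarantees the abstract claims for KARPET.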

arXiv.org e-Print Archive

Crossref

Quantum and approximation algorithms for maximum witnesses of Boolean matrix products

Author: A Ambainis
A Czumaj
A Shapira
D Coppersmith
F Gall
K Cohen
L Gąsieniec
M Nielsen
N Alon
V Vassilevska
X Huang
Publication date: 01/01/2021

The problem of finding maximum (or minimum) witnesses of the Boolean product of two Boolean matrices (MW for short) has a number of important applications, in particular the all-pairs lowest common ancestor (LCA) problem in directed acyclic graphs (dags). The best known upper time-bound on the MW problem for n\times n Boolean matrices of the form O(n^{2.575}) has not been substantially improved since 2006. In order to obtain faster algorithms for this problem, we study quantum algorithms for MW and approximation algorithms for MW (in the standard computational model). Some of our quantum algorithms are input or output sensitive. Our fastest quantum algorithm for the MW problem, and consequently for the related problems, runs in time \tilde{O}(n^{2+\lambda/2})=\tilde{O}(n^{2.434}), where \lambda satisfies the equation \omega(1, \lambda, 1) = 1 + 1.5 \, \lambda and \omega(1, \lambda, 1) is the exponent of the multiplication of an n \times n^{\lambda} matrix by an n^{\lambda} \times n matrix. Next, we consider a relaxed version of the MW problem (in the standard model) asking for reporting a witness of bounded rank (the maximum witness has rank 1) for each non-zero entry of the matrix product. First, by adapting the fastest known algorithm for maximum witnesses, we obtain an algorithm for the relaxed problem that reports for each non-zero entry of the product matrix a witness of rank at most \ell in time \tilde{O}((n/\ell)n^{\omega(1,\log_n \ell,1)}). Then, by reducing the relaxed problem to the so-called k-witness problem, we provide an algorithm that reports, with high probability, for each non-zero entry C[i,j] of the product matrix C a witness of rank O(\lceil W_C(i,j)/k\rceil), where W_C(i,j) is the number of witnesses for C[i,j]. The algorithm runs in \tilde{O}(n^{\omega}k^{0.4653} +n^2k) time, where \omega=\omega(1,1,1). Comment: 14 pages, 3 figure
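
The definitions can be pinned down with a brute-force reference in Python. For an entry (i, j), the witnesses are the indices k with A[i][k] = B[k][j] = 1; the sketch assumes a witness's rank is its position when the witnesses are listed in decreasing order, which is consistent with the maximum witness having rank 1. This is a cubic-time checker for small cases only, not any of the quantum or approximation algorithms above, and the function names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[int]]


def witnesses(a: Matrix, b: Matrix, i: int, j: int) -> List[int]:
    """All k with a[i][k] = b[k][j] = 1, largest first."""
    return [k for k in range(len(b) - 1, -1, -1) if a[i][k] and b[k][j]]


def max_witnesses(a: Matrix, b: Matrix) -> List[List[Optional[int]]]:
    """Maximum witness per entry of the Boolean product, or None if the entry is 0."""
    result = []
    for i in range(len(a)):
        row = []
        for j in range(len(b[0])):
            ws = witnesses(a, b, i, j)
            row.append(ws[0] if ws else None)
        result.append(row)
    return result


def witness_rank(a: Matrix, b: Matrix, i: int, j: int, k: int) -> int:
    """Rank of witness k for entry (i, j); raises ValueError if k is not a witness."""
    return witnesses(a, b, i, j).index(k) + 1
```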

arXiv.org e-Print Archive

Lund University Publications

Crossref

Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation

Author: A Ruepp
A Smilde
AA Tsay
AJ Butte
AL Barabasi
AL Yuille
AY Ng
B Breitkreutz
BP Kelley
C Faloutsos
CHQ Ding
Chun-Chi Liu
D Achlioptas
D Tao
DJ Thomas
E Acar
F Pan
FRK Chung
H Chen
H Hu
Haifeng Li
I Bernales
J Flannick
J Sun
J Sun
J Sun
JA Papin
JJ Hopfield
Jörg Stelling
K Kuwahara
K Takahashi
K Toeda
KA Allen
L Mao
L Omberg
LR Tucker
M Ashburner
M Kalaev
M Kanehisa
M Koyuturk
M Koyuturk
M Nicolás
M Xu
M Xu
MA Serrano
MEJ Newman
Michael S. Waterman
MR Mehan
MW Mahoney
N Genkai
O Alter
O Alter
O Alter
RB Cattell
S Arora
S Miard
T Zhang
T Zhang
TG Kolda
Tong Zhang
TS Motzkin
TW Anderson
U Luxburg
V Spirin
W Li
Wenyuan Li
X Yan
X Yan
X Zhou
Xianghong Jasmine Zhou
Y Huang
Y Yu
YP Deniélou
Publication venue: Public Library of Science
Publication date: 01/06/2011

The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were primarily designed for unweighted networks, and cannot easily be extended to weighted networks. However, it is known that transforming weighted into unweighted networks by dichotomizing the edges of weighted networks with a threshold generally leads to information loss. We have developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparse constraints. We describe an effective approach to solving this problem by designing a multi-stage, convex relaxation protocol, and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks, and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating against a large set of compiled biological knowledge bases. We also showed that the likelihood for a heavy subgraph to be meaningful increases significantly with its recurrence in multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominantly under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks.
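
A rough illustration of the 3D view, assuming each network is a symmetric weighted adjacency matrix over the same gene indices: stacking the networks gives a tensor indexed by (gene, gene, network), and a candidate recurrent heavy subgraph corresponds to the subtensor picked out by a gene set and a network set. The hypothetical `subtensor_density` below scores such a choice by its average edge weight. It only shows what "heavy" and "recurrent" measure; the paper's sparse-constrained formulation, multi-stage convex relaxation, and edge sampling are not reproduced.

```python
from itertools import combinations
from typing import List, Sequence

Tensor = List[List[List[float]]]  # tensor[t][i][j]: weight of edge (i, j) in network t


def subtensor_density(tensor: Tensor, genes: Sequence[int], nets: Sequence[int]) -> float:
    """Average weight of the edges among `genes`, taken over the networks in `nets`."""
    pairs = list(combinations(sorted(genes), 2))  # each undirected edge counted once
    if not pairs or not nets:
        return 0.0
    total = sum(tensor[t][i][j] for t in nets for i, j in pairs)
    return total / (len(pairs) * len(nets))
```

A gene set that is heavy in one network but sparse in the others scores lower once those networks are included, which is the intuition behind favoring recurrence.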

Crossref

Directory of Open Access Journals

PubMed Central

Fast Monotone Summation over Disjoint Sets

Author: Janne H. Korhonen
Mikko Koivisto
Petteri Kaski
Publication date: 02/08/2012

We study the problem of computing an ensemble of multiple sums where the summands in each sum are indexed by subsets of size p of an n-element ground set. More precisely, the task is to compute, for each subset of size q of the ground set, the sum over the values of all subsets of size p that are disjoint from the subset of size q. We present an arithmetic circuit that, without subtraction, solves the problem using O((n^p+n^q)\log n) arithmetic gates, all monotone; for constant p, q this is within the factor \log n of the optimal. The circuit design is based on viewing the summation as a "set nucleation" task and using a tree-projection approach to implement the nucleation. Applications include improved algorithms for counting heaviest k-paths in a weighted graph, computing permanents of rectangular matrices, and dynamic feature selection in machine learning.
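
The problem statement can be checked against a direct Python sketch, assuming the values are supplied as a dict keyed by sorted p-tuples of ground-set elements (a hypothetical input format). It uses only additions, so it is monotone as well, but it visits every (p-subset, q-subset) pair, roughly C(n, p) * C(n, q) work, rather than the O((n^p+n^q)\log n) gates of the circuit described in the abstract.

```python
from itertools import combinations
from typing import Dict, Tuple

Subset = Tuple[int, ...]


def disjoint_sums(n: int, p: int, q: int, f: Dict[Subset, int]) -> Dict[Subset, int]:
    """For each q-subset Y of {0, ..., n-1}, sum f[X] over the p-subsets X disjoint from Y."""
    result = {}
    for y in combinations(range(n), q):
        ys = set(y)
        result[y] = sum(f[x] for x in combinations(range(n), p) if ys.isdisjoint(x))
    return result
```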

arXiv.org e-Print Archive

CiteSeerX

Algebraic Methods in the Congested Clique

Author: Aho Alfred V.
Björklund Andreas
Czumaj Artur
Furman M. E.
Holzer Stephan
James
Nesetril Jaroslav
Tiskin Alexandre
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

In this work, we use algebraic methods for studying distance computation and subgraph detection tasks in the congested clique model. Specifically, we adapt parallel matrix multiplication implementations to the congested clique, obtaining an O(n^{1-2/\omega})-round matrix multiplication algorithm, where \omega < 2.3728639 is the exponent of matrix multiplication. In conjunction with known techniques from centralised algorithmics, this gives significant improvements over previous best upper bounds in the congested clique model. The highlight results include: -- triangle and 4-cycle counting in O(n^{0.158}) rounds, improving upon the O(n^{1/3}) triangle detection algorithm of Dolev et al. [DISC 2012], -- a (1 + o(1))-approximation of all-pairs shortest paths in O(n^{0.158}) rounds, improving upon the \tilde{O}(n^{1/2})-round (2 + o(1))-approximation algorithm of Nanongkai [STOC 2014], and -- computing the girth in O(n^{0.158}) rounds, which is the first non-trivial solution in this model. In addition, we present a novel constant-round combinatorial algorithm for detecting 4-cycles. Comment: This work is a merger of arxiv:1412.2109 and arxiv:1412.266
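
The centralised identity behind the triangle-counting result can be sketched sequentially in Python: for a simple undirected graph with 0/1 adjacency matrix A, the trace of A^3 counts each triangle six times (three starting vertices, two directions). This is only the textbook identity computed with naive matrix multiplication on one machine; it does not model the congested clique or its round complexity, and the function names are illustrative. For reference, with \omega < 2.3728639 the exponent 1 - 2/\omega is below 0.1572, consistent with the O(n^{0.158}) bounds.

```python
from typing import List

Matrix = List[List[int]]


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Naive product of an (r x s) and an (s x c) integer matrix."""
    inner = len(y)
    return [
        [sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def count_triangles(adj: Matrix) -> int:
    """Number of triangles in a simple undirected graph: trace(A^3) / 6."""
    a3 = mat_mul(mat_mul(adj, adj), adj)
    trace = sum(a3[i][i] for i in range(len(adj)))
    return trace // 6
```

Replacing `mat_mul` with a fast matrix multiplication routine is what turns this identity into the sub-cubic (and, in the distributed setting, sub-linear-round) bounds.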

arXiv.org e-Print Archive

Crossref

MPG.PuRe