Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation
The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were designed primarily for unweighted networks and cannot easily be extended to weighted networks. However, it is known that dichotomizing the edges of weighted networks with a threshold to obtain unweighted networks generally leads to information loss. We have developed a novel tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparsity constraints. We describe an effective approach to solving this problem by designing a multi-stage convex relaxation protocol and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating them against a large set of compiled biological knowledge bases. We also showed that the likelihood that a heavy subgraph is meaningful increases significantly with its recurrence across multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominantly under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks.
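As a rough illustration of the heavy-subtensor formulation, the sketch below stacks m weighted adjacency matrices (one per network) into an m × n × n tensor and scores a candidate pair of gene set and network set by the total edge weight it covers. The search is a plain greedy alternating heuristic swapped in for readability; it is not the multi-stage convex relaxation or the non-uniform edge sampling described above, and the function names, the fixed subset sizes `k_genes` and `k_nets`, and the toy data are all hypothetical.

```python
from itertools import combinations


def heaviness(tensor, genes, nets):
    """Total weight of edges among `genes`, summed over the networks in `nets`.

    tensor[k][i][j] is the weight of edge (i, j) in network k
    (symmetric, zero diagonal); each undirected edge is counted once.
    """
    return sum(tensor[k][i][j] for k in nets for i, j in combinations(sorted(genes), 2))


def greedy_heavy_subtensor(tensor, k_genes, k_nets, iters=10):
    """Hypothetical alternating heuristic (not the paper's convex relaxation):
    fix the networks and pick genes, then fix the genes and pick networks."""
    m, n = len(tensor), len(tensor[0])
    nets, genes = list(range(m)), []  # start from all networks
    for _ in range(iters):
        # Genes: top-k by weighted degree accumulated over the selected networks.
        degree = [sum(tensor[k][i][j] for k in nets for j in range(n)) for i in range(n)]
        new_genes = sorted(range(n), key=lambda i: -degree[i])[:k_genes]
        # Networks: top-k by the weight they place on the selected gene set.
        score = [heaviness(tensor, new_genes, [k]) for k in range(m)]
        new_nets = sorted(range(m), key=lambda k: -score[k])[:k_nets]
        if set(new_genes) == set(genes) and set(new_nets) == set(nets):
            break  # neither selection changed: converged
        genes, nets = new_genes, new_nets
    return sorted(genes), sorted(nets)


def toy_tensor(n=6, m=4):
    """Hypothetical toy data: genes {0, 1, 2} form a heavy module in networks 0-2."""
    tensor = []
    for k in range(m):
        A = [[0.0] * n for _ in range(n)]
        for i, j in combinations(range(n), 2):
            w = 0.9 if (k < 3 and j < 3) else 0.1  # j < 3 with i < j means both in the module
            A[i][j] = A[j][i] = w
        tensor.append(A)
    return tensor


if __name__ == "__main__":
    T = toy_tensor()
    genes, nets = greedy_heavy_subtensor(T, k_genes=3, k_nets=3)
    print(genes, nets, heaviness(T, genes, nets))  # [0, 1, 2] [0, 1, 2] and a weight of about 8.1
```

On the toy tensor the planted module {0, 1, 2} is recovered together with the three networks in which it is heavy; the weaker edges in network 3 are never thresholded away, they simply contribute less to the score.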
Sketch-Based Streaming Anomaly Detection in Dynamic Graphs
Given a stream of graph edges from a dynamic graph, how can we assign anomaly
scores to edges and subgraphs in an online manner, for the purpose of detecting
unusual behavior, using constant time and memory? For example, in intrusion
detection, existing work seeks to detect either anomalous edges or anomalous
subgraphs, but not both. In this paper, we first extend the count-min sketch
data structure to a higher-order sketch. This higher-order sketch has the
useful property of preserving the dense subgraph structure (dense subgraphs in
the input turn into dense submatrices in the data structure). We then propose
four online algorithms that utilize this enhanced data structure, which (a)
detect both edge and graph anomalies; (b) process each edge and graph in
constant memory and constant update time per newly arriving edge; and (c)
outperform state-of-the-art baselines on four real-world datasets. Our method
is the first streaming approach that incorporates dense subgraph search to
detect graph anomalies in constant memory and time.
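The higher-order sketch can be illustrated as follows: instead of hashing an edge (u, v) into a one-dimensional row of counters, each of `depth` hash layers maps u and v separately to a row and a column of a `width` × `width` matrix, so edges among a dense group of nodes pile up in a dense block of cells. This is a minimal sketch under assumptions: the class name, the salted BLAKE2 hashing, and the min-over-layers queries are illustrative choices, and the paper's four online algorithms and their anomaly-scoring functions are not reproduced here.

```python
import hashlib


class HigherOrderCMS:
    """Illustrative 2D count-min sketch for edge streams (hypothetical API).

    Each of `depth` layers is a width x width counter matrix; an edge (u, v)
    increments cell (h_l(u), h_l(v)) in every layer l, so node groups that
    exchange many edges show up as dense submatrices.
    """

    def __init__(self, width, depth, seed=0):
        self.width, self.depth, self.seed = width, depth, seed
        self.tables = [[[0.0] * width for _ in range(width)] for _ in range(depth)]

    def _bucket(self, layer, node):
        # Salted hash so each layer uses a different node -> bucket mapping.
        key = f"{self.seed}:{layer}:{node}".encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, u, v, weight=1.0):
        # Fixed memory and O(depth) time per arriving edge.
        for layer in range(self.depth):
            self.tables[layer][self._bucket(layer, u)][self._bucket(layer, v)] += weight

    def query_edge(self, u, v):
        # Count-min estimate: never underestimates the true count for nonnegative weights.
        return min(self.tables[layer][self._bucket(layer, u)][self._bucket(layer, v)]
                   for layer in range(self.depth))

    def query_subgraph(self, sources, targets):
        # Sum over the hashed submatrix in each layer, then take the minimum;
        # every true edge between the two sets lands inside that submatrix.
        best = float("inf")
        for layer in range(self.depth):
            rows = {self._bucket(layer, u) for u in sources}
            cols = {self._bucket(layer, v) for v in targets}
            best = min(best, sum(self.tables[layer][r][c] for r in rows for c in cols))
        return best


if __name__ == "__main__":
    cms = HigherOrderCMS(width=64, depth=4)
    for _ in range(50):  # a burst of edges among nodes 1, 2 and 3
        for u, v in [(1, 2), (2, 3), (1, 3)]:
            cms.update(u, v)
    cms.update(7, 9)
    print(cms.query_edge(1, 2))                       # at least 50
    print(cms.query_subgraph([1, 2, 3], [1, 2, 3]))   # at least 150
```

The dense-subgraph property comes from hashing endpoints independently: a burst among a few nodes touches only a few rows and columns, which is what makes dense-submatrix search over the sketch meaningful.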
Parallel Randomized Tucker Decomposition Algorithms
The Tucker tensor decomposition is a natural extension of the singular value
decomposition (SVD) to multiway data. We propose to accelerate Tucker tensor
decomposition algorithms by using randomization and parallelization. We present
two algorithms that scale to large data and many processors, significantly
reduce both computation and communication cost compared to previous
deterministic and randomized approaches, and obtain nearly the same
approximation errors. The key idea in our algorithms is to perform randomized
sketches with Kronecker-structured random matrices, which reduces computation
compared to unstructured matrices and can be implemented using a fundamental
tensor computational kernel. We provide probabilistic error analysis of our
algorithms and implement a new parallel algorithm for the structured randomized
sketch. Our experimental results demonstrate that our combination of
randomization and parallelization achieves accurate Tucker decompositions much
faster than alternative approaches. We observe up to a 16X speedup over the
fastest deterministic parallel implementation on 3D simulation data.
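The key idea, sketching each mode-n unfolding with a Kronecker-structured random matrix computed as a sequence of tensor-times-matrix (TTM) products rather than by forming the large random matrix, can be shown sequentially in pure Python for a 3-way tensor. This is a minimal illustration under assumptions: the helper names, the Gaussian factors, the single sketch size `sketch` per mode, and the Gram-Schmidt truncation to the target rank are hypothetical choices; it omits oversampling with SVD-based truncation, the error analysis, and the parallel implementation described above.

```python
import math
import random


def ttm(X, M, mode):
    """Mode product X x_mode M: X is I x J x K (nested lists), M is R x (size of `mode`)."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    R = len(M)
    if mode == 0:
        return [[[sum(M[r][i] * X[i][j][k] for i in range(I)) for k in range(K)]
                 for j in range(J)] for r in range(R)]
    if mode == 1:
        return [[[sum(M[r][j] * X[i][j][k] for j in range(J)) for k in range(K)]
                 for r in range(R)] for i in range(I)]
    return [[[sum(M[r][k] * X[i][j][k] for k in range(K)) for r in range(R)]
             for j in range(J)] for i in range(I)]


def unfold(X, mode):
    """Mode-n unfolding: one row per index of `mode`, columns over the other two modes."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    if mode == 0:
        return [[X[i][j][k] for j in range(J) for k in range(K)] for i in range(I)]
    if mode == 1:
        return [[X[i][j][k] for i in range(I) for k in range(K)] for j in range(J)]
    return [[X[i][j][k] for i in range(I) for j in range(J)] for k in range(K)]


def orthonormal_rows(A, rank, tol=1e-10):
    """Modified Gram-Schmidt on the columns of A; returns up to `rank` orthonormal vectors."""
    basis = []
    for c in range(len(A[0])):
        v = [row[c] for row in A]
        scale = math.sqrt(sum(a * a for a in v))
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if scale > 0 and norm > tol * scale:  # skip (numerically) dependent columns
            basis.append([a / norm for a in v])
        if len(basis) == rank:
            break
    return basis


def randomized_tucker(X, ranks, sketch, seed=0):
    """Hypothetical sequential HOSVD-style Tucker with Kronecker-structured sketches."""
    rng = random.Random(seed)
    dims = (len(X), len(X[0]), len(X[0][0]))
    factors = []  # factors[n] stores U_n transposed: ranks[n] x dims[n]
    for n in range(3):
        Z = X
        for m in range(3):
            if m != n:
                # One small Gaussian factor per other mode; together these TTMs equal
                # multiplying X_(n) by a Kronecker product of the Gaussian factors.
                omega_t = [[rng.gauss(0.0, 1.0) for _ in range(dims[m])] for _ in range(sketch)]
                Z = ttm(Z, omega_t, m)
        factors.append(orthonormal_rows(unfold(Z, n), ranks[n]))
    core = X
    for n in range(3):
        core = ttm(core, factors[n], n)  # G = X x_1 U1^T x_2 U2^T x_3 U3^T
    return core, factors


def reconstruct(core, factors):
    Xhat = core
    for n in range(3):
        Xhat = ttm(Xhat, [list(col) for col in zip(*factors[n])], n)  # multiply by U_n
    return Xhat


def relative_error(X, Y):
    num = sum((a - b) ** 2 for p, q in zip(X, Y) for r, s in zip(p, q) for a, b in zip(r, s))
    den = sum(a * a for p in X for r in p for a in r)
    return math.sqrt(num / den)


if __name__ == "__main__":
    rng = random.Random(1)
    a1, a2 = [rng.gauss(0, 1) for _ in range(6)], [rng.gauss(0, 1) for _ in range(6)]
    b1, b2 = [rng.gauss(0, 1) for _ in range(5)], [rng.gauss(0, 1) for _ in range(5)]
    c1, c2 = [rng.gauss(0, 1) for _ in range(4)], [rng.gauss(0, 1) for _ in range(4)]
    X = [[[a1[i] * b1[j] * c1[k] + a2[i] * b2[j] * c2[k] for k in range(4)]
          for j in range(5)] for i in range(6)]
    core, factors = randomized_tucker(X, ranks=(2, 2, 2), sketch=3)
    print(relative_error(X, reconstruct(core, factors)))  # close to machine precision
```

Because each random factor is only `sketch` × I_m, the (J·K) × `sketch`² Kronecker matrix is never materialized, and the work reduces to TTM products. For a tensor of exact multilinear rank (2, 2, 2), as in the usage example, the recovered factors span the true mode subspaces, so the reconstruction error is at rounding level.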