Search CORE

Detecting Blackholes and Volcanoes in Directed Networks

Author: Li Zhongmou
Liu Yanchi
Xiong Hui
Publication date: 01/01/2010

In this paper, we formulate a novel problem of finding blackhole and volcano patterns in a large directed graph. Specifically, a blackhole pattern is a group of nodes such that the only links between the group and the rest of the graph are inlinks to the group. In contrast, a volcano pattern is a group that only has outlinks to the rest of the graph. Both patterns can be observed in the real world. For instance, in a trading network, a blackhole pattern may represent a group of traders who are manipulating the market. In the paper, we first prove that the blackhole mining problem is the dual of the volcano finding problem. Therefore, we focus on finding blackhole patterns. Along this line, we design two pruning schemes to guide the blackhole finding process. In the first pruning scheme, we strategically prune the search space based on a set of pattern-size-independent pruning rules and develop an iBlackhole algorithm. The second pruning scheme follows a divide-and-conquer strategy to further exploit the pruning results from the first pruning scheme. Indeed, a target directed graph can be divided into several disconnected subgraphs by the first pruning scheme, and thus blackhole finding can be conducted in each disconnected subgraph rather than in the whole large graph. Based on these two pruning schemes, we also develop an iBlackhole-DC algorithm. Finally, experimental results on real-world data show that the iBlackhole-DC algorithm can be several orders of magnitude faster than the iBlackhole algorithm, which in turn has a huge computational advantage over a brute-force method. Comment: 18 page

arXiv.org e-Print Archive

CiteSeerX
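The link-direction conditions in this abstract can be checked directly. The following is a minimal sketch, assuming a graph given as a list of directed `(u, v)` edges; it tests only the inlink/outlink condition stated above (the paper's full definition may impose further constraints, such as connectivity, which are not checked here). The volcano test reuses the blackhole test on the reversed graph, mirroring the duality the abstract describes, and `brute_force_blackholes` is a hypothetical name for the exhaustive baseline that the pruning schemes are meant to beat.

```python
from __future__ import annotations

from itertools import combinations


def is_blackhole(edges: list[tuple[int, int]], group: set[int]) -> bool:
    """True if no edge leaves `group`, i.e. links to the rest of the graph only point inward."""
    return all(not (u in group and v not in group) for u, v in edges)


def is_volcano(edges: list[tuple[int, int]], group: set[int]) -> bool:
    """Duality: a volcano in G is a blackhole in G with every edge reversed."""
    reversed_edges = [(v, u) for u, v in edges]
    return is_blackhole(reversed_edges, group)


def brute_force_blackholes(nodes: set[int], edges: list[tuple[int, int]], k: int) -> list[set[int]]:
    """Exhaustive baseline: test every k-node group (exponential in k, no pruning)."""
    return [set(c) for c in combinations(sorted(nodes), k) if is_blackhole(edges, set(c))]
```

For example, with edges `[(0, 1), (1, 2), (2, 1), (3, 0)]`, the group `{1, 2}` is a blackhole (it receives the link from node 0 and sends nothing out), while `{0, 3}` is a volcano.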

High Performance Frequent Subgraph Mining on Transactional Datasets

Author: Jena Bismita
Publication venue: ScholarWorks @ Georgia State University
Publication date: 06/05/2019

Graph data mining has become a crucial and unavoidable area of research. Large amounts of graph data are produced in many areas, such as bioinformatics, cheminformatics, social networks, and the Web. Scalable graph data mining methods are becoming increasingly popular and necessary as graph complexity grows. Frequent subgraph mining (FSM) is one such area, where the task is to find frequently recurring patterns (subgraphs). Many main-memory-based methods were proposed to tackle this problem, but they proved inefficient as data sizes grew exponentially over time. In the past few years, several research groups have attempted to handle the FSM problem in multiple ways. Many authors have tried to achieve better performance using Graphics Processing Units (GPUs), which offer multi-fold improvements over in-memory methods on large datasets. Later, Google's MapReduce model with the Hadoop framework proved to be a major breakthrough in high-performance large-batch processing. Although MapReduce came with many benefits, its disk I/O and non-iterative model did not help much in the FSM domain, since subgraph mining is an iterative process. In recent years, Spark has emerged as the de facto industry standard with its distributed in-memory computing capability, which is also well suited to an iterative style of programming. In this work, we cover how high-performance computing has tremendously improved FSM performance on transactional directed and undirected graphs, and we compare the performance of various FSM techniques based on experimental results.

ScholarWorks @ Georgia State University
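At the core of transactional FSM is support counting: a pattern's support is the number of graphs in the dataset that contain it. The sketch below is a minimal single-machine illustration, assuming each transaction is a list of undirected edges between node labels; it handles only one-edge patterns (the first level of an Apriori-style search) and does not reproduce any of the GPU, MapReduce, or Spark methods the abstract compares. The per-graph set plays the role of a "map" step that emits each distinct pattern once, and the `Counter` acts as the "reduce" step that sums them.

```python
from collections import Counter


def frequent_edges(transactions, min_support):
    """Return {edge_pattern: support} for one-edge patterns found in >= min_support graphs.

    Each transaction is a list of (label_u, label_v) edges. Edges are treated as
    undirected, so each one is normalised to a sorted label pair.
    """
    support = Counter()
    for graph in transactions:
        # A set ensures each edge pattern is counted at most once per graph.
        support.update({tuple(sorted(edge)) for edge in graph})
    return {edge: n for edge, n in support.items() if n >= min_support}
```

With the four hypothetical molecules `[[("C","O"), ("C","C")], [("O","C"), ("N","C")], [("C","C"), ("C","N")], [("C","O")]]` and `min_support=3`, only the C-O edge survives, with support 3.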

An introduction to Graph Data Management

Author: A Dries
A Gutiérrez
A Iosup
A Morari
A Poulovassilis
AD Zhu
AO Mendelzon
B Amann
B Elser
C Berge
C Vicknair
C Watters
C Weiss
CS Chang
D Conte
D Dominguez-Sal
D Theodoratos
DC Faye
DW Shipman
EF Codd
FW Tompa
G Malewicz
GM Kuper
H He
HS Kunii
IF Cruz
J Hidders
J Paredaens
J Peckham
Jonathan Hayes
K Zeng
L Kowalik
L Zou
M Atre
M Ciglan
M Consens
M Gemis
M Gyssens
M Han
M Levene
M Mainguenaud
M Schmidt
M Yannakakis
MA Bornea
MA Rodriguez
Marc Andries
MP Consens
N Kiesel
N Roussopoulos
O Erling
P Barceló Baeza
P Buneman
P Yuan
Philippe Cudré-Mauroux
PPS Chen
PT Wood
R Agrawal
R Angles
R Brijder
R Ronen
RH Güting
RS Xin
S Abiteboul
T Neumann
W Fan
W Kim
Y Guo
Y Low
Y Papakonstantinou
Y Tian
Y Zhao
YA Liu
Publication venue: Springer Science and Business Media LLC
Publication date: 29/12/2017

A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main development, and study the main current systems that implement them.

arXiv.org e-Print Archive

Crossref

Distributed Estimation of Graph 4-Profiles

Author: Borokhovich Michael
Dimakis Alexandros G.
Elenberg Ethan R.
Shanmugam Karthikeyan
Publication date: 04/04/2016

We present a novel distributed algorithm for counting all four-node induced subgraphs in a big graph. These counts, called the 4-profile, describe a graph's connectivity properties and have found several uses ranging from bioinformatics to spam detection. We also study the more complicated problem of estimating the local 4-profiles centered at each vertex of the graph. The local 4-profile embeds every vertex in an 11-dimensional space that characterizes the local geometry of its neighborhood: vertices that connect different clusters will have different local 4-profiles compared to those that are only part of one dense cluster. Our algorithm is a local, distributed message-passing scheme on the graph and computes all the local 4-profiles in parallel. We rely on two novel theoretical contributions: we show that local 4-profiles can be calculated using compressed two-hop information and also establish novel concentration results that show that graphs can be substantially sparsified and still retain good approximation quality for the global 4-profile. We empirically evaluate our algorithm using a distributed GraphLab implementation that we scaled up to 640 cores. We show that our algorithm can compute global and local 4-profiles of graphs with millions of edges in a few minutes, significantly improving upon the previous state of the art. Comment: To appear in part at WWW'1

arXiv.org e-Print Archive

Crossref
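To make concrete what a global 4-profile counts, here is a centralized brute-force sketch, assuming an undirected graph stored as a dict mapping each vertex to its set of neighbours. It is not the distributed, two-hop, sparsified algorithm the abstract describes; it simply enumerates every 4-vertex subset in O(n^4) time. It relies on the fact that each of the 11 non-isomorphic graphs on four vertices has a distinct sorted degree sequence, so that sequence serves as the class key.

```python
from collections import Counter
from itertools import combinations


def global_4_profile(adj):
    """Exact global 4-profile by brute force (O(n^4); illustration only).

    adj maps each vertex to the set of its neighbours in an undirected graph.
    Each induced 4-vertex subgraph is keyed by its sorted degree sequence, which
    distinguishes all 11 isomorphism classes of graphs on four vertices.
    """
    profile = Counter()
    for quad in combinations(sorted(adj), 4):
        # Degree of each vertex counted only within the induced subgraph.
        degrees = tuple(sorted(sum(1 for w in quad if w in adj[v]) for v in quad))
        profile[degrees] += 1
    return profile
```

On the path 0-1-2-3-4, the five 4-vertex subsets split into two induced paths `(1, 1, 2, 2)`, two "path of two edges plus isolated vertex" graphs `(0, 1, 1, 2)`, and one pair of disjoint edges `(1, 1, 1, 1)`.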

Mining functional subgraphs from cancer protein-protein interaction networks

Author: Goonesekere Nalin
Guda Chittibabu
Shen Ru
Publication venue: UNI ScholarWorks
Publication date: 01/01/2012

Background: Protein-protein interaction (PPI) networks carry vital information about proteins' functions. Analysis of PPI networks associated with specific disease systems, including cancer, helps us understand the complex biology of diseases. Specifically, identifying similar and frequently occurring patterns (network motifs) across PPI networks can provide useful clues to the biology of these diseases. Results: In this study, we developed a novel pattern-mining algorithm that detects cancer-associated functional subgraphs occurring in multiple cancer PPI networks. We constructed nine cancer PPI networks using differentially expressed genes from the Oncomine dataset. From these networks we discovered frequent patterns that occur in all networks and at different size levels. Patterns are abstracted subgraphs with their nodes replaced by node cluster IDs. By using effective canonical labeling and adopting weighted adjacency matrices, we are able to perform graph isomorphism tests in polynomial running time. We use a bottom-up pattern growth approach to search for patterns, which allows us to effectively reduce the search space as pattern sizes grow. Validation of the frequent common patterns using GO semantic similarity showed that the discovered subgraphs scored consistently higher than randomly generated subgraphs at each size level. We further investigated the cancer relevance of a selected set of subgraphs using literature-based evidence. Conclusion: Frequent common patterns exist in cancer PPI networks and can be found through effective pattern mining algorithms. We believe that this work will allow us to identify functionally relevant and coherent subgraphs in cancer networks, which can be advanced to experimental validation to further our understanding of the complex biology of cancer.

Springer - Publisher Connector

PubMed Central

University of Northern Iowa
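One way canonical labeling can make the isomorphism test polynomial is sketched below. This is not the authors' algorithm: it assumes (an assumption not stated in the abstract) that every node in a pattern carries a distinct cluster ID. Under that assumption, ordering the rows and columns of the weighted adjacency matrix by cluster ID yields a canonical form, and two patterns are isomorphic exactly when their canonical forms are equal, which takes O(k^2) work for a k-node pattern. The function names are hypothetical.

```python
def canonical_form(labels, weighted_edges):
    """Canonical (ID tuple, weighted adjacency matrix) for a pattern.

    labels: node -> cluster ID, assumed distinct within the pattern.
    weighted_edges: {(u, v): weight} for an undirected pattern.
    """
    if len(set(labels.values())) != len(labels):
        raise ValueError("sketch assumes cluster IDs are unique within a pattern")
    # Order nodes by cluster ID; this ordering is the canonical labeling.
    order = sorted(labels, key=lambda n: labels[n])
    index = {n: i for i, n in enumerate(order)}
    k = len(order)
    matrix = [[0.0] * k for _ in range(k)]
    for (u, v), w in weighted_edges.items():
        i, j = index[u], index[v]
        matrix[i][j] = matrix[j][i] = w
    ids = tuple(labels[n] for n in order)
    return ids, tuple(tuple(row) for row in matrix)


def isomorphic(p, q):
    """Patterns are (labels, weighted_edges) pairs; compare canonical forms."""
    return canonical_form(*p) == canonical_form(*q)
```

If cluster IDs can repeat within a pattern, this shortcut no longer applies and a general canonical-labeling procedure would be needed.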

A Parameterized Centrality Metric for Network Analysis

Author: F. Gebali
G. Simmel
G. Strang
H. Tong
Kristina Lerman
R. Ghosh
R. S. Burt
Rumi Ghosh
S. Wasserman
W. L. Ferrar
W. W. Zachary
Publication venue: American Physical Society (APS)
Publication date: 20/10/2010

A variety of metrics have been proposed to measure the relative importance of nodes in a network. One of these, alpha-centrality [Bonacich, 2001], measures the number of attenuated paths that exist between nodes. We introduce a normalized version of this metric and use it to study network structure, specifically, to rank nodes and find the community structure of the network. In particular, we extend the modularity-maximization method [Newman and Girvan, 2004] for community detection to use this metric as the measure of node connectivity. Normalized alpha-centrality is a powerful tool for network analysis, since it contains a tunable parameter that sets the length scale of interactions. Studying how rankings and discovered communities change as this parameter is varied allows us to identify locally and globally important nodes and structures. We apply the proposed method to several benchmark networks and show that it leads to better insight into network structure than alternative methods. Comment: 11 pages, submitted to Physical Review

arXiv.org e-Print Archive

Crossref
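Bonacich's alpha-centrality is commonly written as the solution of x = αAᵀx + e, where e is a vector of ones, so that each node collects attenuated contributions from every path that reaches it, with paths of length k weighted by α^k. The sketch below solves that equation by fixed-point iteration, which converges when α is below the reciprocal of the spectral radius of A. The final step divides by the sum only to illustrate a normalization; it is an assumption and not necessarily the normalization the paper defines. Small α keeps the score close to local (degree-like) influence, while larger α lets longer paths contribute, which is the tunable length scale the abstract refers to.

```python
def alpha_centrality(adj, alpha, iters=200):
    """Solve x = alpha * A^T x + 1 by fixed-point iteration.

    adj[i][j] = 1 if there is a directed edge i -> j. Converges when
    alpha < 1 / spectral_radius(A); iters is a fixed iteration budget.
    """
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        # Node i gains alpha times the score of each node j that links to it.
        x = [1.0 + alpha * sum(adj[j][i] * x[j] for j in range(n)) for i in range(n)]
    return x


def normalized_alpha_centrality(adj, alpha, iters=200):
    """Illustrative normalization (assumed): rescale scores to sum to 1."""
    x = alpha_centrality(adj, alpha, iters)
    total = sum(x)
    return [v / total for v in x]
```

On the directed path 0 → 1 → 2 with α = 0.5, the scores are 1, 1.5, and 1.75: node 2 is reached by a one-step path from node 1 and an attenuated two-step path from node 0.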

Discovering Informative Connection Subgraphs in Multi-Relational Graphs

Author: Milnor William
Perry Matthew
Ramakrishnan Cartic
Sheth Amit P.
Publication venue: CORE Scholar
Publication date: 01/12/2005

Discovering patterns in graphs has long been an area of interest. In most approaches to such pattern discovery, either quantitative anomalies, frequency of substructure, or maximum flow is used to measure the interestingness of a pattern. In this paper we introduce heuristics that guide a subgraph discovery algorithm away from banal paths towards more informative ones. Given an RDF graph, a user might pose a question of the form "What are the most relevant ways in which entity X is related to entity Y?", the response to which is a subgraph connecting X to Y. We use our heuristics to discover informative subgraphs within RDF graphs. Our heuristics are based on weighting mechanisms derived from edge semantics suggested by the RDF schema. We present an analysis of the quality of the generated subgraphs with respect to path ranking metrics. We conclude by presenting intuitions about which of our weighting schemes and heuristics produce higher-quality subgraphs.

CORE
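The idea of steering search away from banal paths by weighting edges according to their predicates can be illustrated with a lowest-cost path search. The sketch below is a simplification, not the paper's heuristics: it assumes RDF data as `(subject, predicate, object)` triples and a hypothetical `cost` table in which generic predicates such as `rdf:type` are expensive. Edges are traversed in both directions, and Dijkstra's algorithm returns the single cheapest connection, whereas the paper builds a connection subgraph from multiple ranked paths.

```python
import heapq


def best_path(triples, cost, source, target):
    """Cheapest undirected path between two entities, or None if unreachable.

    triples: iterable of (subject, predicate, object).
    cost: predicate -> non-negative cost (hypothetical weighting; default 1.0).
    """
    graph = {}
    for s, p, o in triples:
        c = cost.get(p, 1.0)
        # Connections are sought regardless of edge direction.
        graph.setdefault(s, []).append((o, c))
        graph.setdefault(o, []).append((s, c))
    frontier = [(0.0, source, [source])]
    settled = set()
    while frontier:
        dist, node, path = heapq.heappop(frontier)
        if node == target:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(frontier, (dist + c, nxt, path + [nxt]))
    return None
```

With a co-authorship chain Alice, Carol, Bob and both Alice and Bob typed as `Person`, a high `rdf:type` cost makes the co-authorship path win over the banal "both are persons" connection.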