1,328 research outputs found

Detecting Blackholes and Volcanoes in Directed Networks

Author: Li Zhongmou
Liu Yanchi
Xiong Hui
Publication date: 01/01/2010

In this paper, we formulate a novel problem of finding blackhole and volcano patterns in a large directed graph. Specifically, a blackhole pattern is a group of nodes such that there are only inlinks to this group from the rest of the nodes in the graph. In contrast, a volcano pattern is a group which only has outlinks to the rest of the nodes in the graph. Both patterns can be observed in the real world. For instance, in a trading network, a blackhole pattern may represent a group of traders who are manipulating the market. In the paper, we first prove that the blackhole mining problem is a dual problem of finding volcanoes. Therefore, we focus on finding the blackhole patterns. Along this line, we design two pruning schemes to guide the blackhole finding process. In the first pruning scheme, we strategically prune the search space based on a set of pattern-size-independent pruning rules and develop an iBlackhole algorithm. The second pruning scheme follows a divide-and-conquer strategy to further exploit the pruning results from the first pruning scheme. Indeed, a target directed graph can be divided into several disconnected subgraphs by the first pruning scheme, and thus the blackhole finding can be conducted in each disconnected subgraph rather than in a large graph. Based on these two pruning schemes, we also develop an iBlackhole-DC algorithm. Finally, experimental results on real-world data show that the iBlackhole-DC algorithm can be several orders of magnitude faster than the iBlackhole algorithm, which has a huge computational advantage over a brute-force method. Comment: 18 pages
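
The inlink-only condition in the abstract above can be illustrated with a minimal Python sketch. It checks only that no edge leaves a candidate group; any further requirements the paper may impose (such as connectivity or size limits) are not modelled. The function names are hypothetical, and treating a volcano as a blackhole of the edge-reversed graph is an assumed reading of the duality the abstract mentions.

```python
def has_no_outlinks(edges, group):
    """Return True if no directed edge goes from inside `group` to outside it."""
    group = set(group)
    return all(not (u in group and v not in group) for u, v in edges)


def has_no_inlinks(edges, group):
    """Volcano-side check, done by reversing every edge (assumed duality)."""
    reversed_edges = [(v, u) for u, v in edges]
    return has_no_outlinks(reversed_edges, group)


# Toy directed graph: 4 -> 1 -> 2 <-> 3
edges = [(1, 2), (2, 3), (3, 2), (4, 1)]
print(has_no_outlinks(edges, {2, 3}))  # nodes 2 and 3 only receive links from outside
print(has_no_inlinks(edges, {4}))      # node 4 only sends links
```

Testing every subset this way is the brute-force baseline; the pruning schemes described in the abstract exist to avoid that enumeration.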

arXiv.org e-Print Archive

CiteSeerX

Significant Subgraph Mining with Multiple Testing Correction

Author: Borgwardt Karsten M.
Kasenburg Niklas
López Felipe Llinares
Sugiyama Mahito
Publication date: 01/01/2015

The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown to lead to greater statistical power, that is, the discovery of more truly significant itemsets, than the standard Bonferroni correction on real-world datasets. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power. Comment: 18 pages, 5 figures, accepted to the 2015 SIAM International Conference on Data Mining (SDM15)
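
To make "pruning untestable hypotheses" concrete, here is a small Python sketch in the style of Tarone's correction. It assumes a one-sided Fisher's exact test, which the abstract does not name, and the function names and toy supports are hypothetical. A pattern is untestable when even its most extreme contingency table cannot reach the corrected threshold, so it need not count towards the correction factor.

```python
from math import comb


def min_p_value(x, n1, n):
    """Smallest one-sided Fisher p-value for a pattern occurring in x of n
    transactions, of which n1 belong to the class of interest (assumed test)."""
    a = min(x, n1)  # most extreme table: as many occurrences as possible in the class
    return comb(n1, a) * comb(n - n1, x - a) / comb(n, x)


def tarone_factor(supports, n1, n, alpha=0.05):
    """Smallest k such that at most k patterns are testable at level alpha / k."""
    p_mins = [min_p_value(x, n1, n) for x in supports]
    k = 1
    while sum(p <= alpha / k for p in p_mins) > k:
        k += 1
    return k


supports = [1, 1, 1, 2, 2, 8, 10]  # hypothetical pattern supports
k = tarone_factor(supports, n1=10, n=20)
print(k, len(supports))  # correction factor with pruning vs. plain Bonferroni
```

In this toy setting only the two high-support patterns can ever be significant, so the per-test threshold is alpha / 2 rather than the Bonferroni alpha / 7.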

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Mining Representative Unsubstituted Graph Patterns Using Prior Similarity Matrix

Author: Dhifli Wajdi
Nguifo Engelbert Mephu
Saidi Rabie
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

One of the most powerful techniques to study protein structures is to look for recurrent fragments (also called substructures or spatial motifs), then use them as patterns to characterize the proteins under study. An emerging trend is to parse the three-dimensional (3D) structures of proteins into graphs of amino acids. Hence, the search for recurrent spatial motifs is formulated as a process of frequent subgraph discovery where each subgraph represents a spatial motif. In this scope, several efficient approaches for frequent subgraph discovery have been proposed in the literature. However, the set of discovered frequent subgraphs is too large to be efficiently analyzed and explored in any further process. In this paper, we propose a novel pattern selection approach that shrinks the large number of discovered frequent subgraphs by selecting the representative ones. Existing pattern selection approaches do not exploit domain knowledge. In our approach, by contrast, we incorporate the evolutionary information of amino acids defined in the substitution matrices in order to select the representative subgraphs. We show the effectiveness of our approach on a number of real datasets. Our experimental results show that our approach is able to considerably decrease the number of motifs while enhancing their interestingness.

arXiv.org e-Print Archive

HAL Clermont Université

QuateXelero: an accelerated exact network motif detection algorithm

Author: Dichter Norbert
Khakabimamaghani Sahand
Koch Ina
Masoudi-Nejad Ali
Sharafuddin Iman
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks’ structure and function. However, this task is very computationally demanding, because it is closely tied to graph isomorphism, a problem in NP that is not yet known to be either in P or NP-complete. Accordingly, this research aims to reduce the number of calls to the NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure at its heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks confirm the overall superiority of the proposed algorithm, QuateXelero, compared with two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fast in constructing the central data structure of the algorithm from scratch based on the input network.
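
For orientation, the ESU enumeration that the abstract names as the basis of QuateXelero can be sketched in a few lines of Python. This is plain ESU on an undirected graph with comparable node labels (both assumptions made for brevity). It does not include the quaternary tree or any isomorphism classification, and the function name `esu` and the toy graph are illustrative only.

```python
def esu(adj, k):
    """Enumerate the vertex sets of all connected k-node subgraphs, each once."""
    found = []

    def extend(sub, ext, root):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Nodes already in the subgraph or adjacent to it are not "exclusive".
            closed = sub | {n for s in sub for n in adj[s]}
            # Exclusive neighbours of w with a label larger than the root.
            new = {u for u in adj[w] if u > root and u not in closed}
            extend(sub | {w}, ext | new, root)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for nodes in esu(adj, 3):
    print(sorted(nodes))
```

Every enumerated subgraph must then be assigned to an isomorphism class, which is the step where NAUTY calls are usually made and which the abstract says QuateXelero tries to avoid.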

Directory of Open Access Journals

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

Combinatorial algorithm for counting small induced graphs and orbits

Author: Demšar Janez
Hočevar Tomaž
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 25/01/2016

Graphlet analysis is an approach to network analysis that is particularly popular in bioinformatics. We show how to set up a system of linear equations that relate the orbit counts and can be used in an algorithm that is significantly faster than the existing approaches based on direct enumeration of graphlets. The algorithm requires the existence of a vertex with certain properties; we show that such a vertex exists for graphlets of arbitrary size, except for complete graphs and $C_4$, which are treated separately. Empirical analysis of running time agrees with the theoretical results.
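
As a small illustration of a linear relation between orbit counts, consider three-node graphlets: every pair of neighbours of a node x either is adjacent (x lies in a triangle) or is not (x is the centre of an induced path), so the two counts sum to C(deg(x), 2). The Python sketch below checks this well-known identity by brute force. It is not taken from the paper, whose equations cover larger graphlets, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def orbit_counts(adj, x):
    """Brute-force counts of x as path end, path centre, and triangle member."""
    end = centre = triangle = 0
    others = [v for v in adj if v != x]
    for u, v in combinations(others, 2):
        xu, xv, uv = u in adj[x], v in adj[x], v in adj[u]
        if xu and xv and uv:
            triangle += 1
        elif xu and xv:
            centre += 1
        elif uv and (xu or xv):
            end += 1
    return end, centre, triangle


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for x in adj:
    end, centre, triangle = orbit_counts(adj, x)
    # The relation lets one count be derived from the other without enumeration.
    print(x, centre + triangle == comb(len(adj[x]), 2))
```

Relations of this kind mean that only some orbit counts have to be enumerated directly, while the rest follow from solving the equations.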

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

A novel frequent probability pattern mining algorithm based on circuit simulation method in uncertain biological networks

Author: Chunyan Wang
Jieyue He
Kunpu Qiu
Wei Zhong
Publication venue: Springer Nature
Publication date: 01/01/2014

Crossref

Springer - Publisher Connector

Patterns of Interactions in Complex Social Networks Based on Coloured Motifs Analysis

Author: A. Vazquez
A.-L. Barabasi
C.H. Proctor
C.N. Alexander
E. Young-Ho
H. Chung-Yuan
K. Juszczyszyn
M.E. Shaw
N. Kashtan
R. Milo
S. Itzkovitz
S. Mangan
S. Shen-Orr
S. Wasserman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Coloured network motifs are small subgraphs that make it possible to discover and interpret the patterns of interaction within complex networks. The paper presents an analysis of three-node motifs in which the colour of a node reflects its centrality in the social network: white for high centrality, black for low. The importance of the vertices is assessed using two measures: degree prestige and degree centrality. The distributions of motifs in these two cases are compared to mine the interconnection patterns between nodes. The analysis is performed on a social network derived from email communication.
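
The colouring step can be sketched in Python as follows. Two assumptions are made that the abstract does not state: degree centrality is taken as normalised out-degree and degree prestige as normalised in-degree (the usual definitions for directed networks), and a node counts as "high" when its score exceeds the network mean. The function name and the toy email edges are hypothetical.

```python
def colour_nodes(edges, nodes, measure="centrality"):
    """Colour each node white (high) or black (low) by a degree-based score."""
    n = len(nodes)
    degree = {v: 0 for v in nodes}
    for sender, receiver in edges:
        # Out-degree for centrality, in-degree for prestige (assumed definitions).
        degree[sender if measure == "centrality" else receiver] += 1
    score = {v: d / (n - 1) for v, d in degree.items()}
    mean = sum(score.values()) / n  # assumed threshold between high and low
    return {v: "white" if score[v] > mean else "black" for v in nodes}


# Toy email network: a writes to everyone, b and c write to d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")]
print(colour_nodes(edges, nodes, "centrality"))
print(colour_nodes(edges, nodes, "prestige"))
```

The two measures colour this toy network differently (the main sender is white under one, the main receiver under the other), which is why the motif distributions obtained under each measure can be compared.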

Crossref

Bournemouth University Research Online