1,328 research outputs found
Detecting Blackholes and Volcanoes in Directed Networks
In this paper, we formulate a novel problem for finding blackhole and volcano
patterns in a large directed graph. Specifically, a blackhole pattern is a
group which is made of a set of nodes in a way such that there are only inlinks
to this group from the rest nodes in the graph. In contrast, a volcano pattern
is a group which only has outlinks to the rest nodes in the graph. Both
patterns can be observed in real world. For instance, in a trading network, a
blackhole pattern may represent a group of traders who are manipulating the
market. In the paper, we first prove that the blackhole mining problem is a
dual problem of finding volcanoes. Therefore, we focus on finding the blackhole
patterns. Along this line, we design two pruning schemes to guide the blackhole
finding process. In the first pruning scheme, we strategically prune the search
space based on a set of pattern-size-independent pruning rules and develop an
iBlackhole algorithm. The second pruning scheme follows a divide-and-conquer
strategy to further exploit the pruning results from the first pruning scheme.
Indeed, a target directed graphs can be divided into several disconnected
subgraphs by the first pruning scheme, and thus the blackhole finding can be
conducted in each disconnected subgraph rather than in a large graph. Based on
these two pruning schemes, we also develop an iBlackhole-DC algorithm. Finally,
experimental results on real-world data show that the iBlackhole-DC algorithm
can be several orders of magnitude faster than the iBlackhole algorithm, which
has a huge computational advantage over a brute-force method.Comment: 18 page
Significant Subgraph Mining with Multiple Testing Correction
The problem of finding itemsets that are statistically significantly enriched
in a class of transactions is complicated by the need to correct for multiple
hypothesis testing. Pruning untestable hypotheses was recently proposed as a
strategy for this task of significant itemset mining. It was shown to lead to
greater statistical power, the discovery of more truly significant itemsets,
than the standard Bonferroni correction on real-world datasets. An open
question, however, is whether this strategy of excluding untestable hypotheses
also leads to greater statistical power in subgraph mining, in which the number
of hypotheses is much larger than in itemset mining. Here we answer this
question by an empirical investigation on eight popular graph benchmark
datasets. We propose a new efficient search strategy, which always returns the
same solution as the state-of-the-art approach and is approximately two orders
of magnitude faster. Moreover, we exploit the dependence between subgraphs by
considering the effective number of tests and thereby further increase the
statistical power.Comment: 18 pages, 5 figure, accepted to the 2015 SIAM International
Conference on Data Mining (SDM15
Mining Representative Unsubstituted Graph Patterns Using Prior Similarity Matrix
One of the most powerful techniques to study protein structures is to look
for recurrent fragments (also called substructures or spatial motifs), then use
them as patterns to characterize the proteins under study. An emergent trend
consists in parsing proteins three-dimensional (3D) structures into graphs of
amino acids. Hence, the search of recurrent spatial motifs is formulated as a
process of frequent subgraph discovery where each subgraph represents a spatial
motif. In this scope, several efficient approaches for frequent subgraph
discovery have been proposed in the literature. However, the set of discovered
frequent subgraphs is too large to be efficiently analyzed and explored in any
further process. In this paper, we propose a novel pattern selection approach
that shrinks the large number of discovered frequent subgraphs by selecting the
representative ones. Existing pattern selection approaches do not exploit the
domain knowledge. Yet, in our approach we incorporate the evolutionary
information of amino acids defined in the substitution matrices in order to
select the representative subgraphs. We show the effectiveness of our approach
on a number of real datasets. The results issued from our experiments show that
our approach is able to considerably decrease the number of motifs while
enhancing their interestingness
QuateXelero : an accelerated exact network motif detection algorithm
Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks’ structure and function. However, this task is very computationally demanding, because it is highly associated with the graph isomorphism which is an NP problem (not known to belong to P or NP-complete subsets yet). Accordingly, this research is endeavoring to decrease the need to call NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure in the heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks approve the overal superiority of the proposed algorithm, namely QuateXelero, compared with two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fastest in constructing the central data structure of the algorithm from scratch based on the input network
Combinatorial algorithm for counting small induced graphs and orbits
Graphlet analysis is an approach to network analysis that is particularly
popular in bioinformatics. We show how to set up a system of linear equations
that relate the orbit counts and can be used in an algorithm that is
significantly faster than the existing approaches based on direct enumeration
of graphlets. The algorithm requires existence of a vertex with certain
properties; we show that such vertex exists for graphlets of arbitrary size,
except for complete graphs and , which are treated separately. Empirical
analysis of running time agrees with the theoretical results
Patterns of Interactions in Complex Social Networks Based on Coloured Motifs Analysis
Coloured network motifs are small subgraphs that enable to discover and interpret the patterns of interaction within the complex networks. The analysis of three-nodes motifs where the colour of the node reflects its high – white node or low – black node centrality in the social network is presented in the paper. The importance of the vertices is assessed by utilizing two measures: degree prestige and degree centrality. The distribution of motifs in these two cases is compared to mine the interconnection patterns between nodes. The analysis is performed on the social network derived from email communication
- …