Motif counting beyond five nodes
Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price, however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets.
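To make the color coding idea concrete, here is a minimal sketch of the classic textbook use of the technique: estimating the number of simple paths on k nodes. It is not the graphlet-sampling extension the abstract describes. The function names, the adjacency-dict input format, and the trial count are assumptions made for illustration only.

```python
import math
import random


def count_colorful_paths(adj, colors, k):
    """Count undirected simple paths on k >= 2 nodes whose nodes all have distinct colors.

    adj maps each node to an iterable of its neighbors (undirected graph);
    colors maps each node to an integer color in range(k).
    """
    # table[v] maps a bitmask of used colors to the number of colorful
    # paths that end at v and use exactly that color set.
    table = {v: {1 << colors[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for v in adj:
            bit = 1 << colors[v]
            for u in adj[v]:
                for mask, cnt in table[u].items():
                    if not mask & bit:  # v's color is still unused on this path
                        key = mask | bit
                        nxt[v][key] = nxt[v].get(key, 0) + cnt
        table = nxt
    directed = sum(sum(d.values()) for d in table.values())
    return directed // 2  # each undirected path was counted once per endpoint


def estimate_k_paths(adj, k, trials=100, seed=0):
    """Color-coding estimate of the number of simple k-node paths."""
    rng = random.Random(seed)
    # A fixed k-node path is colorful with probability k! / k^k,
    # so rescaling the colorful count gives an unbiased estimate.
    scale = k ** k / math.factorial(k)
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return scale * total / trials
```

The dynamic-programming table holds one entry per (node, color subset) pair, which gives a rough sense of why memory becomes the bottleneck as the graph and the pattern size grow.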
Automatic Network Fingerprinting through Single-Node Motifs
Complex networks have been characterised by their specific connectivity
patterns (network motifs), but their building blocks can also be identified and
described by node-motifs---a combination of local network features. One
technique to identify single node-motifs has been presented by Costa et al. (L.
D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett.,
87, 1, 2009). Here, we first suggest improvements to the method including how
its parameters can be determined automatically. Such automatic routines make
high-throughput studies of many networks feasible. Second, the new routines are
validated in different network-series. Third, we provide an example of how the
method can be used to analyse network time-series. In conclusion, we provide a
robust method for systematically discovering and classifying characteristic
nodes of a network. In contrast to classical motif analysis, our approach can
identify individual components (here: nodes) that are specific to a network.
Such special nodes, as hubs before, might be found to play critical roles in
real-world networks. Comment: 16 pages (4 figures) plus supporting information 8 pages (5 figures)
Hypergraph Motifs and Their Extensions Beyond Binary
Hypergraphs naturally represent group interactions, which are omnipresent in
many domains: collaborations of researchers, co-purchases of items, and joint
interactions of proteins, to name a few. In this work, we propose tools for
answering the following questions: (Q1) what are the structural design
principles of real-world hypergraphs? (Q2) how can we compare local structures
of hypergraphs of different sizes? (Q3) how can we identify the domains that
hypergraphs come from? We first define hypergraph motifs (h-motifs), which describe
the overlapping patterns of three connected hyperedges. Then, we define the
significance of each h-motif in a hypergraph as its occurrences relative to
those in properly randomized hypergraphs. Lastly, we define the characteristic
profile (CP) as the vector of the normalized significance of every h-motif.
Regarding Q1, we find that h-motifs' occurrences in 11 real-world hypergraphs
from 5 domains are clearly distinguished from those of randomized hypergraphs.
Then, we demonstrate that CPs capture local structural patterns unique to each
domain, and thus comparing CPs of hypergraphs addresses Q2 and Q3. The concept
of CP is extended to represent the connectivity pattern of each node or
hyperedge as a vector, which proves useful in node classification and hyperedge
prediction. Our algorithmic contribution is to propose MoCHy, a family of
parallel algorithms for counting h-motifs' occurrences in a hypergraph. We
theoretically analyze their speed and accuracy and show empirically that the
advanced approximate version MoCHy-A+ is more accurate and faster than the
basic approximate and exact versions, respectively. Furthermore, we explore
ternary hypergraph motifs that extend h-motifs by taking into account not only
the presence but also the cardinality of intersections among hyperedges. This
extension proves beneficial for all previously mentioned applications. Comment: Extended version of VLDB 2020 paper arXiv:2003.0185
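The significance and characteristic profile (CP) definitions above can be sketched as follows. The abstract only says that significance is the occurrence count "relative to" randomized hypergraphs and that the CP is a vector of normalized significances; the specific ratio, the smoothing constant eps, and the L2 normalization used here are assumptions made for illustration, as are the function names.

```python
import math


def significance(real_count, random_counts, eps=1.0):
    """Relative significance of one motif (assumed form, see lead-in).

    real_count: occurrences in the real hypergraph.
    random_counts: occurrences in each randomized hypergraph.
    eps: assumed smoothing constant that avoids division by zero.
    """
    mean_rand = sum(random_counts) / len(random_counts)
    return (real_count - mean_rand) / (real_count + mean_rand + eps)


def characteristic_profile(real_counts, random_counts_per_motif, eps=1.0):
    """Vector of per-motif significances, scaled to unit L2 norm (assumed)."""
    sig = [
        significance(real, rand, eps)
        for real, rand in zip(real_counts, random_counts_per_motif)
    ]
    norm = math.sqrt(sum(s * s for s in sig))
    if norm == 0:
        return [0.0] * len(sig)
    return [s / norm for s in sig]
```

Because every CP has the same length and unit norm under these assumptions, profiles of hypergraphs of very different sizes can be compared directly, for example with a correlation or a dot product.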
Large-scale analysis of disease pathways in the human interactome
Discovering disease pathways, which can be defined as sets of proteins
associated with a given disease, is an important problem that has the potential
to provide clinically actionable insights for disease diagnosis, prognosis, and
treatment. Computational methods aid the discovery by relying on
protein-protein interaction (PPI) networks. They start with a few known
disease-associated proteins and aim to find the rest of the pathway by
exploring the PPI network around the known disease proteins. However, the
success of such methods has been limited, and failure cases have not been well
understood. Here we study the PPI network structure of 519 disease pathways. We
find that 90% of pathways do not correspond to single well-connected components
in the PPI network. Instead, proteins associated with a single disease tend to
form many separate connected components/regions in the network. We then
evaluate state-of-the-art disease pathway discovery methods and show that their
performance is especially poor on diseases with disconnected pathways. Thus, we
conclude that network connectivity structure alone may not be sufficient for
disease pathway discovery. However, we show that higher-order network
structures, such as small subgraphs of the pathway, provide a promising
direction for the development of new methods.
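The connectivity finding above can be checked on any pathway with a short sketch like the one below, which computes the connected components that a set of pathway proteins forms inside a PPI network. The edge-list input format, the function names, and the toy protein labels are assumptions for illustration, not the authors' pipeline.

```python
from collections import deque


def pathway_components(edges, pathway):
    """Connected components of the PPI subgraph induced by the pathway proteins.

    edges: iterable of (protein_u, protein_v) pairs for the whole network.
    pathway: iterable of proteins associated with one disease.
    Returns components as sorted lists, largest first.
    """
    pathway = set(pathway)
    adj = {p: set() for p in pathway}
    for u, v in edges:
        # Keep only interactions whose two endpoints are both in the pathway.
        if u in pathway and v in pathway and u != v:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in sorted(pathway):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:  # breadth-first search from an unvisited protein
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(sorted(comp))
    return sorted(comps, key=len, reverse=True)


def largest_component_fraction(edges, pathway):
    """Fraction of pathway proteins that lie in the largest component."""
    comps = pathway_components(edges, pathway)
    total = sum(len(c) for c in comps)
    return len(comps[0]) / total if total else 0.0
```

For a toy network with edges A-B, B-C, C-D, E-F and pathway {A, B, D, E}, the pathway splits into three components, because the connecting proteins C and F are not part of the pathway.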