Exploring the structure and function of temporal networks with dynamic graphlets
With the growing amount of available temporal real-world network data, an
important question is how to efficiently study these data. One can simply model
a temporal network as either a single aggregate static network, or as a series
of time-specific snapshots, each of which is an aggregate static network over
the corresponding time window. The advantage of modeling the temporal data in
these two ways is that one can use existing well established methods for static
network analysis to study the resulting aggregate network(s). Here, we develop
a novel approach for studying temporal network data more explicitly. We base
our methodology on the well established notion of graphlets (subgraphs), which
have been successfully used in numerous contexts in static network research.
Here, we take the notion of static graphlets to the next level and develop new
theory needed to allow for graphlet-based analysis of temporal networks. Our
new notion of dynamic graphlets is quite different from existing approaches for
dynamic network analysis that are based on temporal motifs (statistically
significant subgraphs). Namely, these approaches suffer from many limitations.
For example, they can only deal with subgraph structures of limited complexity.
Also, their major drawback is that their results heavily depend on the choice
of a null network model that is required to evaluate the significance of a
subgraph. However, choosing an appropriate null network model is a non-trivial
task. Our dynamic graphlet approach overcomes the limitations of the existing
temporal motif-based approaches. At the same time, when we thoroughly evaluate
the ability of our new approach to characterize the structure and function of
an entire temporal network or of individual nodes, we find that the dynamic
graphlet approach outperforms the static graphlet approach, which indicates
that accounting for temporal information helps.
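The two static representations described in this abstract, a single aggregate network and a series of window-based snapshots, can be sketched as follows. This is a minimal illustration, not the authors' code: the `(u, v, t)` event format, the function names `aggregate_network` and `snapshots`, and the fixed window length are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_network(events):
    """Collapse all temporal events into one static, undirected edge set."""
    return {frozenset((u, v)) for u, v, _ in events}


def snapshots(events, window):
    """Group events into consecutive time windows of length `window`.

    Each snapshot is itself an aggregate static edge set over its window.
    Windows that contain no events are skipped (an assumption of this sketch).
    """
    buckets = defaultdict(set)
    for u, v, t in events:
        buckets[int(t // window)].add(frozenset((u, v)))
    return [buckets[k] for k in sorted(buckets)]


# Hypothetical toy data: (node, node, timestamp)
events = [("a", "b", 0), ("b", "c", 1), ("a", "b", 5), ("c", "d", 6)]
print(len(aggregate_network(events)))                   # 3 distinct edges
print([len(s) for s in snapshots(events, window=5)])    # [2, 2]
```

Either representation discards the ordering of events inside a window, which is the information that dynamic graphlets are meant to retain.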
Network comparison using directed graphlets
With recent advances in high-throughput cell biology, the amount of cellular
biological data has grown drastically. Such data is often modeled as graphs
(also called networks) and studying them can lead to new insights into
molecule-level organization. A possible way to understand their structure is by
analysing the smaller components that constitute them, namely network motifs
and graphlets. Graphlets are particularly well suited to compare networks and
to assess their level of similarity but are almost always used as small
undirected graphs of up to five nodes, thus limiting their applicability in
directed networks. However, a large set of interesting biological networks such
as metabolic, cell signaling or transcriptional regulatory networks are
intrinsically directional, and using metrics that ignore edge direction may
gravely hinder information extraction. The applicability of graphlets is
extended to directed networks by considering the edge direction of the
graphlets. We tested our approach on a set of directed biological networks and
verified that they were correctly grouped by type using directed graphlets.
However, enumerating all graphlets in a large network is a computationally
demanding task. Our implementation addresses this concern by using a
state-of-the-art data structure, the g-trie, which is able to greatly reduce
the necessary computation. We compared our tool, gtrieScanner, to other
state-of-the-art methods and verified that it is the fastest general tool for
graphlet counting.
Comment: 9 pages
Temporal Network Comparison using Graphlet-orbit Transitions
Networks are widely used to model real-world systems and uncover their
topological features. Network properties such as the degree distribution and
shortest path length have been computed in numerous real-world networks, and
most of them have been shown to be both scale-free and small-world networks.
Graphlets and network motifs are subgraph patterns that capture richer
structural information than aforementioned global network properties, and these
local features are often used for network comparison. However, past work on
graphlets and network motifs applies almost exclusively to static
networks. Many systems are better represented as temporal networks, which depict
not only how a system was at a given stage but also how it evolved.
Time-dependent information is crucial in temporal networks and, by disregarding
that data, static methods cannot achieve the best possible results. This paper
introduces an extension of graphlets for temporal networks. Our proposed method
enumerates all 4-node graphlet-orbits in each network-snapshot, building the
corresponding orbit-transition matrix in the process. Our hypothesis is that
networks representing similar systems have characteristic orbit transitions
which better identify them than simple static patterns, and this is assessed on
a set of real temporal networks split into categories. In order to perform
temporal network comparison we put forward an orbit-transition-agreement metric
(OTA). OTA correctly groups a set of temporal networks that both static network
motifs and graphlets fail to group adequately. Furthermore, our method produces
interpretable results, which we use to uncover characteristic orbit transitions
that can be regarded as a network fingerprint.
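The idea of an orbit-transition matrix can be illustrated with a much smaller case than the 4-node graphlet-orbits used in the paper. The sketch below is an assumption-laden simplification, not the authors' method: it only looks at 3-node subsets, uses a brute-force enumeration, and introduces the hypothetical names `ORBITS`, `orbits_in_triple` and `orbit_transition_matrix`. For every pair of consecutive snapshots it records how each node's orbit within each 3-node subset changes.

```python
from itertools import combinations

# Orbit labels for a node inside a 3-node induced subgraph.
# "none" means the three nodes do not form a connected 3-node graphlet.
ORBITS = ["none", "path_end", "path_center", "triangle"]


def orbits_in_triple(edges, triple):
    """Map each node of a 3-node set to its orbit in the induced subgraph."""
    present = [frozenset(p) for p in combinations(triple, 2)
               if frozenset(p) in edges]
    if len(present) == 3:
        return {v: "triangle" for v in triple}
    if len(present) == 2:
        # Two edges among three nodes always share exactly one node: the center.
        center = next(iter(present[0] & present[1]))
        return {v: ("path_center" if v == center else "path_end")
                for v in triple}
    return {v: "none" for v in triple}


def orbit_transition_matrix(snapshot_edge_lists, nodes):
    """Count orbit changes between consecutive snapshots (rows: from, cols: to)."""
    index = {name: i for i, name in enumerate(ORBITS)}
    matrix = [[0] * len(ORBITS) for _ in ORBITS]
    edge_sets = [{frozenset(e) for e in snap} for snap in snapshot_edge_lists]
    for before, after in zip(edge_sets, edge_sets[1:]):
        for triple in combinations(sorted(nodes), 3):
            old = orbits_in_triple(before, triple)
            new = orbits_in_triple(after, triple)
            for v in triple:
                matrix[index[old[v]]][index[new[v]]] += 1
    return matrix
```

For example, if a 2-path 0-1-2 closes into a triangle in the next snapshot, the matrix gains two `path_end` to `triangle` transitions and one `path_center` to `triangle` transition. Comparing such matrices across networks is the kind of input a metric like OTA would operate on; the metric itself is not reproduced here.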
Improving supervised prediction of aging-related genes via dynamic network analysis
This study focuses on supervised prediction of aging-related genes from
-omics data. Unlike gene expression methods that capture aging-specific
information but study genes in isolation, or protein-protein interaction (PPI)
network methods that account for PPIs that are context-unspecific, we
recently integrated the two data types into an aging-specific PPI subnetwork,
which yielded more accurate aging-related gene predictions. However, a dynamic
aging-specific subnetwork did not improve prediction performance compared to a
static aging-specific subnetwork, despite the aging process being dynamic. So,
here, we propose computational advances towards improving prediction accuracy
from a dynamic aging-specific subnetwork. We develop a supervised learning
model that when applied to a dynamic subnetwork yields extremely high
prediction performance, with F-score of 91.4%, while the best model on any
static subnetwork yields F-score of "only" 74.3%. Hence, our predictive model
could guide with high confidence the discovery of novel aging-related gene
candidates for future wet lab validation.
A sampling framework for counting temporal motifs
Pattern counting in graphs is fundamental to network science tasks, and there
are many scalable methods for approximating counts of small patterns, often
called motifs, in large graphs. However, modern graph datasets now contain
richer structure, and incorporating temporal information in particular has
become a critical part of network analysis. Temporal motifs, which are
generalizations of small subgraph patterns that incorporate temporal ordering
on edges, are an emerging part of the network analysis toolbox. However, there
are no algorithms for fast estimation of temporal motif counts; moreover, we
show that even counting simple temporal star motifs is NP-complete. Thus, there
is a need for fast and approximate algorithms. Here, we present the first
frequency estimation algorithms for counting temporal motifs. More
specifically, we develop a sampling framework that sits as a layer on top of
existing exact counting algorithms and enables fast and accurate
memory-efficient estimates of temporal motif counts. Our results show that we
can achieve one to two orders of magnitude speedups with minimal and
controllable loss in accuracy on a number of datasets.
Comment: 9 pages, 4 figures
Graphlets versus node2vec and struc2vec in the task of network alignment
Network embedding aims to represent each node in a network as a
low-dimensional feature vector that summarizes the given node's (extended)
network neighborhood. The nodes' feature vectors can then be used in various
downstream machine learning tasks. Recently, many embedding methods that
automatically learn the features of nodes have emerged, such as node2vec and
struc2vec, which have been used in tasks such as node classification, link
prediction, and node clustering, mainly in the social network domain. There are
also other embedding methods that explicitly look at the connections between
nodes, i.e., the nodes' network neighborhoods, such as graphlets. Graphlets
have been used in many tasks such as network comparison, link prediction, and
network clustering, mainly in the computational biology domain. Even though the
two types of embedding methods (node2vec/struc2vec versus graphlets) have a
similar goal, namely to represent nodes as feature vectors, no comparisons have
been made between them, possibly because they originated in different
domains. Therefore, in this study, we compare graphlets to node2vec and
struc2vec, and we do so in the task of network alignment. In evaluations on
synthetic and real-world biological networks, we find that graphlets are both
more accurate and faster than node2vec and struc2vec.
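To make concrete what it means to use graphlets as node features, the sketch below computes a small graphlet degree vector for one node. It is restricted to graphlets with up to three nodes (orbits 0 to 3 in the standard numbering) for brevity; the adjacency format (a dict mapping each node to a set of neighbours) and the function name `graphlet_degree_vector` are assumptions for illustration and are not taken from the study.

```python
from itertools import combinations


def graphlet_degree_vector(adj, v):
    """Return [orbit0, orbit1, orbit2, orbit3] for node v.

    orbit0: edges touching v (its degree)
    orbit1: induced 2-paths in which v is an end node
    orbit2: induced 2-paths in which v is the middle node
    orbit3: triangles containing v
    """
    nbrs = adj[v]
    orbit0 = len(nbrs)
    # A pair of neighbours of v forms a triangle if the two are adjacent,
    # otherwise v is the middle of an induced 2-path.
    orbit3 = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    orbit2 = orbit0 * (orbit0 - 1) // 2 - orbit3
    # v is the end of an induced 2-path v-a-w when w is not adjacent to v.
    orbit1 = sum(1 for a in nbrs for w in adj[a]
                 if w != v and w not in nbrs)
    return [orbit0, orbit1, orbit2, orbit3]


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(graphlet_degree_vector(adj, 2))  # [3, 0, 2, 1]
print(graphlet_degree_vector(adj, 3))  # [1, 2, 0, 0]
```

Unlike a learned embedding, every coordinate of this vector has a fixed structural meaning, which is one reason such features are easy to compare across networks.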
Supervised prediction of aging-related genes from a context-specific protein interaction subnetwork
Background. Human aging is linked to many prevalent diseases. The aging
process is highly influenced by genetic factors. Hence, it is important to
identify human aging-related genes. We focus on supervised prediction of such
genes. Gene expression-based methods for this purpose study genes in isolation
from each other. While protein-protein interaction (PPI) network-based methods
for this purpose account for interactions between genes' protein products,
current PPI network data are context-unspecific, spanning different biological
conditions. Instead, here, we focus on an aging-specific subnetwork of the
entire PPI network, obtained by integrating aging-specific gene expression data
and PPI network data. The potential of such data integration has been
recognized but mostly in the context of cancer. So, we are the first to propose
a supervised learning framework for predicting aging-related genes from an
aging-specific PPI subnetwork.
Results. In a systematic and comprehensive evaluation, we find that in many
of the evaluation tests: (i) using an aging-specific subnetwork indeed yields
more accurate aging-related gene predictions than using the entire network, and
(ii) predictive methods from our framework that have not previously been used
for supervised prediction of aging-related genes outperform existing prominent
methods for the same purpose.
Conclusion. These results justify the need for our framework.
Comment: This is a journal extension of "10.1109/BIBM47256.2019.8983063", so
we use the same title as our conference paper.
On the Enumeration of Maximal (Δ, γ)-Cliques of a Temporal Network
A temporal network is a mathematical way of precisely representing a
time-varying relationship among a group of agents. In this paper, we introduce
the notion of (Δ, γ)-cliques of a temporal network, where every pair of
vertices present in the clique communicates at least γ times in each Δ
period within a given time duration. We present an algorithm for
enumerating all such maximal cliques present in the network. We also implement
the proposed algorithm and run it on three human contact network data sets.
Based on the obtained results, we analyze the data sets for multiple values of
Δ and γ, which helps in finding contact groups with different frequencies.
Comment: 9 pages. Both authors contributed equally to this work.
Network-based protein structural classification
Experimental determination of protein function is resource-consuming. As an
alternative, computational prediction of protein function has received
attention. In this context, protein structural classification (PSC) can help,
by allowing for determining structural classes of currently unclassified
proteins based on their features, and then relying on the fact that proteins
with similar structures have similar functions. Existing PSC approaches rely on
sequence-based or direct 3-dimensional (3D) structure-based protein features.
In contrast, we first model 3D structures of proteins as protein structure
networks (PSNs). Then, we use network-based features for PSC. We propose the
use of graphlets, state-of-the-art features in many research areas of network
science, in the task of PSC. Moreover, because graphlets can deal only with
unweighted PSNs, and because accounting for edge weights when constructing PSNs
could improve PSC accuracy, we also propose a deep learning framework that
automatically learns network features from weighted PSNs. When evaluated on a
large set of ~9,400 CATH and ~12,800 SCOP protein domains (spanning 36 PSN
sets), our proposed approaches are superior to existing PSC approaches in terms
of accuracy, with comparable running time.
ITeM: Independent Temporal Motifs to Summarize and Compare Temporal Networks
Networks are a fundamental and flexible way of representing various complex
systems. Many domains such as communication, citation, procurement, biology,
social media, and transportation can be modeled as a set of entities and their
relationships. Temporal networks are a specialization of general networks where
the temporal evolution of the system is as important to understand as the
structure of the entities and relationships. We present the Independent
Temporal Motif (ITeM) to characterize temporal graphs from different domains.
The ITeMs are edge-disjoint temporal motifs that can be used to model the
structure and the evolution of the graph. For a given temporal graph, we
produce a feature vector of ITeM frequencies and apply this distribution to the
task of measuring the similarity of temporal graphs. We show that ITeM has
higher accuracy than other motif frequency-based approaches. We define various
metrics based on ITeM that reveal salient properties of a temporal network. We
also present importance sampling as a method for efficiently estimating the
ITeM counts. We evaluate our approach on both synthetic and real temporal
networks.