87 research outputs found
Organic Design of Massively Distributed Systems: A Complex Networks Perspective
The vision of Organic Computing addresses challenges that arise in the design
of future information systems that are comprised of numerous, heterogeneous,
resource-constrained and error-prone components or devices. Here, the notion
organic particularly highlights the idea that, in order to be manageable, such
systems should exhibit self-organization, self-adaptation and self-healing
characteristics similar to those of biological systems. In recent years, the
principles underlying many of the interesting characteristics of natural
systems have been investigated from the perspective of complex systems science,
particularly using the conceptual framework of statistical physics and
statistical mechanics. In this article, we review some of the interesting
relations between statistical physics and networked systems and discuss
applications in the engineering of organic networked computing systems with
predictable, quantifiable and controllable self-* properties.
Comment: 17 pages, 14 figures, preprint of submission to Informatik-Spektrum
published by Springer
Counting Causal Paths in Big Times Series Data on Networks
Graph or network representations are an important foundation for data mining
and machine learning tasks in relational data. Many tools of network analysis,
like centrality measures, information ranking, or cluster detection rest on the
assumption that links capture direct influence, and that paths represent
possible indirect influence. This assumption is invalidated in time-stamped
network data capturing, e.g., dynamic social networks, biological sequences or
financial transactions. In such data, for two time-stamped links (A,B) and
(B,C) the chronological ordering and timing determines whether a causal path
from node A via B to C exists. A number of works have shown that, for this
reason, network analysis cannot be directly applied to time-stamped network data.
Existing methods to address this issue require statistics on causal paths,
which is computationally challenging for big data sets.
Addressing this problem, we develop an efficient algorithm to count causal
paths in time-stamped network data. Applying it to empirical data, we show that
our method is more efficient than a baseline method implemented in an
open-source data analytics package. Our method works efficiently for different
values of the maximum time difference between consecutive links of a causal
path and supports streaming scenarios. With it, we are closing a gap that
hinders an efficient analysis of big time series data on complex networks.
Comment: 10 pages, 2 figures
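The causal-path condition described in this abstract can be illustrated with a small brute-force sketch. This is not the efficient algorithm the paper develops; the function name `count_causal_paths_len2`, the tuple layout of the links, and the parameter `delta` (the maximum time difference between consecutive links) are assumptions made for illustration only.

```python
from collections import defaultdict


def count_causal_paths_len2(edges, delta):
    """Count causal paths a -> b -> c of length two (brute force).

    edges: iterable of (source, target, timestamp) tuples (assumed layout).
    delta: assumed maximum time difference between consecutive links.
    Two links (a, b, t1) and (b, c, t2) form a causal path
    if 0 < t2 - t1 <= delta.
    """
    edges = list(edges)

    # Index outgoing links by source node so continuations are easy to find.
    out_links = defaultdict(list)
    for src, dst, t in edges:
        out_links[src].append((dst, t))

    counts = defaultdict(int)
    for a, b, t1 in edges:
        for c, t2 in out_links[b]:
            # The second link must occur after the first, within delta.
            if 0 < t2 - t1 <= delta:
                counts[(a, b, c)] += 1
    return dict(counts)


# Toy data: (B, C) occurs at t=2 and t=5, but only t=2 is within delta of (A, B, 1).
edges = [("A", "B", 1), ("B", "C", 2), ("B", "C", 5), ("C", "A", 3)]
paths = count_causal_paths_len2(edges, delta=2)
```

With a larger `delta`, the later occurrence of (B, C) also continues the link (A, B), which shows how the maximum time difference changes the path statistics.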
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable Python software package that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.
Comment: MSR 2019, 12 pages, 10 figures
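The kind of network described in this abstract can be sketched on toy data. The code below does not use git2net or its API; the function `co_editing_network` and the event layout `(editor, original_author, timestamp)` are hypothetical, and it assumes the edit events have already been extracted from a repository by some other means.

```python
from collections import Counter


def co_editing_network(edit_events):
    """Aggregate edit events into a directed, weighted, time-stamped edge list.

    edit_events: iterable of (editor, original_author, timestamp) tuples
    (hypothetical layout). A link editor -> original_author means the editor
    changed code originally written by that author.
    """
    weights = Counter()
    first_edit = {}
    for editor, author, ts in edit_events:
        if editor == author:
            # Assumption of this sketch: edits to one's own code are skipped,
            # since a link requires two different developers.
            continue
        edge = (editor, author)
        weights[edge] += 1
        first_edit[edge] = min(ts, first_edit.get(edge, ts))
    return {
        edge: {"weight": w, "first_edit": first_edit[edge]}
        for edge, w in weights.items()
    }


events = [
    ("alice", "bob", 10),
    ("alice", "bob", 12),
    ("bob", "alice", 15),
    ("carol", "carol", 16),
]
network = co_editing_network(events)
```

Keeping the direction of each link preserves who edited whose code, which an undirected co-authorship network at file level would lose.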
Higher-Order Aggregate Networks in the Analysis of Temporal Networks: Path structures and centralities
Recent research on temporal networks has highlighted the limitations of a
static network perspective for our understanding of complex systems with
dynamic topologies. In particular, recent works have shown that i) the specific
order in which links occur in real-world temporal networks affects causality
structures and thus the evolution of dynamical processes, and ii) higher-order
aggregate representations of temporal networks can be used to analytically
study the effect of these order correlations on dynamical processes. In this
article we analyze the effect of order correlations on path-based centrality
measures in real-world temporal networks. Analyzing temporal equivalents of
betweenness, closeness and reach centrality in six empirical temporal networks,
we first show that an analysis of the commonly used static, time-aggregated
representation can give misleading results about the actual importance of
nodes. We further study higher-order time-aggregated networks, a recently
proposed generalization of the commonly applied static, time-aggregated
representation of temporal networks. Here, we particularly define path-based
centrality measures based on second-order aggregate networks, empirically
validating that node centralities calculated in this way better capture the
true temporal centralities of nodes than node centralities calculated based on
the commonly used static (first-order) representation. Apart from providing a
simple and practical method for the approximation of path-based centralities in
temporal networks, our results highlight interesting perspectives for the use
of higher-order aggregate networks in the analysis of time-stamped network
data.
Comment: 27 pages, 13 figures, 3 tables
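The difference between a first-order and a second-order aggregate representation can be illustrated with a minimal sketch. It assumes the causal paths are already given as node sequences; the helper names `first_order` and `second_order` are illustrative and not taken from the paper or any library.

```python
from collections import Counter


def first_order(paths):
    """Static, time-aggregated view: count each link (a, b) along the paths."""
    weights = Counter()
    for p in paths:
        for a, b in zip(p, p[1:]):
            weights[(a, b)] += 1
    return weights


def second_order(paths):
    """Second-order view: nodes are links, and a second-order link
    (a, b) -> (b, c) counts observed paths a -> b -> c of length two."""
    weights = Counter()
    for p in paths:
        for a, b, c in zip(p, p[1:], p[2:]):
            weights[((a, b), (b, c))] += 1
    return weights


# Toy paths: everything arriving from A continues to D, everything from B to E.
paths = [("A", "C", "D"), ("B", "C", "E"), ("A", "C", "D"), ("B", "C", "E")]
g1 = first_order(paths)
g2 = second_order(paths)
```

In this toy data the first-order network contains the links (A, C) and (C, E), so a static analysis would treat A -> C -> E as a possible path. The second-order network has no link from (A, C) to (C, E), reflecting that this path never occurs.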
Understanding Complex Systems: From Networks to Optimal Higher-Order Models
To better understand the structure and function of complex systems,
researchers often represent direct interactions between components in complex
systems with networks, assuming that indirect influence between distant
components can be modelled by paths. Such network models assume that actual
paths are memoryless. That is, the way a path continues as it passes through a
node does not depend on where it came from. Recent studies of data on actual
paths in complex systems question this assumption and instead indicate that
memory in paths does have considerable impact on central methods in network
science. A growing research community working with so-called higher-order
network models addresses this issue, seeking to take advantage of information
that conventional network representations disregard. Here we summarise the
progress in this area and outline remaining challenges calling for more
research.
Comment: 8 pages, 4 figures
An ensemble perspective on multi-layer networks
We study properties of multi-layered, interconnected networks from an
ensemble perspective, i.e. we analyze ensembles of multi-layer networks that
share similar aggregate characteristics. Using a diffusive process that evolves
on a multi-layer network, we analyze how the speed of diffusion depends on the
aggregate characteristics of both intra- and inter-layer connectivity. Through
a block-matrix model representing the distinct layers, we construct transition
matrices of random walkers on multi-layer networks, and estimate expected
properties of multi-layer networks using a mean-field approach. In addition, we
quantify and explore conditions on the link topology that allow the ensemble
average to be estimated by considering only aggregate statistics of the layers.
Our approach can be used when only partial information is available, as is
usually the case for real-world multi-layer complex systems.
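A block-matrix transition matrix for a random walker on a multi-layer network can be sketched as follows. This is a simplified illustration, not the model of the paper: it assumes every layer has the same node set, that inter-layer links only connect copies of the same node, and that all of them share one weight `coupling`. The function name `supra_transition_matrix` is hypothetical.

```python
def supra_transition_matrix(layers, coupling):
    """Build a row-stochastic transition matrix from layer adjacency matrices.

    layers: list of n x n adjacency matrices (lists of lists), one per layer.
    coupling: assumed weight of the link between copies of the same node
              in two different layers.
    Row and column k * n + i refer to node i in layer k.
    """
    n = len(layers[0])
    num_layers = len(layers)
    size = num_layers * n
    supra = [[0.0] * size for _ in range(size)]

    # Diagonal blocks: intra-layer connectivity.
    for k, adj in enumerate(layers):
        for i in range(n):
            for j in range(n):
                supra[k * n + i][k * n + j] = float(adj[i][j])

    # Off-diagonal blocks: inter-layer links between copies of the same node.
    for k in range(num_layers):
        for m in range(num_layers):
            if k != m:
                for i in range(n):
                    supra[k * n + i][m * n + i] = float(coupling)

    # Row-normalise so each row gives the walker's step probabilities.
    transition = []
    for row in supra:
        total = sum(row)
        transition.append([x / total if total > 0 else 0.0 for x in row])
    return transition


layer = [[0, 1], [1, 0]]
T = supra_transition_matrix([layer, layer], coupling=1)
```

Here a walker on node 0 of the first layer either moves to node 1 in the same layer or switches to node 0 in the second layer, each with probability 0.5.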
Using Causality-Aware Graph Neural Networks to Predict Temporal Centralities in Dynamic Graphs
Node centralities play a pivotal role in network science, social network
analysis, and recommender systems. In temporal data, static path-based
centralities like closeness or betweenness can give misleading results about
the true importance of nodes in a temporal graph. To address this issue,
temporal generalizations of betweenness and closeness have been defined that
are based on the shortest time-respecting paths between pairs of nodes.
However, a major issue of those generalizations is that the calculation of such
paths is computationally expensive. Addressing this issue, we study the
application of De Bruijn Graph Neural Networks (DBGNN), a causality-aware graph
neural network architecture, to predict temporal path-based centralities in
time series data. We experimentally evaluate our approach in 13 temporal graphs
from biological and social systems and show that it considerably improves the
prediction of both betweenness and closeness centrality compared to a static
Graph Convolutional Neural Network.
Flow Divergence: Comparing Maps of Flows with Relative Entropy
Networks represent how the entities of a system are connected and can be
partitioned differently, prompting ways to compare partitions. Common
approaches for comparing network partitions include information-theoretic
measures based on mutual information and set-theoretic measures such as the
Jaccard index. These measures are often based on computing the agreement in
terms of overlap between different partitions of the same set. However, they
ignore link patterns which are essential for the organisation of networks. We
propose flow divergence, an information-theoretic divergence measure for
comparing network partitions, inspired by the ideas behind the Kullback-Leibler
divergence and the map equation for community detection. Similar to the
Kullback-Leibler divergence, flow divergence adopts a coding perspective and
compares two network partitions by considering the expected extra number of
bits required to describe a random walk on a network using one partition
relative to a reference partition. Because flow divergence is based on random
walks, it can be
used to compare partitions with arbitrary and different depths. We show that
flow divergence distinguishes between partitions that traditional measures
consider to be equally good when compared to a reference partition. Applied to
real networks, we use flow divergence to estimate the cost of overfitting in
incomplete networks and to visualise the solution landscape of network
partitions.
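The two standard ingredients this abstract refers to, the Kullback-Leibler divergence and the Jaccard index, can be written down in a few lines. The sketch below is not flow divergence itself, which additionally accounts for random-walk flows and the map equation; the function names are illustrative.

```python
import math


def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Interpretable as the expected number of extra bits needed to encode
    samples from p with a code that is optimal for q.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log2(pi / qi)
    return total


def jaccard_index(a, b):
    """Set overlap |a & b| / |a | b| between two modules (node sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


extra_bits = kl_divergence_bits([0.5, 0.5], [0.25, 0.75])
overlap = jaccard_index({1, 2, 3}, {2, 3, 4})
```

Note that the Jaccard index only looks at which nodes two modules share and ignores the links between them, which is the limitation the abstract points out.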