52 research outputs found
Unbiased sampling of network ensembles
Sampling random graphs with given properties is a key step in the analysis of
networks, as random ensembles represent basic null models required to identify
patterns such as communities and motifs. An important requirement is that the
sampling process is unbiased and efficient. The main approaches are
microcanonical, i.e. they sample graphs that match the enforced constraints
exactly. Unfortunately, when applied to strongly heterogeneous networks (like
most real-world examples), the majority of these approaches become biased
and/or time-consuming. Moreover, the algorithms defined in the simplest cases,
such as binary graphs with given degrees, are not easily generalizable to more
complicated ensembles. Here we propose a solution to the problem via the
introduction of a "Maximize and Sample" ("Max & Sam" for short) method to
correctly sample ensembles of networks where the constraints are 'soft', i.e.
realized as ensemble averages. Our method is based on exact maximum-entropy
distributions and is therefore unbiased by construction, even for strongly
heterogeneous networks. It is also more computationally efficient than most
microcanonical alternatives. Finally, it works for both binary and weighted
networks with a variety of constraints, including combined degree-strength
sequences and full reciprocity structure, for which no alternative method
exists. Our canonical approach can in principle be turned into an unbiased
microcanonical one, via a restriction to the relevant subset. Importantly, the
analysis of the fluctuations of the constraints suggests that the
microcanonical and canonical versions of all the ensembles considered here are
not equivalent. We show various real-world applications and provide a code
implementing all our algorithms. Comment: MATLAB code available at
http://www.mathworks.it/matlabcentral/fileexchange/46912-max-sam-package-zi
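To make the "maximize, then sample" idea concrete, here is a minimal Python sketch for the simplest case only: binary undirected graphs with soft degree constraints. It is not the authors' MATLAB package. It assumes the standard maximum-entropy form of the link probability, p_ij = x_i x_j / (1 + x_i x_j), and fits the hidden variables x_i with a plain fixed-point iteration, which is a choice made for this illustration and is not guaranteed to converge for every degree sequence. The function names are hypothetical.

```python
import random


def fit_hidden_variables(degrees, iterations=200):
    """'Maximize' step (illustrative): solve k_i = sum_{j != i} p_ij for x.

    Uses the fixed-point update x_i <- k_i / sum_{j != i} x_j / (1 + x_i x_j).
    Convergence is assumed, not guaranteed, for arbitrary degree sequences.
    """
    n = len(degrees)
    total = float(sum(degrees))
    if total == 0:
        return [0.0] * n
    # Starting guess: degrees rescaled by the square root of their sum.
    x = [k / total ** 0.5 for k in degrees]
    for _ in range(iterations):
        new_x = []
        for i in range(n):
            denom = sum(x[j] / (1.0 + x[i] * x[j]) for j in range(n) if j != i)
            new_x.append(degrees[i] / denom if denom > 0 else 0.0)
        x = new_x
    return x


def link_probability(xi, xj):
    """Maximum-entropy probability of a link between two nodes."""
    return xi * xj / (1.0 + xi * xj)


def expected_degrees(x):
    """Ensemble-average degree of each node, computed analytically."""
    n = len(x)
    return [
        sum(link_probability(x[i], x[j]) for j in range(n) if j != i)
        for i in range(n)
    ]


def sample_graph(x, rng):
    """'Sample' step: draw every node pair independently with probability p_ij."""
    n = len(x)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < link_probability(x[i], x[j]):
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    x = fit_hidden_variables([2, 2, 2, 2, 2])
    print(expected_degrees(x))
    print(sample_graph(x, random.Random(0)))
```

Because each pair is drawn independently, a sampled graph matches the target degrees only on average, which is what "soft" constraints mean in the abstract above.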
Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys
Given their importance in shaping social networks and determining how
information or diseases propagate in a population, human interactions are the
subject of many data collection efforts. To this aim, different methods are
commonly used, from diaries and surveys to wearable sensors. These methods show
advantages and limitations but are rarely compared in a given setting. As
surveys targeting friendship relations might suffer less from memory biases
than contact diaries, it is also interesting to explore how daily contact
patterns compare with friendship relations and with online social links. Here
we make progress in these directions by leveraging data from a French high
school: face-to-face contacts measured by two concurrent methods, sensors and
diaries; self-reported friendship surveys; Facebook links. We compare the data
sets and find that most short contacts are not reported in diaries while long
contacts have larger reporting probability, with a general tendency to
overestimate durations. Measured contacts corresponding to reported friendship
can have durations of any length but all long contacts correspond to reported
friendships. Online links not associated with reported friendships correspond to
short face-to-face contacts, highlighting the different nature of reported
friendships and online links. Diaries and surveys suffer from a low sampling
rate, showing the higher acceptability of sensor-based platforms. Despite the
biases, we found that the overall structure of the contact network, i.e., the
mixing patterns between classes, is correctly captured by both self-reported
contacts and friendships networks. Overall, diaries and surveys tend to yield a
correct picture of the structural organization of the contact network, albeit
with far fewer links, and give access to a sort of backbone of the contact
network corresponding to the strongest links in terms of cumulative durations.
Enhanced reconstruction of weighted networks from strengths and degrees
Network topology plays a key role in many phenomena, from the spreading of
diseases to that of financial crises. Whenever the whole structure of a network
is unknown, one must resort to reconstruction methods that identify the least
biased ensemble of networks consistent with the partial information available.
A challenging case, frequently encountered due to privacy issues in the
analysis of interbank flows and Big Data, is when there is only local
(node-specific) aggregate information available. For binary networks, the
relevant ensemble is one where the degree (number of links) of each node is
constrained to its observed value. However, for weighted networks the problem
is much more complicated. While the naive approach prescribes constraining the
strengths (total link weights) of all nodes, recent counter-intuitive results
suggest that in weighted networks the degrees are often more informative than
the strengths. This implies that the reconstruction of weighted networks would
be significantly enhanced by the specification of both strengths and degrees, a
computationally hard and bias-prone procedure. Here we solve this problem by
introducing an analytical and unbiased maximum-entropy method that works in the
shortest possible time and does not require the explicit generation of
reconstructed samples. We consider several real-world examples and show that,
while the strengths alone give poor results, the additional knowledge of the
degrees yields accurately reconstructed networks. Information-theoretic
criteria rigorously confirm that the degree sequence, as soon as it is
non-trivial, is irreducible to the strength sequence. Our results have strong
implications for the analysis of motifs and communities and whenever the
reconstructed ensemble is required as a null model to detect higher-order
patterns.
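The claim that reconstruction needs no explicit sampling can be illustrated with a short Python sketch. It assumes the closed-form expressions commonly used for the maximum-entropy ensemble with constrained degrees and strengths and non-negative integer weights: a link probability p_ij = x_i x_j y_i y_j / (1 - y_i y_j + x_i x_j y_i y_j) and an expected weight p_ij / (1 - y_i y_j). These formulas are not stated in the abstract, and the parameter values below are hypothetical; in practice x and y would be fitted to the observed degrees and strengths.

```python
def ecm_expectations(x, y):
    """Expected degrees and strengths from node parameters, without sampling.

    x, y: per-node parameters (hypothetical values in this sketch); every
    product y_i * y_j must lie in [0, 1) for the weight distribution to exist.
    """
    n = len(x)
    exp_degree = [0.0] * n
    exp_strength = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            xx = x[i] * x[j]
            yy = y[i] * y[j]
            if not 0.0 <= yy < 1.0:
                raise ValueError("y_i * y_j must lie in [0, 1)")
            p = xx * yy / (1.0 - yy + xx * yy)  # probability that the link exists
            w = p / (1.0 - yy)                  # expected weight of the link
            exp_degree[i] += p
            exp_strength[i] += w
    return exp_degree, exp_strength


if __name__ == "__main__":
    degrees, strengths = ecm_expectations([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
    print(degrees, strengths)
```

Any quantity that depends on the reconstructed network can then be estimated from these pairwise expectations directly, rather than by averaging over generated samples.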
Reconstructing the world trade multiplex: the role of intensive and extensive biases
In economic and financial networks, the strength of each node always has an
important economic meaning, such as the size of supply and demand, import and
export, or financial exposure. Constructing null models of networks matching
the observed strengths of all nodes is crucial in order to either detect
interesting deviations of an empirical network from economically meaningful
benchmarks or reconstruct the most likely structure of an economic network when
the latter is unknown. However, several studies have proved that real economic
networks and multiplexes are topologically very different from configurations
inferred only from node strengths. Here we provide a detailed analysis of the
World Trade Multiplex by comparing it to an enhanced null model that
simultaneously reproduces the strength and the degree of each node. We study
several temporal snapshots and almost one hundred layers (commodity classes) of
the multiplex and find that the observed properties are systematically well
reproduced by our model. Our formalism allows us to introduce the (static)
concept of extensive and intensive bias, defined as a measurable tendency of
the network to prefer either the formation of extra links or the reinforcement
of link weights, with respect to a reference case where only strengths are
enforced. Our findings complement the existing economic literature on (dynamic)
intensive and extensive trade margins. More generally, they show that
real-world multiplexes can be strongly shaped by layer-specific local
constraints.
How to estimate epidemic risk from incomplete contact diaries data?
Social interactions shape the patterns of spreading processes in a population. Techniques such as diaries or proximity sensors make it possible to collect data about encounters and to build networks of contacts between individuals. The contact networks obtained from these different techniques are, however, quantitatively different. Here, we first show how these discrepancies affect the prediction of the epidemic risk when these data are fed to numerical models of epidemic spread: the low participation rate, under-reporting of contacts and overestimation of contact durations in contact diaries with respect to sensor data lead to important differences in the outcomes of the corresponding simulations, with for instance an enhanced sensitivity to initial conditions. Most importantly, we investigate if and how information gathered from contact diaries can be used in such simulations in order to yield an accurate description of the epidemic risk, assuming that data from sensors represent the ground truth. The contact networks built from contact sensors and diaries indeed present several structural similarities: this suggests the possibility of constructing, using only the contact diary network information, a surrogate contact network such that simulations using this surrogate network give the same estimation of the epidemic risk as simulations using the contact sensor network. We present and compare several methods to build such surrogate data, and show that it is indeed possible to obtain a good agreement between the outcomes of simulations using surrogate and sensor data, as long as the contact diary information is complemented by publicly available data describing the heterogeneity of the durations of human contacts.
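As an illustration of what feeding contact data to a numerical model of epidemic spread can look like, here is a minimal Python sketch. The abstract does not name the model, so this assumes a discrete-time stochastic SIR process on a weighted contact network, with a per-step transmission probability of 1 - exp(-beta * w) for a contact of cumulative duration w. The function name, the parameters beta and mu, and the toy network are all hypothetical.

```python
import math
import random


def simulate_sir(weights, seed_node, beta, mu, rng):
    """Run one SIR outbreak and return the number of nodes ever infected.

    weights: dict mapping an undirected pair (i, j) to a contact duration.
    beta: transmission rate per unit of contact duration (assumed form).
    mu: per-step recovery probability, must be in (0, 1].
    """
    if not 0.0 < mu <= 1.0:
        raise ValueError("mu must be in (0, 1]")
    neighbors = {}
    for (i, j), w in weights.items():
        neighbors.setdefault(i, []).append((j, w))
        neighbors.setdefault(j, []).append((i, w))
    infected = {seed_node}
    recovered = set()
    while infected:
        newly_infected = set()
        for i in infected:
            for j, w in neighbors.get(i, []):
                if j in infected or j in recovered:
                    continue
                # Longer contacts transmit with higher probability.
                if rng.random() < 1.0 - math.exp(-beta * w):
                    newly_infected.add(j)
        newly_recovered = {i for i in infected if rng.random() < mu}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return len(recovered)


if __name__ == "__main__":
    # Toy contact network: a path 0 - 1 - 2 with unit durations.
    contacts = {(0, 1): 1.0, (1, 2): 1.0}
    sizes = [simulate_sir(contacts, 0, 0.5, 0.5, random.Random(s)) for s in range(100)]
    print(sum(sizes) / len(sizes))
```

Running the same simulation on a diary-based network, a sensor-based network and a surrogate network, and comparing the resulting distributions of outbreak sizes, is one way to quantify the discrepancies the abstract describes.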
Reconstructing networks
Complex network datasets often come with the problem of missing information:
interaction data that have not been measured or discovered, that may be
affected by errors, or that are simply hidden because of privacy issues. This Element provides
an overview of the ideas, methods and techniques to deal with this problem and
that together define the field of network reconstruction. Given the extent of
the subject, we shall focus on the inference methods rooted in statistical
physics and information theory. The discussion will be organized according to
the different scales of the reconstruction task, that is, whether the goal is
to reconstruct the macroscopic structure of the network, to infer its mesoscale
properties, or to predict the individual microscopic connections. Comment: 107 pages, 25 figures
Reconstructing networks
Complex network datasets often come with the problem of missing information: interaction data that have not been measured or discovered, that may be affected by errors, or that are simply hidden because of privacy issues. This Element provides an overview of the ideas, methods and techniques to deal with this problem and that together define the field of network reconstruction. Given the extent of the subject, the authors focus on the inference methods rooted in statistical physics and information theory. The discussion is organized according to the different scales of the reconstruction task, that is, whether the goal is to reconstruct the macroscopic structure of the network, to infer its mesoscale properties, or to predict the individual microscopic connections.
Spatio-temporal patterns of the international merger and acquisition network
This paper analyses the world web of mergers and acquisitions (M&As) using a complex network approach. We use data on M&As to build a temporal sequence of binary and weighted directed networks for the period 1995-2010 and 224 countries (nodes) connected according to their M&A flows (links). We study different geographical and temporal aspects of the international M&A network (IMAN), building sequences of filtered sub-networks whose links belong to specific intervals of distance or time. Given that M&As and trade are complementary ways of reaching foreign markets, we perform our analysis using statistics employed for the study of the international trade network (ITN), highlighting the similarities and differences between the ITN and the IMAN. In contrast to the ITN, the IMAN is a low-density network characterized by a persistent giant component with many external nodes and low reciprocity. Clustering patterns are very heterogeneous and dynamic. High-income economies are the main acquirers and are characterized by high connectivity, implying that most countries are targets of a few acquirers. As in the ITN, geographical distance strongly impacts the structure of the IMAN: link weights and node degrees have a non-linear relation with distance, and an assortative pattern is present at short distance.
Intensive and extensive biases in economic networks: Reconstructing world trade
In economic and financial networks, the strength (total value of the connections) of a given node always has an important economic meaning, such as the size of supply and demand, import and export, or financial exposure. Constructing null models of networks matching the observed strengths of all nodes is crucial in order to either detect interesting deviations of an empirical network from economically meaningful benchmarks or reconstruct the most likely structure of an economic network when the latter is unknown. However, several studies have proved that real economic networks are topologically very different from networks inferred only from node strengths. Here we provide a detailed analysis of the World Trade Web (WTW) by comparing it to an enhanced null model that simultaneously reproduces the strength and the number of connections of each node. We study several temporal snapshots and different aggregation levels (commodity classes) of the WTW and systematically find that the observed properties are extremely well reproduced by our model. This allows us to introduce the concept of extensive and intensive bias, defined as a measurable tendency of the network to prefer either the formation of new links or the reinforcement of existing ones. We discuss the possible economic interpretation in terms of trade margins.