Corrected overlap weight and clustering coefficient
We discuss two well-known network measures: the overlap weight of an edge and
the clustering coefficient of a node. Neither turns out to be very useful for
the data-analytic task of identifying important elements (nodes or links) of a
given network. The reason is that they attain their largest values on maximal
subgraphs of relatively small size, which are more likely to appear in a
network than larger ones. We show how the definitions of these measures can be
corrected so that they give the expected results. We illustrate the proposed
corrected measures by applying them to the US Airports network using the
program Pajek.
Comment: The paper is a detailed and extended version of the talk presented at
the CMStatistics (ERCIM) 2015 Conference
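To illustrate the problem the abstract describes, here is a minimal sketch of the standard (uncorrected) local clustering coefficient on a toy graph. The graph, the node names, and the helper functions are assumptions made for illustration. They are not taken from the paper, and the paper's corrected definition is not reproduced here.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, v):
    """Standard local clustering coefficient: the fraction of pairs of
    neighbors of v that are themselves linked."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy graph: an isolated triangle (a, b, c) and a wheel whose
# hub h is joined to six rim nodes r0..r5 arranged in a cycle.
rim = [f"r{i}" for i in range(6)]
edges = [("a", "b"), ("b", "c"), ("a", "c")]
edges += [("h", r) for r in rim]
edges += [(rim[i], rim[(i + 1) % 6]) for i in range(6)]
adj = build_adjacency(edges)

cc_triangle_node = clustering_coefficient(adj, "a")  # all neighbor pairs linked
cc_hub = clustering_coefficient(adj, "h")            # 6 of 15 neighbor pairs linked
print(cc_triangle_node, cc_hub)
```

In this toy graph a node of the small triangle gets the maximal value 1, while the hub of the larger wheel gets only 6/15 = 0.4. This is the kind of ranking the abstract calls unhelpful for identifying important elements.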
Network inference and community detection, based on covariance matrices, correlations and test statistics from arbitrary distributions
In this paper we propose a methodology for inference of binary-valued adjacency
matrices from various measures of the strength of association between pairs of
network nodes, or more generally pairs of variables. This strength of
association can be quantified by sample covariance and correlation matrices,
and more generally by test statistics and hypothesis test p-values from
arbitrary distributions. Community detection methods such as block modelling
typically require binary-valued adjacency matrices as a starting point. Hence,
a main motivation for the methodology we propose is to obtain binary-valued
adjacency matrices from such pairwise measures of strength of association
between variables. The proposed methodology is applicable to large
high-dimensional data sets and is based on computationally efficient
algorithms. We illustrate its utility in a range of contexts and data sets.
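As a rough illustration of the general task (turning pairwise association strengths into a binary adjacency matrix), the sketch below applies a plain fixed threshold to absolute Pearson correlations. This is a deliberately naive baseline, not the methodology proposed in the paper. The function names, the threshold value, and the toy data are all assumptions.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def binary_adjacency(columns, threshold):
    """Return a symmetric 0/1 matrix with an edge wherever |correlation|
    between two variables is at least `threshold` (naive fixed cutoff)."""
    p = len(columns)
    adj = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical data: variables 0 and 1 are nearly collinear, variable 2 is not.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.8],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
adjacency = binary_adjacency(columns, threshold=0.8)
for row in adjacency:
    print(row)
```

A fixed cutoff like this ignores sampling variability and multiple testing, which is presumably part of what a principled inference method has to address.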
Characterizing unknown systematics in large scale structure surveys
Photometric large scale structure (LSS) surveys probe the largest volumes in
the Universe, but are inevitably limited by systematic uncertainties. Imperfect
photometric calibration leads to biases in our measurements of the density
fields of LSS tracers such as galaxies and quasars, and as a result in
cosmological parameter estimation. Earlier studies have proposed using
cross-correlations between different redshift slices or cross-correlations
between different surveys to reduce the effects of such systematics. In this
paper we develop a method to characterize unknown systematics. We demonstrate
that while we do not have sufficient information to correct for unknown
systematics in the data, we can obtain an estimate of their magnitude. We
define a parameter to estimate contamination from unknown systematics using
cross-correlations between different redshift slices and propose discarding
bins in the angular power spectrum that lie outside a certain contamination
tolerance level. We show that this method improves estimates of the bias using
simulated data and further apply it to photometric luminous red galaxies in the
Sloan Digital Sky Survey as a case study.
Comment: 24 pages, 6 figures; Expanded discussion of results, added figure 2;
Version to be published in JCA
Universal scaling of distances in complex networks
Universal scaling of distances between vertices of Erdos-Renyi random graphs,
scale-free Barabasi-Albert models, science collaboration networks, biological
networks, Internet Autonomous Systems and public transport networks is
observed. The mean distance between two nodes of degrees k_i and k_j equals
<l_ij> = A - B log(k_i k_j). The scaling is valid over several decades. A simple
theory for the appearance of this scaling is presented. Parameters A and B
depend on the mean value of a node degree <k>_nn calculated for the nearest
neighbors and on network clustering coefficients.
Comment: 4 pages, 3 figures, 1 table
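The scaling relation A - B log(k_i k_j) can be checked against data with an ordinary least-squares fit of distance on log(k_i k_j). The sketch below is one way to do that. It assumes the natural logarithm (the abstract does not state the base), and the function name and the synthetic samples are hypothetical, generated from the formula itself rather than measured on any network.

```python
import math


def fit_distance_scaling(samples):
    """Least-squares fit of distance = A - B * log(k_i * k_j).

    `samples` is a list of (k_i, k_j, distance) tuples; returns (A, B).
    Natural log is an assumption; another base only rescales B.
    """
    xs = [math.log(k_i * k_j) for k_i, k_j, _ in samples]
    ys = [distance for _, _, distance in samples]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope  # A is the intercept, B is minus the slope


# Synthetic, noise-free samples generated from assumed values A = 5.0, B = 0.8.
A_TRUE, B_TRUE = 5.0, 0.8
samples = [
    (k_i, k_j, A_TRUE - B_TRUE * math.log(k_i * k_j))
    for k_i in (1, 2, 4, 8)
    for k_j in (1, 3, 9)
]
A_fit, B_fit = fit_distance_scaling(samples)
print(A_fit, B_fit)
```

On real data the samples would be mean shortest-path lengths grouped by the degree product, and the fit would only approximate A and B.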