
    Corrected overlap weight and clustering coefficient

    We discuss two well-known network measures: the overlap weight of an edge and the clustering coefficient of a node. Neither turns out to be very useful for the data-analytic task of identifying important elements (nodes or links) of a given network. The reason is that they attain their largest values on maximal subgraphs of relatively small size, which are more likely to appear in a network than larger ones. We show how the definitions of these measures can be corrected so that they give the expected results. We illustrate the proposed corrected measures by applying them to the US Airports network using the program Pajek. Comment: The paper is a detailed and extended version of the talk presented at the CMStatistics (ERCIM) 2015 Conference.
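
    The problem described in this abstract can be illustrated with a minimal sketch of the uncorrected measures. It assumes the common definitions: the clustering coefficient as the fraction of linked neighbor pairs, and the overlap weight as t / ((deg(u) - 1) + (deg(v) - 1) - t), where t counts the common neighbors of the edge's endpoints. The helper names and the toy graph are hypothetical, and the paper's corrected versions are not reproduced here.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbors of v that are themselves linked."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def overlap_weight(adj, u, v):
    """Common neighbors of u and v, relative to all their other neighbors."""
    t = len(adj[u] & adj[v])
    denom = (len(adj[u]) - 1) + (len(adj[v]) - 1) - t
    return t / denom if denom > 0 else 0.0


# Toy graph (assumed for illustration): an isolated triangle a-b-c,
# and a hub h whose four neighbors form two linked pairs.
adj = build_adjacency([
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("h", "n1"), ("h", "n2"), ("h", "n3"), ("h", "n4"),
    ("n1", "n2"), ("n3", "n4"),
])

cc_triangle = clustering_coefficient(adj, "a")   # small clique member
cc_hub = clustering_coefficient(adj, "h")        # better-connected hub
ow_triangle = overlap_weight(adj, "a", "b")
ow_hub = overlap_weight(adj, "h", "n1")
```

    In this toy graph the nodes and edges of the isolated triangle reach the maximum value 1 on both measures, while the hub and its edges score only 1/3, which is the small-subgraph bias the abstract refers to.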

    Network inference and community detection, based on covariance matrices, correlations and test statistics from arbitrary distributions

    In this paper we propose a methodology for inferring binary-valued adjacency matrices from various measures of the strength of association between pairs of network nodes, or more generally pairs of variables. This strength of association can be quantified by sample covariance and correlation matrices, and more generally by test statistics and hypothesis-test p-values from arbitrary distributions. Community detection methods such as block modelling typically require binary-valued adjacency matrices as a starting point. Hence, a main motivation for the proposed methodology is to obtain binary-valued adjacency matrices from such pairwise measures of association between variables. The methodology is applicable to large high-dimensional data sets and is based on computationally efficient algorithms. We illustrate its utility in a range of contexts and data sets.
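
    As a rough illustration of the input and output involved, the sketch below turns pairwise sample correlations into a binary adjacency matrix by plain thresholding. This is a deliberately naive stand-in and not the inference method proposed in the paper; the function names, the cutoff value and the toy data are all assumptions.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def threshold_adjacency(columns, cutoff):
    """Link variables i and j when |correlation| >= cutoff (naive rule)."""
    p = len(columns)
    adj = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) >= cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj


# Toy data (assumed): y is an exact multiple of x, z is weakly related.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [5, 1, 4, 2, 3]

adjacency = threshold_adjacency([x, y, z], cutoff=0.5)
```

    Here only the x-y pair is linked, since its correlation is 1 while both pairs involving z have correlation -0.3. A fixed cutoff ignores sampling uncertainty, which is the kind of limitation a principled inference procedure is meant to address.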

    Characterizing unknown systematics in large scale structure surveys

    Photometric large scale structure (LSS) surveys probe the largest volumes in the Universe, but are inevitably limited by systematic uncertainties. Imperfect photometric calibration leads to biases in our measurements of the density fields of LSS tracers such as galaxies and quasars, and as a result in cosmological parameter estimation. Earlier studies have proposed using cross-correlations between different redshift slices or cross-correlations between different surveys to reduce the effects of such systematics. In this paper we develop a method to characterize unknown systematics. We demonstrate that while we do not have sufficient information to correct for unknown systematics in the data, we can obtain an estimate of their magnitude. We define a parameter to estimate contamination from unknown systematics using cross-correlations between different redshift slices and propose discarding bins in the angular power spectrum that lie outside a certain contamination tolerance level. We show that this method improves estimates of the bias using simulated data and further apply it to photometric luminous red galaxies in the Sloan Digital Sky Survey as a case study. Comment: 24 pages, 6 figures; Expanded discussion of results, added figure 2; Version to be published in JCAP

    Universal scaling of distances in complex networks

    Universal scaling of distances between vertices is observed in Erdos-Renyi random graphs, scale-free Barabasi-Albert models, science collaboration networks, biological networks, Internet Autonomous Systems and public transport networks. The mean distance between two nodes of degrees k_i and k_j equals <l_ij> = A - B log(k_i k_j). The scaling is valid over several decades. A simple theory for the appearance of this scaling is presented. The parameters A and B depend on the mean degree <k>_nn calculated for the nearest neighbors and on network clustering coefficients. Comment: 4 pages, 3 figures, 1 table
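
    The scaling relation stated in this abstract, mean distance = A - B log(k_i k_j), can be checked on a small graph with the sketch below. It assumes natural logarithms, an unweighted undirected graph, and an ordinary least-squares fit over all connected node pairs instead of binned averages; the function names and the star-graph example are hypothetical and not taken from the paper.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def fit_distance_scaling(adj):
    """Least-squares fit of d_ij = A - B * log(k_i * k_j); returns (A, B).

    Assumes at least two distinct values of k_i * k_j among connected pairs.
    """
    xs, ys = [], []
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:  # skip pairs in different components
                xs.append(math.log(len(adj[u]) * len(adj[v])))
                ys.append(dist[v])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, -slope


# Toy example (assumed): a star with one center and four leaves.
star = {
    "c": {"l1", "l2", "l3", "l4"},
    "l1": {"c"}, "l2": {"c"}, "l3": {"c"}, "l4": {"c"},
}
A, B = fit_distance_scaling(star)
```

    For the star, leaf-leaf pairs have k_i k_j = 1 and distance 2, while center-leaf pairs have k_i k_j = 4 and distance 1, so the fit gives A = 2 and B = 1 / log 4. Real networks would need binning by k_i k_j over a much wider degree range to test the scaling meaningfully.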