
    Network Maximal Correlation

    Identifying nonlinear relationships in large datasets is a daunting task, particularly when the form of the nonlinearity is unknown. Here, we introduce Network Maximal Correlation (NMC) as a fundamental measure to capture nonlinear associations in networks without knowledge of the shapes of the underlying nonlinearities. NMC infers possibly nonlinear transformations of variables, constrained to zero means and unit variances, by maximizing the total nonlinear correlation over the underlying network. For two variables, NMC is equivalent to the standard Maximal Correlation. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for both discrete and jointly Gaussian variables. For discrete random variables, we show that the NMC optimization is an instance of the Maximum Correlation Problem and provide necessary conditions for its globally optimal solution. Moreover, we propose an efficient algorithm based on Alternating Conditional Expectation (ACE) that converges to a local NMC optimum. For this algorithm, we provide guidelines for choosing appropriate starting points to escape local maximizers. We also propose a distributed algorithm to compute a (1-ϵ)-approximation of the NMC value for large and dense graphs using graph partitioning. For jointly Gaussian variables, under some conditions, we show that the NMC optimization can be simplified to a Max-Cut problem, and we provide conditions under which an NMC solution can be computed exactly. Under some general conditions, we show that NMC can infer the underlying graphical model for functions of latent jointly Gaussian variables. These functions are unknown, bijective, and can be nonlinear. This result broadens the family of continuous distributions whose graphical models can be characterized efficiently. We illustrate the robustness of NMC in real-world applications by showing its continuity with respect to small perturbations of joint distributions.
    We also show that sample NMC (NMC computed using empirical distributions) converges exponentially fast to the true NMC value. Finally, we apply NMC to different cancer datasets, including breast, kidney and liver cancers, and show that NMC infers gene modules that are significantly associated with the survival times of individuals but are not detected by linear association measures.
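
    In the two-variable case, where NMC reduces to the standard maximal correlation, the ACE idea can be sketched as alternating conditional expectations on a discrete joint distribution. This is a minimal illustration under stated assumptions, not the authors' network algorithm: the function names, the fixed starting point g(y) = y and the stopping rule are hypothetical choices, and a joint probability table with strictly positive marginals is assumed.

```python
import math


def _standardize(values, probs):
    """Shift and scale `values` to zero mean and unit variance under `probs`.

    Returns None when the function is (numerically) constant, because a
    constant cannot be rescaled to unit variance.
    """
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    if var < 1e-15:
        return None
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]


def ace_maximal_correlation(joint, max_iter=500, tol=1e-12):
    """Two-variable ACE sketch on a discrete joint table (hypothetical helper).

    joint[i][j] = P(X = i, Y = j); every marginal probability must be > 0.
    Returns (rho, f, g), with f and g zero-mean, unit-variance transformations.
    """
    nx, ny = len(joint), len(joint[0])
    px = [sum(row) for row in joint]
    py = [sum(joint[i][j] for i in range(nx)) for j in range(ny)]

    # Assumed starting point g(y) = y. Other starts may reach other
    # stationary points, which is why the choice of start matters.
    g = _standardize([float(j) for j in range(ny)], py)
    if g is None:
        return 0.0, None, None

    f = None
    rho = 0.0
    for _ in range(max_iter):
        # f(x) <- E[g(Y) | X = x], then renormalize.
        f = _standardize(
            [sum(joint[i][j] * g[j] for j in range(ny)) / px[i] for i in range(nx)], px)
        if f is None:
            return 0.0, None, None
        # g(y) <- E[f(X) | Y = y], then renormalize.
        g = _standardize(
            [sum(joint[i][j] * f[i] for i in range(nx)) / py[j] for j in range(ny)], py)
        if g is None:
            return 0.0, None, None
        # Current correlation E[f(X) g(Y)]; stop once it no longer changes.
        new_rho = sum(joint[i][j] * f[i] * g[j] for i in range(nx) for j in range(ny))
        if abs(new_rho - rho) < tol:
            return new_rho, f, g
        rho = new_rho
    return rho, f, g
```

    For example, ace_maximal_correlation([[0.4, 0.1], [0.1, 0.4]]) returns a correlation of about 0.6. For two binary variables this equals the absolute Pearson correlation, since every function of a binary variable is affine in it.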

    Structural constraints in complex networks

    We present a link rewiring mechanism to produce surrogates of a network in which both the degree distribution and the rich-club connectivity are preserved. We consider three real networks: the AS-Internet, the protein interaction network and the scientific collaboration network. We show that, for a given degree distribution, the rich-club connectivity is sensitive to the degree-degree correlation and, conversely, the degree-degree correlation is constrained by the rich-club connectivity. In particular, in the case of the Internet, the assortative coefficient is always negative, and a minor change in its value can completely reverse the network's rich-club structure. At the same time, fixing the degree distribution and the rich-club connectivity restricts the assortative coefficient to such a narrow range that a reasonable model of the Internet can be produced by considering mainly these two properties. We also comment on the suitability of using the maximal random network as a null model to assess the rich-club connectivity in real networks. Comment: To appear in New Journal of Physics (www.njp.org)
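
    One common definition of the rich-club connectivity is the density of links among nodes of degree greater than k: phi(k) = 2 E(>k) / [N(>k) (N(>k) - 1)], where E(>k) counts links between such nodes and N(>k) counts the nodes. The abstract does not spell out the rewiring rule, so the sketch below is an assumption rather than the paper's mechanism: a standard degree-preserving double-edge swap that, optionally, accepts a swap only if phi(k) stays unchanged for every k. It uses the fact that an edge contributes to E(>k) exactly when k < min(deg(u), deg(v)), so preserving the pair of endpoint-degree minima preserves phi(k) at all k. All function names are hypothetical.

```python
import random


def degrees(edges):
    """Degree of every node appearing in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def rich_club(edges, k):
    """phi(k): fraction of possible links present among nodes with degree > k."""
    deg = degrees(edges)
    rich = {node for node, d in deg.items() if d > k}
    n = len(rich)
    if n < 2:
        return 0.0
    links = sum(1 for u, v in edges if u in rich and v in rich)
    return 2.0 * links / (n * (n - 1))


def rewire(edges, n_swaps, keep_rich_club=True, seed=0):
    """Degree-preserving double-edge swaps (a-b, c-d) -> (a-d, c-b).

    With keep_rich_club=True a swap is accepted only if the multiset of
    min(endpoint degrees) over the two edges is unchanged, which keeps
    phi(k) identical for every k.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    deg = degrees(edges)  # these swaps never change any node's degree
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both ways of pairing the endpoints
        if len({a, b, c, d}) < 4:
            continue  # edges share a node: swap could make a self-loop or duplicate
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a multi-edge
        if keep_rich_club:
            old = sorted((min(deg[a], deg[b]), min(deg[c], deg[d])))
            new = sorted((min(deg[a], deg[d]), min(deg[c], deg[b])))
            if old != new:
                continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

    Running rewire with keep_rich_club=False gives the usual degree-preserving null model, against which the effect of also fixing phi(k) can be compared.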

    Magnetic models on Apollonian networks

    Thermodynamic and magnetic properties of Ising models defined on the triangular Apollonian network are investigated. This and other similar networks are inspired by the problem of covering a Euclidean domain with circles of maximal radii. Maps relating the thermodynamic functions in two subsequent generations of the network's construction are obtained by formulating the problem in terms of transfer matrices. Numerical iteration of this set of maps yields exact values for the thermodynamic properties of the model. Different choices of the coupling constants, restricted to nearest neighbors on the lattice, are taken into account. For both ferromagnetic and antiferromagnetic couplings, long-range magnetic ordering is obtained. With the exception of a size-dependent effective critical behavior of the correlation length, no evidence of asymptotic criticality was detected. Comment: 21 pages, 5 figures
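
    The network construction and the quantities involved can be illustrated on very small generations by exact enumeration. This sketch does not reproduce the paper's transfer-matrix maps. It assumes that generation 0 is a single triangle, that each new generation places one node inside every triangular face and links it to the face's three corners (conventions for indexing generations vary), and a nearest-neighbor Hamiltonian H = -J sum s_i s_j over the network's edges with k_B = 1. Because the cost grows as 2^N, it is feasible only for the first few generations. Function names are hypothetical.

```python
import itertools
import math


def apollonian_network(generations):
    """Return (num_nodes, edges) of a triangular Apollonian network.

    Assumed convention: generation 0 is a single triangle; every later
    generation adds one node inside each current triangular face and links
    it to the face's three corners, splitting that face into three.
    """
    edges = [(0, 1), (1, 2), (0, 2)]
    faces = [(0, 1, 2)]
    n = 3
    for _ in range(generations):
        new_faces = []
        for a, b, c in faces:
            v = n
            n += 1
            edges += [(v, a), (v, b), (v, c)]
            new_faces += [(a, b, v), (b, c, v), (a, c, v)]
        faces = new_faces
    return n, edges


def ising_exact(n, edges, J, T):
    """Exact enumeration for H = -J * sum_{(i,j) in edges} s_i s_j, k_B = 1.

    Returns (Z, <E>, <|m|>), where m is the magnetization per spin.
    J > 0 is ferromagnetic, J < 0 antiferromagnetic. Cost grows as 2**n.
    """
    beta = 1.0 / T
    Z = e_sum = m_sum = 0.0
    for spins in itertools.product((-1, 1), repeat=n):
        energy = -J * sum(spins[i] * spins[j] for i, j in edges)
        w = math.exp(-beta * energy)  # Boltzmann weight of this configuration
        Z += w
        e_sum += w * energy
        m_sum += w * abs(sum(spins)) / n
    return Z, e_sum / Z, m_sum / Z
```

    Under this convention, generation 1 is the complete graph on four nodes and generation 2 has 7 nodes and 15 edges; sweeping T for positive or negative J gives finite-size curves for such tiny networks only.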

    Comparison of threshold selection methods for microarray gene co-expression matrices

    Background: Network and clustering analyses of microarray co-expression correlation data often require applying a threshold to discard small correlations, thus reducing computational demands and decreasing the number of uninformative correlations. This study investigated threshold selection in the context of combinatorial network analysis of transcriptome data.

    Findings: Six conceptually diverse methods, based on the number of maximal cliques, correlation of control spots with expressed genes, the top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values, and statistical power, were used to estimate a correlation threshold for three time-series microarray datasets. The validity of the thresholds was tested by comparison with thresholds derived from Gene Ontology information. The stability and reliability of the best methods were evaluated with block bootstrapping.

    Two threshold methods, number of maximal cliques and spectral graph clustering, used information in the structure of the correlation matrix and performed well in terms of stability. Comparison with Gene Ontology found that thresholds from the number of maximal cliques extracted from a co-expression matrix were the most biologically valid. Approaches to improve both methods were suggested.

    Conclusion: Threshold selection approaches based on the network structure of gene relationships gave thresholds with greater relevance to curated biological relationships than approaches based on statistical pairwise relationships.
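
    Two of the six methods are simple enough to sketch directly. The code below is an illustration under assumptions, not the study's implementation: it takes a symmetric correlation matrix as a list of lists, applies the top-fraction rule to absolute pairwise correlations, and, for the Bonferroni rule, approximates the null distribution of a Pearson correlation with the Fisher z-transform, atanh(r) * sqrt(n - 3) ~ N(0, 1), instead of an exact t-test. Function names are hypothetical; Python 3.8+ is assumed for statistics.NormalDist.

```python
import math
from statistics import NormalDist


def pairwise_abs(corr):
    """Absolute correlations from the upper triangle (each gene pair once)."""
    n = len(corr)
    return [abs(corr[i][j]) for i in range(n) for j in range(i + 1, n)]


def top_fraction_threshold(corr, fraction=0.01):
    """Smallest |r| that still lies in the top `fraction` of gene pairs."""
    values = sorted(pairwise_abs(corr), reverse=True)
    keep = max(1, math.ceil(fraction * len(values)))
    return values[keep - 1]


def bonferroni_threshold(num_genes, num_samples, alpha=0.05):
    """|r| above which a pair is significant after Bonferroni correction.

    Uses the Fisher approximation atanh(r) * sqrt(n - 3) ~ N(0, 1) under the
    null hypothesis of zero correlation, with a two-sided test.
    Requires num_samples > 3.
    """
    m = num_genes * (num_genes - 1) // 2  # number of pairwise tests
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return math.tanh(z_crit / math.sqrt(num_samples - 3))
```

    The Fisher approximation keeps the sketch free of third-party dependencies; an exact t-distribution p-value could be substituted without changing the overall logic. As expected, the Bonferroni threshold rises with the number of genes and falls as more samples are available.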