16 research outputs found

    Complexity of Networks II: The Set Complexity of Edge-Colored Graphs

    We previously introduced the concept of “set-complexity”, based on a context-dependent measure of information, and used this concept to describe the complexity of gene interaction networks. In the previous paper in this series, we analyzed the set-complexity of binary graphs. Here we extend this analysis to graphs with multi-colored edges, which more closely match biological structures such as gene interaction networks. All highly complex graphs by this measure exhibit a modular structure. A principal result of this work is that, for the most complex graphs of a given size, the number of edge colors is equal to the number of “modules” of the graph. Complete multipartite graphs (CMGs) are defined and analyzed, and the relation between complexity and structure of these graphs is examined in detail. We establish that the mutual information between any two nodes in a CMG can be fully expressed in terms of entropy, and present an explicit expression for the set complexity of CMGs (Theorem 3). An algorithm for generating highly complex graphs from CMGs is described. We establish several theorems relating these concepts and connecting complex graphs with a variety of practical network properties. In exploring the relation between symmetry and complexity, we use the idea of a similarity matrix and its spectrum for highly complex graphs.

    New methods for finding associations in large data sets: Generalizing the maximal information coefficient (MIC)

    We propose here a natural, but substantive, extension of the MIC. Defined for two variables, MIC has a distinct advantage for detecting potentially complex dependencies. Our extension provides a similar means for detecting dependencies among three variables. This itself is an important step for practical applications. We show that by merging two concepts, the interaction information, which is a generalization of the mutual information to three variables, and the normalized information distance, which measures informational sharing between two variables, we can extend the fundamental idea of MIC. Our results also exhibit some attractive properties that should be useful for practical applications in data analysis. Finally, the conceptual and mathematical framework presented here can be used to generalize the idea of MIC to the multi-variable case.
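
    The two quantities named in this abstract can be illustrated with a minimal sketch for discrete samples. This is not the paper's method: the function names are hypothetical, the estimators are simple plug-in (frequency-count) estimates, the sign convention I(X;Y;Z) = I(X;Y) - I(X;Y|Z) is an assumption (some authors use the opposite sign), and the distance shown is an entropy-based form that may differ from the definition used in the paper.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def joint_entropy(*columns):
    """Joint entropy of several equally long sample columns."""
    return entropy(list(zip(*columns)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - joint_entropy(x, y)

def interaction_information(x, y, z):
    """Three-variable interaction information, expanded in entropies.

    Assumed convention: I(X;Y;Z) = I(X;Y) - I(X;Y|Z).
    """
    return (entropy(x) + entropy(y) + entropy(z)
            - joint_entropy(x, y) - joint_entropy(x, z) - joint_entropy(y, z)
            + joint_entropy(x, y, z))

def normalized_information_distance(x, y):
    """Entropy-based distance max(H(X|Y), H(Y|X)) / max(H(X), H(Y))."""
    hx, hy, hxy = entropy(x), entropy(y), joint_entropy(x, y)
    denom = max(hx, hy)
    if denom == 0:
        return 0.0  # both variables are constant in the sample
    # max(H(X|Y), H(Y|X)) equals H(X,Y) - min(H(X), H(Y))
    return (hxy - min(hx, hy)) / denom

# Toy data: z is the XOR of two independent fair bits x and y.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]
```

    On this toy XOR data, every pair of variables has zero mutual information, yet the three-variable interaction information is -1 bit under the assumed sign convention, which is the kind of dependency a pairwise measure cannot see.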