
    Stability of graph communities across time scales

    The complexity of biological, social and engineering networks makes it desirable to find natural partitions into communities that can act as simplified descriptions and provide insight into the structure and function of the overall system. Although community detection methods abound, there is a lack of consensus on how to quantify and rank the quality of partitions. We show here that the quality of a partition can be measured in terms of its stability, defined in terms of the clustered autocovariance of a Markov process taking place on the graph. Because the stability has an intrinsic dependence on the time scales of the graph, it allows us to compare and rank partitions at each time and also to establish the time spans over which partitions are optimal. Hence the Markov time acts effectively as an intrinsic resolution parameter that establishes a hierarchy of increasingly coarse clusterings. Within our framework we can then provide a unifying view of several standard partitioning measures: modularity and normalized cut size can be interpreted as one-step time measures, whereas Fiedler's spectral clustering emerges at long times. We apply our method to characterize the relevance and persistence of partitions over time for constructive and real networks, including hierarchical graphs and social networks. We also obtain reduced descriptions for atomic-level protein structures over different time scales. Comment: submitted; updated bibliography from v
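
    The stability described in this abstract can be illustrated with a small numerical sketch. It assumes the discrete-time form commonly written as r(t) = trace(H^T [Pi P^t - pi pi^T] H) for an unweighted, undirected graph with no isolated nodes; the paper itself may use a different (for example continuous-time) formulation. The function name `markov_stability` and the example graph are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def markov_stability(adjacency, labels, t=1):
    """Clustered autocovariance of a random walk after t steps (a sketch).

    Assumes an undirected graph in which every node has degree > 0.
    `labels[i]` is the community of node i.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    pi = degrees / degrees.sum()        # stationary distribution of the walk
    P = A / degrees[:, None]            # row-stochastic transition matrix

    # Indicator matrix H: H[i, c] = 1 if node i belongs to community c.
    communities = sorted(set(labels))
    H = np.array([[1.0 if lab == c else 0.0 for c in communities]
                  for lab in labels])

    Pt = np.linalg.matrix_power(P, t)   # t-step transition probabilities
    R = H.T @ (np.diag(pi) @ Pt - np.outer(pi, pi)) @ H
    return float(np.trace(R))


# Hypothetical example: two triangles (nodes 0-2 and 3-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

two_triangles = [0, 0, 0, 1, 1, 1]
for t in (1, 2, 5, 20):
    print(t, markov_stability(A, two_triangles, t))
```

    Under these assumptions the value at t = 1 coincides with the Newman-Girvan modularity of the partition (5/14 for this graph), which is one way to read the abstract's statement that modularity is a one-step measure.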

    The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

    Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning and have been utilized by other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other. Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in Astronomy", special issue "Robotic Astronomy"

    The stability of a graph partition: A dynamics-based framework for community detection

    Recent years have seen a surge of interest in the analysis of complex networks, facilitated by the availability of relational data and the increasingly powerful computational resources that can be employed for their analysis. Naturally, the study of real-world systems leads to highly complex networks, and a current challenge is to extract intelligible, simplified descriptions from the network in terms of relevant subgraphs, which can provide insight into the structure and function of the overall system. Sparked by seminal work by Newman and Girvan, an interesting line of research has been devoted to investigating modular community structure in networks, revitalising the classic problem of graph partitioning. However, modular or community structure in networks has notoriously evaded rigorous definition. The most accepted notion of community is perhaps that of a group of elements which exhibit a stronger level of interaction within themselves than with the elements outside the community. This concept has resulted in a plethora of computational methods and heuristics for community detection. Nevertheless, a firm theoretical understanding of most of these methods, in terms of how they operate and what they are supposed to detect, is still lacking to date. Here, we will develop a dynamical perspective towards community detection, enabling us to define a measure named the stability of a graph partition. It will be shown that a number of previously ad-hoc defined heuristics for community detection can be seen as particular cases of our method, providing us with a dynamic reinterpretation of those measures. Our dynamics-based approach thus serves as a unifying framework to gain a deeper understanding of different aspects and problems associated with community detection and allows us to propose new dynamically-inspired criteria for community structure. Comment: 3 figures; published as book chapter

    Identifying networks with common organizational principles

    Many complex systems can be represented as networks, and the problem of network comparison is becoming increasingly relevant. There are many techniques for network comparison, from simply comparing network summary statistics to sophisticated but computationally costly alignment-based approaches. Yet it remains challenging to accurately cluster networks that are of a different size and density, but hypothesized to be structurally similar. In this paper, we address this problem by introducing a new network comparison methodology that is aimed at identifying common organizational principles in networks. The methodology is simple, intuitive and applicable in a wide variety of settings ranging from the functional classification of proteins to tracking the evolution of a world trade network. Comment: 26 pages, 7 figures

    Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications

    Food authenticity studies are concerned with determining whether food samples have been correctly labeled. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be computationally efficient and to achieve excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.