    Statistical Mechanics of Community Detection

    Starting from a general ansatz, we show how community detection can be interpreted as finding the ground state of an infinite-range spin glass. Our approach applies to weighted and directed networks alike. It contains the ad hoc quality function introduced in [ReichardtPRL] and the modularity Q as defined by Newman and Girvan [Girvan03] as special cases. The community structure of the network is interpreted as the spin configuration that minimizes the energy of the spin glass, with the spin states being the community indices. We elucidate the properties of the ground-state configuration to give a concise definition of communities as cohesive subgroups in networks that adapts to the specific class of network under study. Further, we show how hierarchies and overlap in the community structure can be detected. Computationally efficient local update rules for optimization procedures that find the ground state are given. We show how the ansatz may be used to discover the community around a given node without detecting all communities in the full network, and we give benchmarks for the performance of this extension. Finally, we give expectation values for the modularity of random graphs, which can be used to assess the statistical significance of community structure.
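
    A minimal sketch of the energy function and a simple local update, assuming a symmetric (undirected) adjacency matrix, the configuration-model null p_ij = k_i k_j / 2m, and a resolution parameter gamma. With gamma = 1 and the sum taken over all ordered pairs, the energy reduces to H = -2m Q. The greedy zero-temperature sweep is our own illustrative choice, not necessarily the update rules given in the paper, and the function names (`energy`, `modularity`, `local_sweep`, `detect`) are hypothetical.

```python
from itertools import product

# Sketch only: the configuration-model null and the greedy update rule are
# assumptions for illustration, not the paper's exact procedure.

def degrees(adj):
    """Node degrees (row sums) of a symmetric adjacency matrix given as nested lists."""
    return [sum(row) for row in adj]

def energy(adj, labels, gamma=1.0):
    """H = -sum_{i,j} (A_ij - gamma * k_i k_j / 2m) * delta(sigma_i, sigma_j), all ordered pairs."""
    k = degrees(adj)
    two_m = sum(k)
    n = len(adj)
    return -sum(
        adj[i][j] - gamma * k[i] * k[j] / two_m
        for i, j in product(range(n), repeat=2)
        if labels[i] == labels[j]
    )

def modularity(adj, labels):
    """Newman-Girvan modularity Q; with gamma = 1 it equals -H / (2m)."""
    return -energy(adj, labels, gamma=1.0) / sum(degrees(adj))

def local_sweep(adj, labels, gamma=1.0):
    """One pass of greedy single-spin updates; returns True if any spin changed."""
    k = degrees(adj)
    two_m = sum(k)
    changed = False
    for i in range(len(adj)):
        # Infinite-range coupling: node i interacts with every other node j.
        gain = {}
        for j in range(len(adj)):
            if j != i:
                coupling = adj[i][j] - gamma * k[i] * k[j] / two_m
                gain[labels[j]] = gain.get(labels[j], 0.0) + coupling
        best = labels[i]
        best_gain = gain.get(best, 0.0)
        for spin in sorted(gain):
            # Strict '>' keeps the earlier candidate on ties, so sweeps are deterministic.
            if gain[spin] > best_gain:
                best, best_gain = spin, gain[spin]
        if best_gain < 0:
            best = max(labels) + 1  # an unused spin state has zero coupling gain
        if best != labels[i]:
            labels[i] = best
            changed = True
    return changed

def detect(adj, gamma=1.0, max_sweeps=100):
    """Start from one spin state per node and sweep until no spin changes."""
    labels = list(range(len(adj)))
    for _ in range(max_sweeps):
        if not local_sweep(adj, labels, gamma):
            break
    return labels
```

    On two triangles joined by a single edge, `detect` separates the two triangles, giving Q = 5/14 and H = -5. A greedy sweep can stall in a local minimum on harder graphs; simulated annealing over the same `gain` computation is a common remedy.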

    Multifractal Network Generator

    We introduce a new approach to constructing networks with realistic features. Our method, in spite of its conceptual simplicity (it has only two parameters), is capable of generating a wide variety of network types with prescribed statistical properties, e.g., with degree or clustering-coefficient distributions of various, very different forms. In turn, these graphs can be used to test hypotheses or as models of actual data. The method is based on a mapping between suitably chosen singular measures defined on the unit square and sparse infinite networks. Such a mapping has great potential for yielding graph-theoretical results for a variety of network topologies. The main idea of our approach is to go to the infinite limit of the singular measure and the size of the corresponding graph simultaneously. A unique feature of this construction is that the complexity of the generated network increases with its size. We present analytic expressions, derived from the parameters of the initial generating measure (which is subsequently iterated), for major graph characteristics such as the degree, clustering-coefficient and assortativity-coefficient distributions. The optimal parameters of the generating measure are determined by a simple simulated annealing process. Thus, the present work provides a tool for researchers from a variety of fields (such as biology, computer science, or complex systems), enabling them to create a versatile model of their network data. Comment: Preprint. Final version appeared in PNAS
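
    A minimal sketch of one plausible reading of the construction: the base generating measure is assumed to split [0, 1] into intervals of lengths `lengths`, with a symmetric matrix `p` giving the link probability for each pair of intervals. After iterating it `k` times, two nodes whose uniformly random coordinates fall into nested intervals (i_1, ..., i_k) and (j_1, ..., j_k) are linked with probability p[i_1][j_1] * ... * p[i_k][j_k]. The parameterization and all function names are hypothetical, and the simulated-annealing fit of the parameters is not shown.

```python
import random
from itertools import accumulate

# Hypothetical parameterization: `lengths` splits [0, 1] into base intervals and
# `p[a][b]` is the assumed link probability for the base cell (a, b).

def cell_path(x, lengths, k):
    """Interval index of coordinate x in [0, 1) at each of k nested levels."""
    right_edges = list(accumulate(lengths))
    path = []
    for _ in range(k):
        for idx, right in enumerate(right_edges):
            # The last interval also catches x values pushed to 1.0 by rounding.
            if x < right or idx == len(right_edges) - 1:
                left = right - lengths[idx]
                path.append(idx)
                x = (x - left) / lengths[idx]  # zoom into the chosen interval
                break
    return path

def link_probability(path_a, path_b, p):
    """Product of base-measure probabilities over the k levels."""
    prob = 1.0
    for a, b in zip(path_a, path_b):
        prob *= p[a][b]
    return prob

def generate(n, lengths, p, k, seed=0):
    """Sample an undirected graph on n nodes; returns a set of (i, j) edges with i < j."""
    rng = random.Random(seed)
    paths = [cell_path(rng.random(), lengths, k) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < link_probability(paths[i], paths[j], p):
                edges.add((i, j))
    return edges
```

    For example, `generate(200, [0.3, 0.7], [[0.9, 0.2], [0.2, 0.4]], k=3)` samples a 200-node graph; with every entry of `p` equal to 1, the result is the complete graph.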

    Scaling Nonparametric Bayesian Inference via Subsample-Annealing

    We describe an adaptation of the simulated annealing algorithm to nonparametric clustering and related probabilistic models. This new algorithm learns nonparametric latent structure over a growing and constantly churning subsample of training data, where the fraction of data subsampled can be interpreted as the inverse temperature beta(t) in an annealing schedule. Gibbs sampling at high temperature (i.e., with a very small subsample) can explore sketches of the final latent state more quickly by (a) making longer jumps around latent space (as in block Gibbs) and (b) lowering energy barriers (as in simulated annealing). We prove that subsample annealing speeds up mixing time from N^2 to N in a simple clustering model and from exp(N) to N in another class of models, where N is the data size. Empirically, subsample annealing outperforms naive Gibbs sampling in accuracy per wall-clock time and can scale to larger datasets and deeper hierarchical models. We demonstrate improved inference on million-row subsamples of US Census data and network log data and on a 307-row hospital rating dataset, using a Pitman-Yor generalization of the Cross Categorization model. Comment: To appear in AISTATS 201
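
    The growing, churning subsample can be sketched as a schedule, assuming a linear schedule beta(t) = t / n_steps and first-in, first-out churn in which the oldest rows in the subsample are swapped for unseen rows at each step. Both choices, the `churn` parameter and the function name are our assumptions rather than the paper's exact scheme; the Gibbs kernel itself is left out.

```python
from collections import deque

# Assumed scheme: linear beta(t) and FIFO churn; not necessarily the paper's schedule.

def subsample_schedule(n_rows, n_steps, churn=1):
    """Yield (beta, active_rows) with beta(t) = t / n_steps and |active| = ceil(beta * n_rows)."""
    inactive = deque(range(n_rows))  # rows outside the current subsample
    active = deque()  # FIFO order: the oldest rows are the first to churn out
    for t in range(1, n_steps + 1):
        target = -(-t * n_rows // n_steps)  # integer ceil(t * n_rows / n_steps)
        # Churn: swap the oldest active rows for fresh ones while unused rows remain.
        for _ in range(min(churn, len(active), len(inactive))):
            inactive.append(active.popleft())
            active.append(inactive.popleft())
        # Grow: add rows until the subsample fraction matches beta(t).
        while len(active) < target:
            active.append(inactive.popleft())
        yield t / n_steps, sorted(active)
```

    At each yielded step, a Gibbs sweep (for example, a hypothetical `gibbs_sweep(model, rows)`) would resample the latent assignments of the active rows only; by the final step beta = 1 and every row is active.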

    Particle Swarm Optimization for the Clustering of Wireless Sensors

    Clustering is necessary for data aggregation, hierarchical routing, optimizing sleep patterns, election of extremal sensors, optimizing coverage and resource allocation, reuse of frequency bands and codes, and conserving energy. Optimal clustering is typically an NP-hard problem. Solutions to NP-hard problems involve searches through vast spaces of possible solutions. Evolutionary algorithms have been applied successfully to a variety of NP-hard problems. We explore one such approach, Particle Swarm Optimization (PSO), an evolutionary programming technique in which a 'swarm' of test solutions, analogous to a natural swarm of bees, ants or termites, is allowed to interact and cooperate to find the best solution to the given problem. We use the PSO approach to cluster sensors in a sensor network. The energy efficiency of our clustering in a data-aggregation-type sensor network deployment is tested using a modified LEACH-C code. The PSO technique with a recursive bisection algorithm is tested against random search and simulated annealing, and the PSO technique is shown to be robust. We further investigate developing a distributed version of the PSO algorithm for optimally clustering a wireless sensor network.
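
    A minimal global-best PSO sketch for placing k cluster heads, assuming the sensors lie in the unit square and using the sum of squared sensor-to-nearest-head distances as the fitness. This objective is a stand-in for the paper's LEACH-C energy model, the recursive bisection step is not reproduced, and the names and default coefficients (`w`, `c1`, `c2`) are hypothetical.

```python
import random

# Stand-in fitness: squared distance to the nearest head, not the LEACH-C energy model.

def cost(heads, sensors):
    """Sum of squared distances from each sensor to its nearest cluster head."""
    return sum(
        min((sx - hx) ** 2 + (sy - hy) ** 2 for hx, hy in heads)
        for sx, sy in sensors
    )

def pso_cluster(sensors, k, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over k head positions; returns (heads, best_cost, history)."""
    rng = random.Random(seed)
    dim = 2 * k  # a particle is the flat vector (x_1, y_1, ..., x_k, y_k)

    def decode(vec):
        return [(vec[2 * h], vec[2 * h + 1]) for h in range(k)]

    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(decode(p), sensors) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(decode(pos[i]), sensors)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return decode(gbest), gbest_cost, history
```

    Each particle is pulled toward its own best position (`c1`) and the swarm's best (`c2`), while `w` damps its previous velocity. Because the global best is replaced only by a strictly better solution, the returned `history` never increases.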

    Identifying comprehension problems and their relationship with the mathematics background, learning style, motivation and interest of students in the Food Cost Control chapter at Sekolah Menengah Teknik (ERT) Rembau: a case study

    This study was conducted to examine the correlation between mathematics background, learning style, motivation and interest and students' understanding of the Food Cost Control chapter. The respondents were 30 Form Five students of the Catering course at Sekolah Menengah Teknik (ERT) Rembau, Negeri Sembilan. The research instrument was a questionnaire, and all data were analysed with SPSS version 10.0 to obtain mean and correlation values that address the stated objectives. The results show a strong correlation between students' learning style and their understanding, whereas the correlations of mathematics background, motivation and interest with students' understanding are moderate. The mean levels for students' comprehension problems, mathematics background, learning style, motivation and interest regarding the Food Cost Control chapter are moderate. The study recommends developing a Self-Learning Module for the Food Cost Control chapter to help Catering students in their learning.

    Generating Robust and Efficient Networks Under Targeted Attacks

    Much of our commerce and travel depends on the efficient operation of large-scale networks. Some of these, such as electric power grids, transportation systems, communication networks and others, must maintain their efficiency even after several failures or malicious attacks. We outline a procedure that modifies any given network to enhance its robustness, defined as the size of its largest connected component after a succession of attacks, while keeping a high efficiency, described in terms of the shortest paths among nodes. We also show that the networks generated this way are very similar to networks optimized for robustness in several respects, such as high assortativity and the presence of an onion-like structure.
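
    The two quantities being traded off can be sketched as follows, assuming an undirected graph stored as a dict of neighbor sets. Robustness is taken as the largest-component fraction after each step of an adaptive highest-degree attack, averaged over the whole attack sequence (our assumption about how the succession of attacks is aggregated); efficiency is the mean inverse shortest-path length over ordered node pairs. The network-modification procedure itself is not shown, and the function names are hypothetical.

```python
from collections import deque

# Assumed measures: attack-averaged largest-component fraction and global efficiency.

def largest_component(adj, alive):
    """Size of the largest connected component among the nodes in `alive`."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def robustness(adj):
    """R = (1 / N) * sum_q s(q), s(q) = LCC fraction after q adaptive highest-degree removals."""
    n = len(adj)
    alive = set(adj)
    total = 0
    for _ in range(n):
        # Adaptive attack: recompute degrees among surviving nodes; ties go to the lowest id.
        target = max(sorted(alive), key=lambda u: sum(1 for v in adj[u] if v in alive))
        alive.remove(target)
        total += largest_component(adj, alive)
    return total / (n * n)

def efficiency(adj):
    """Global efficiency: mean of 1 / d(i, j) over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    total = 0.0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))
```

    For a five-node star, removing the hub immediately fragments the graph, so `robustness` returns 0.16, while `efficiency` returns 0.7.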

    Community detection algorithms: a comparative analysis

    Uncovering the community structure exhibited by real networks is a crucial step towards an understanding of complex systems that goes beyond the local organization of their constituents. Many algorithms have been proposed so far, but none of them has been subjected to strict tests to evaluate their performance. Most of the sporadic tests performed so far involved small networks with known community structure and/or artificial graphs with a simplified structure, which is very uncommon in real systems. Here we test several methods against a recently introduced class of benchmark graphs with heterogeneous distributions of degree and community size. The methods are also tested against the benchmark by Girvan and Newman and on random graphs. Our analysis shows that three recent algorithms, introduced by Rosvall and Bergstrom, Blondel et al. and Ronhovde and Nussinov, respectively, perform excellently, with the additional advantage of low computational complexity, which makes it possible to analyze large systems. Comment: 12 pages, 8 figures. The software to compute the values of our general normalized mutual information is available at http://santo.fortunato.googlepages.com/inthepress
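
    For reference, a minimal sketch of the standard normalized mutual information between two non-overlapping partitions, NMI = 2 I(A; B) / (H(A) + H(B)), which scores a detected partition against the planted one. This is not the generalized NMI for overlapping communities mentioned in the comment, and the function names are hypothetical.

```python
import math
from collections import Counter

# Standard NMI for hard partitions; the overlapping-cover generalization is not shown.

def entropy(labels):
    """Shannon entropy (natural log) of the cluster-size distribution."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """NMI = 2 I(A; B) / (H(A) + H(B)) for two hard partitions given as label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_a[x] * count_b[y]).
    mutual = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    h = entropy(a) + entropy(b)
    return 1.0 if h == 0 else 2 * mutual / h
```

    `nmi` returns 1 for identical partitions up to relabeling and 0 for statistically independent ones; returning 1 when both partitions are trivial (a single cluster each) is our convention.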