1,871 research outputs found
A General Optimization Technique for High Quality Community Detection in Complex Networks
Recent years have witnessed the development of a large body of algorithms for
community detection in complex networks. Most of them are based upon the
optimization of objective functions, among which modularity is the most common,
though a number of alternatives have been suggested in the scientific
literature. We present here an effective general search strategy for the
optimization of various objective functions for community detection purposes.
When applied to modularity, on both real-world and synthetic networks, our
search strategy substantially outperforms the best existing algorithms in terms
of final scores of the objective function; for description length, its
performance is on par with the original Infomap algorithm. The execution time
of our algorithm is on par with non-greedy alternatives present in literature,
and networks of up to 10,000 nodes can be analyzed in time spans ranging from
minutes to a few hours on average workstations, making our approach readily
applicable to tasks which require the quality of partitioning to be as high as
possible, and are not limited by strict time constraints. Finally, based on the
most effective of the available optimization techniques, we compare the
performance of modularity and code length as objective functions, in terms of
the quality of the partitions one can achieve by optimizing them. To this end,
we evaluated the ability of each objective function to reconstruct the
underlying structure of a large set of synthetic and real-world networks.Comment: MAIN text: 14 pages, 4 figures, 1 table Supplementary information: 19
pages, 8 figures, 5 table
Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization
We study the problem of detecting a structured, low-rank signal matrix
corrupted with additive Gaussian noise. This includes clustering in a Gaussian
mixture model, sparse PCA, and submatrix localization. Each of these problems
is conjectured to exhibit a sharp information-theoretic threshold, below which
the signal is too weak for any algorithm to detect. We derive upper and lower
bounds on these thresholds by applying the first and second moment methods to
the likelihood ratio between these "planted models" and null models where the
signal matrix is zero. Our bounds differ by at most a factor of root two when
the rank is large (in the clustering and submatrix localization problems, when
the number of clusters or blocks is large) or the signal matrix is very sparse.
Moreover, our upper bounds show that for each of these problems there is a
significant regime where reliable detection is information- theoretically
possible but where known algorithms such as PCA fail completely, since the
spectrum of the observed matrix is uninformative. This regime is analogous to
the conjectured 'hard but detectable' regime for community detection in sparse
graphs.Comment: For sparse PCA and submatrix localization, we determine the
information-theoretic threshold exactly in the limit where the number of
blocks is large or the signal matrix is very sparse based on a conditional
second moment method, closing the factor of root two gap in the first versio
- …