Although the popular module identification technique called modularity
maximization is widely used, its behavior and accuracy are not well
understood in practical contexts. Here, we present a broad characterization of
its performance in such situations. First, we revisit and clarify the
resolution limit phenomenon for modularity maximization. Second, we show that
the modularity function Q exhibits extreme degeneracies: it typically admits an
exponential number of distinct high-scoring solutions and typically lacks a
clear global maximum. Third, we derive the limiting behavior of the maximum
modularity Q_max for one model of infinitely modular networks, showing that it
depends strongly both on the size of the network and on the number of modules
it contains. Finally, using three real-world metabolic networks as examples, we
show that the degenerate solutions can fundamentally disagree on many, but not
all, partition properties such as the composition of the largest modules and
the distribution of module sizes. These results imply that the output of any
modularity maximization procedure should be interpreted cautiously in
scientific contexts. They also explain why many heuristics are often successful
at finding high-scoring partitions in practice and why different heuristics can
disagree on the modular structure of the same network. We conclude by
discussing avenues for mitigating some of these behaviors, such as combining
information from many degenerate solutions or using generative models.

Comment: 20 pages, 14 figures, 6 appendices; code available at
http://www.santafe.edu/~aaronc/modularity
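
For readers unfamiliar with the modularity function Q referred to above, the following is a minimal sketch of how Q can be computed for a given partition. It assumes the standard Newman-Girvan definition for an undirected, unweighted network, Q = sum_i [ e_i/m - (d_i/2m)^2 ], where m is the number of edges, e_i the number of edges inside module i, and d_i the total degree of the nodes in module i. The function name, variable names, and the toy network are illustrative assumptions and are not taken from the authors' code.

```python
from collections import defaultdict

def modularity(edges, membership):
    """Modularity Q of a partition (assumed Newman-Girvan form).

    edges: list of (u, v) pairs for an undirected, unweighted graph
    membership: dict mapping each node to its module label
    """
    m = len(edges)
    internal = defaultdict(int)  # e_i: edges with both endpoints in module i
    degree = defaultdict(int)    # d_i: summed degree of the nodes in module i
    for u, v in edges:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_module = {node: 0 for node in range(6)}

q_two = modularity(edges, two_modules)  # 2 * (3/7 - (7/14)**2) = 5/14
q_one = modularity(edges, one_module)   # 7/7 - (14/14)**2 = 0
```

On this toy network the two-triangle partition scores Q = 5/14 (about 0.357), while placing every node in a single module scores Q = 0. Modularity maximization searches over partitions for the largest such score.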