This paper addresses the statistical significance of structures in random
data: Given a set of vectors and a measure of mutual similarity, how likely is
it that a subset of these vectors forms a cluster with enhanced similarity among
its elements? The computation of this cluster p-value for randomly distributed
vectors is mapped onto a well-defined problem of statistical mechanics. We
solve this problem analytically, establishing a connection between the physics
of quenched disorder and multiple testing statistics in clustering and related
problems. In an application to gene expression data, we find a remarkable link
between the statistical significance of a cluster and the functional
relationships between its genes.

Comment: to appear in Phys. Rev. Lett.