Various approaches have explored the covariation of residues in
multiple-sequence alignments of homologous proteins to extract functional and
structural information. Among those are principal component analysis (PCA),
which identifies the most correlated groups of residues, and direct coupling
analysis (DCA), a global inference method based on the maximum entropy
principle, which aims at predicting residue-residue contacts. In this paper,
inspired by the statistical physics of disordered systems, we introduce the
Hopfield-Potts model to naturally interpolate between these two approaches. The
Hopfield-Potts model allows us to identify relevant 'patterns' of residues from
the knowledge of the eigenmodes and eigenvalues of the residue-residue
correlation matrix. We show how the computation of such statistical patterns
makes it possible to accurately predict residue-residue contacts with a much
smaller number of parameters than DCA. This dimensional reduction allows us to
avoid overfitting and to extract contact information from multiple-sequence
alignments of reduced size. In addition, we show that low-eigenvalue
correlation modes, discarded by PCA, are important to recover structural
information: the corresponding patterns are highly localized, that is, they are
concentrated on a few sites, which we find to be in close contact in the
three-dimensional protein fold.

Comment: Supporting information can be downloaded from:
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.100317
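
The role played by the eigenmodes of the residue-residue correlation matrix, and the notion of a pattern being "localized" on a few sites, can be illustrated with a minimal sketch. It is not the inference procedure of the paper: it assumes an alignment already encoded as integers in [0, q), uses a plain one-hot covariance matrix without pseudocounts or sequence reweighting, and measures localization with a participation ratio. The function names (one_hot, correlation_modes, participation) are hypothetical and chosen here for illustration only.

```python
import numpy as np

def one_hot(msa, q):
    """Encode an integer alignment of shape (B, L) as a (B, L*q) 0/1 matrix."""
    B, L = msa.shape
    X = np.zeros((B, L * q))
    rows = np.repeat(np.arange(B), L)
    cols = (np.arange(L) * q + msa).ravel()  # column of (site i, state a) is i*q + a
    X[rows, cols] = 1.0
    return X

def correlation_modes(msa, q):
    """Eigenvalues (ascending) and eigenvectors of the one-hot covariance matrix.

    Assumption: plain empirical covariance, no pseudocount or reweighting.
    """
    X = one_hot(msa, q)
    C = np.cov(X, rowvar=False)           # shape (L*q, L*q), symmetric
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    return eigvals, eigvecs

def participation(vec, L, q):
    """Effective number of sites carrying an eigenvector's weight.

    Returns a value between 1 (all weight on one site) and L (uniform spread).
    """
    w = (vec.reshape(L, q) ** 2).sum(axis=1)  # squared weight per site
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

if __name__ == "__main__":
    # Synthetic random alignment, for demonstration only (not real protein data).
    rng = np.random.default_rng(0)
    L, q = 6, 4
    msa = rng.integers(0, q, size=(50, L))
    vals, vecs = correlation_modes(msa, q)
    for k in (0, len(vals) - 1):
        print(f"mode {k}: eigenvalue {vals[k]:.3f}, "
              f"participation {participation(vecs[:, k], L, q):.2f}")
```

On a real alignment, a small participation value for a mode indicates that the mode is concentrated on a few sites, which is the sense of "localized" used above. Comparing modes at both ends of the spectrum, rather than only the top ones kept by PCA, is the point of the sketch.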