The similarity in the three-dimensional structures of homologous proteins
imposes strong constraints on their sequence variability. It has long been
suggested that the resulting correlations among amino acid compositions at
different sequence positions can be exploited to infer spatial contacts within
the tertiary protein structure. Crucial to this inference is the ability to
disentangle direct and indirect correlations, as accomplished by the recently
introduced Direct Coupling Analysis (DCA) (Weigt et al. (2009) Proc Natl Acad
Sci 106:67). Here we develop a computationally efficient implementation of DCA,
which allows us to evaluate the accuracy of contact prediction by DCA for a
large number of protein domains, based purely on sequence information. DCA is
shown to yield a large number of correctly predicted contacts, recapitulating
the global structure of the contact map for the majority of the protein domains
examined. Furthermore, our analysis captures clear signals beyond intra-domain
residue contacts, arising, e.g., from alternative protein conformations,
ligand-mediated residue couplings, and inter-domain interactions in protein
oligomers. Our findings suggest that contacts predicted by DCA can be used as a
reliable guide to facilitate computational predictions of alternative protein
conformations, protein complex formation, and even the de novo prediction of
protein domain structures, provided that a large number of homologous
sequences is available, as is increasingly the case owing to advances in
genome sequencing.

Comment: 28 pages, 7 figures, to appear in PNA