We show that cluster expansions (CE), previously used to model solid-state
materials with binary or ternary configurational disorder, can be extended to
the protein design problem. We present a generalized CE framework, in which
properties such as energy can be unambiguously expanded in the amino-acid
sequence space. The CE coarse-grains over nonsequence degrees of freedom (e.g.,
side-chain conformations) and thereby simplifies the problem of designing
proteins, or predicting the compatibility of a sequence with a given structure,
by many orders of magnitude. The CE is physically transparent, and can be
evaluated through linear regression on the energies of training sequences. We
show, as an example, that good prediction accuracy is obtained with up to pairwise
interactions for a coiled-coil backbone, and that triplet interactions are
important in the energetics of a more globular zinc-finger backbone.

Comment: 10 pages, 3 figures
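
The linear-regression fit mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-letter alphabet, the three-site sequence, the indicator-function basis with the first residue as reference, and the made-up `toy_energy` function. The sketch fits a CE truncated at pair clusters by ordinary least squares and compares the training error with and without a three-body term in the underlying energy.

```python
import itertools
import numpy as np

ALPHABET = "AL"   # hypothetical two-letter alphabet; index 0 is the reference residue
N_SITES = 3       # hypothetical number of designable sites


def ce_features(seq, max_order=2):
    """Indicator basis: constant, point, and (optionally) pair cluster functions."""
    non_ref = ALPHABET[1:]
    feats = [1.0]  # constant (empty-cluster) term
    for i in range(len(seq)):
        feats += [float(seq[i] == a) for a in non_ref]
    if max_order >= 2:
        for i, j in itertools.combinations(range(len(seq)), 2):
            feats += [float(seq[i] == a and seq[j] == b)
                      for a in non_ref for b in non_ref]
    return np.array(feats)


def toy_energy(seq, triplet=0.0):
    """Made-up energy: point and pair terms, plus an optional three-body term."""
    e = 0.5 * seq.count("L")
    e -= 1.0 * (seq[0] == "L" and seq[1] == "L")
    e += triplet * (seq == "LLL")
    return e


def fit_rmsd(triplet):
    """Fit a pairwise CE by least squares and return the training RMSD."""
    seqs = ["".join(s) for s in itertools.product(ALPHABET, repeat=N_SITES)]
    X = np.array([ce_features(s) for s in seqs])
    y = np.array([toy_energy(s, triplet) for s in seqs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sqrt(np.mean((X @ coef - y) ** 2)))


rmsd_pairwise = fit_rmsd(triplet=0.0)   # energy contains only point and pair terms
rmsd_triplet = fit_rmsd(triplet=2.0)    # energy also contains a three-body term
print(rmsd_pairwise, rmsd_triplet)
```

In this toy setting the pairwise expansion reproduces the energies essentially exactly when no three-body term is present, and leaves a nonzero error when one is added, which mirrors the qualitative distinction the abstract draws between the two backbones. A realistic fit would use a 20-letter alphabet, energies from a structure-based model, and error estimates on held-out sequences rather than on the training set.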