The evolutionary reason for the increase in gene length from archaea to
bacteria to eukaryotes observed in large-scale genome sequencing efforts has
been unclear. We propose here that the increasing complexity of protein-protein
interactions has driven the selection of longer proteins, as longer proteins
are more able to distinguish among a larger number of distinct interactions due
to their greater average surface area. Annotated protein sequences available
from the SWISS-PROT database were analyzed for thirteen eukaryotes, eight
bacteria, and two archaea species. The number of subcellular locations with which
each protein is associated was used as a measure of the number of interactions
in which a protein participates. Two databases of yeast protein-protein
interactions were used as another measure of the number of interactions in
which each \emph{S. cerevisiae} protein participates. Protein length is shown
to correlate with both the number of subcellular locations with which a protein is
associated and the number of interactions as measured by yeast two-hybrid
experiments. Protein length is also shown to correlate with the probability
that the protein is encoded by an essential gene. Interestingly, average
protein length and number of subcellular locations are not significantly
different between all human proteins and protein targets of known, marketed
drugs. Increased protein length appears to be a significant mechanism by which
the increasing complexity of protein-protein interaction networks is
accommodated within the natural evolution of species. Consideration of protein
length may be a valuable tool in drug design, one that predicts different
strategies for inhibiting interactions in aberrant and normal pathways.

Comment: 13 pages, 5 figures, 2 tables, to appear in Physica