The twenty protein-coding amino acids are found in proteomes with different
relative abundances. The most abundant amino acid, leucine, is nearly an order
of magnitude more prevalent than the least abundant amino acid, cysteine. Amino
acid metabolic costs differ similarly, constraining their incorporation into
proteins. On the other hand, sequence diversity is necessary for protein
folding, function and evolution. Here we present a simple model for a
cost-diversity trade-off postulating that natural proteomes minimize amino acid
metabolic flux while maximizing sequence entropy. The model explains the
relative abundances of amino acids across a diverse set of proteomes. We find
that the data are remarkably well explained when the cost function accounts for
amino acid chemical decay. More than one hundred proteomes reach comparable
solutions to the trade-off by different combinations of cost and diversity.
Quantifying the interplay between proteome size and entropy shows that
proteomes can become optimally large and diverse.
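
To make the cost-diversity trade-off concrete, the sketch below assumes one standard way of formalizing it, which the text above does not spell out: minimize the mean cost per residue minus the Shannon entropy weighted by 1/lam. Under that assumption the optimal abundances take an exponential form, p_i proportional to exp(-lam * c_i). The function names, the parameter lam, and the three cost values are hypothetical placeholders chosen for illustration; they are not measured metabolic costs and not the authors' cost function.

```python
import math


def max_entropy_abundances(costs, lam):
    """Abundances that minimize mean cost minus entropy / lam.

    Assumed form: p_i = exp(-lam * c_i) / Z, where Z normalizes the sum to 1.
    """
    weights = {aa: math.exp(-lam * c) for aa, c in costs.items()}
    z = sum(weights.values())
    return {aa: w / z for aa, w in weights.items()}


def shannon_entropy(p):
    """Shannon entropy (in nats) of an abundance distribution."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)


def mean_cost(p, costs):
    """Average cost per residue for the given abundances."""
    return sum(p[aa] * costs[aa] for aa in p)


# Hypothetical costs in arbitrary units (illustrative only, not real data).
example_costs = {"cheap": 1.0, "mid": 2.0, "costly": 4.0}

# lam = 0 ignores cost entirely: uniform abundances, maximal entropy.
uniform = max_entropy_abundances(example_costs, lam=0.0)

# lam > 0 shifts abundance toward cheaper residues and lowers entropy.
biased = max_entropy_abundances(example_costs, lam=1.0)

print(uniform, shannon_entropy(uniform), mean_cost(uniform, example_costs))
print(biased, shannon_entropy(biased), mean_cost(biased, example_costs))
```

In this toy setting, raising lam lowers the mean cost at the expense of entropy, which is the qualitative behavior the trade-off describes; fitting lam and the cost values to real proteome data is outside the scope of the sketch.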