Background: Harnessing vast amounts of genomic data in phylogenetic context stemming from massive sequencing of multiple closely related genomes requires new tools and approaches. We present a tool for the genome-wide analysis of frequencies and patterns of amino acid substitutions in multiple alignments of genes’ coding regions, and a database of amino acid substitutions in the phylogeny of 12 Drosophila genomes. We illustrate the use of these resources to address three types of evolutionary genomics questions: about fluxes in amino acid composition in proteins, about asymmetries in amino acid substitutions and about patterns of molecular evolution in duplicated genes. Results: We demonstrate that amino acid composition of Drosophila proteins underwent a significant shift over the last 70 million years encompassed by the studied phylogeny, with less common amino acids (Cys, Met, His) increasing in frequency and more common ones (Ala, Leu, Glu) becoming less frequent. These fluxes are strongly correlated with polarity of source and destination amino acids, resulting in overall systematic decrease of mean polarity of amino acids found in Drosophila proteins. Frequency and radicality of amino acid substitutions are higher in paralogs than in orthologous single-copy genes and are higher in gene families with paralogs than in gene families without surviving duplications. Rate and radicality of substitutions, as expected, are negativel
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.