Simplified amino acid alphabets based on deviation of conditional probability from random background
The primitive data for deducing the Miyazawa-Jernigan contact energy or
BLOSUM score matrix consists of pair frequency counts. Each amino acid
corresponds to a conditional probability distribution. Based on the deviation
of these conditional probabilities from a random background, a scheme for reducing
the amino acid alphabet is proposed. A clear discrepancy is observed
between the reduced alphabets obtained from the raw Miyazawa-Jernigan
and BLOSUM residue pair counts. Taking the homologous
sequence database SCOP40 as a test set, we detect homology with the resulting
coarse-grained substitution matrices. It is verified that the reduced alphabets
obtained preserve well the information contained in the original 20-letter
alphabet.
Comment: 9 pages, 3 figures
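
The idea of turning pair counts into per-residue conditional distributions and comparing them with a random background can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the deviation is measured or how letters are merged, so the plain difference P(j|i) - q(j), the Euclidean distance between deviation vectors, and the 3-letter count matrix are all hypothetical choices made here, not the authors' method or data.

```python
import math

def conditional_probabilities(counts):
    """Row-normalize a symmetric pair-count matrix: P(j|i) = N[i][j] / sum_j N[i][j]."""
    return [[n / sum(row) for n in row] for row in counts]

def background(counts):
    """Random background q(j): the share of all pair counts that involve letter j."""
    total = sum(sum(row) for row in counts)
    return [sum(row) / total for row in counts]

def deviation(counts):
    """Deviation of each conditional distribution from the background.

    Assumption: the deviation is taken as the simple difference P(j|i) - q(j).
    """
    cond = conditional_probabilities(counts)
    bg = background(counts)
    return [[p - q for p, q in zip(row, bg)] for row in cond]

def closest_pair(counts):
    """Return the two letter indices whose deviation vectors are most similar.

    Assumption: similarity is Euclidean distance; merging such a pair would be
    one step of a greedy alphabet reduction.
    """
    dev = deviation(counts)
    best = None
    for i in range(len(dev)):
        for j in range(i + 1, len(dev)):
            d = math.dist(dev[i], dev[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]

# Hypothetical pair counts for a toy 3-letter alphabet (not real residue data).
toy_counts = [
    [8, 6, 1],
    [6, 8, 1],
    [1, 1, 10],
]
print(closest_pair(toy_counts))
```

In this toy matrix the first two letters have nearly the same contact profile, so they are the natural candidates to merge first. With real 20 x 20 pair counts the same steps would apply, but the choice of deviation measure and merge rule would have to follow the paper itself.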