Bridging the Gaps in Statistical Models of Protein Alignment
This work demonstrates how a complete statistical model quantifying the
evolution of pairs of aligned proteins can be constructed from a
time-parameterised substitution matrix and a time-parameterised 3-state
alignment machine. All parameters of such a model can be inferred from any
benchmark data-set of aligned protein sequences. This allows us to examine nine
well-known substitution matrices on six benchmarks curated using various
structural alignment methods; any matrix that does not explicitly model a
"time"-dependent Markov process is converted to a corresponding base-matrix
that does. In addition, a new optimal matrix is inferred for each of the six
benchmarks. Using Minimum Message Length (MML) inference, all 15 matrices are
compared in terms of the Shannon information content they measure for each
benchmark. This has resulted in a new and clear overall best-performing
time-dependent Markov matrix, MMLSUM, and its associated 3-state machine, whose
properties we have analysed in this work. For standard use, the MMLSUM series
of (log-odds) scoring matrices derived from the above Markov matrix
is available at https://lcb.infotech.monash.edu.au/mmlsum.
Comment: Main text: 15 pages, 4 Figs; Supplementary text: 12 pages, 6 Fig
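
To make the relationship between a time-parameterised Markov matrix and a log-odds scoring matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy three-letter alphabet, a row-stochastic base matrix `BASE` whose entry `BASE[i][j]` is the probability of letter `i` changing to letter `j` in one unit of time, and a uniform stationary distribution `PI`. The names `BASE`, `PI`, `mat_mul`, `mat_pow`, and `log_odds` are hypothetical, and the numbers are illustrative only; they are not MMLSUM values. The 3-state alignment machine is not modelled here.

```python
import math

# Toy base matrix (assumption): BASE[i][j] = P(letter i -> letter j) in one time unit.
# It is symmetric, so its stationary distribution is uniform.
BASE = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
]
PI = [1.0 / 3.0] * 3  # stationary distribution of the toy matrix


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, t):
    """Raise m to a non-negative integer power t by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while t > 0:
        if t & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        t >>= 1
    return result


def log_odds(m_t, pi):
    """Scores in bits: S[i][j] = log2(P(j | i, t) / pi[j])."""
    n = len(m_t)
    return [[math.log2(m_t[i][j] / pi[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for t in (1, 50, 250):
        scores = log_odds(mat_pow(BASE, t), PI)
        print(t, [[round(s, 3) for s in row] for row in scores])
```

In this sketch, matches score positively and mismatches negatively at small `t`, and all scores shrink towards zero as `t` grows, because the rows of the powered matrix approach the stationary distribution.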