Proteins are essential to life across all organisms. They act as enzymes, antibodies,
molecular transporters, and structural elements, among other important roles. Their ability
to interact selectively with specific molecules is what makes them important.
Understanding these interactions can provide many advantages in fields such
as drug design and metabolic engineering. Current methods for predicting protein interactions
attempt to geometrically fit the structures of two proteins together by generating
a large number of candidate configurations and then discriminating the correct pose from
the remaining ones.
Because the search space is large, techniques that reduce its complexity are often employed.
A known contact point between the two partner proteins is an effective constraint. If
at least one correct contact can be predicted within a small set of candidates (e.g. 100), the
search space is significantly reduced.
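
The "at least one contact among a small set" criterion can be sketched as follows. This is a minimal illustration, not the method used in this work: the score matrix, the function names top_k_pairs and has_hit, and the representation of contacts as (i, j) index pairs are all assumptions made for the example.

```python
import heapq


def top_k_pairs(scores, k=100):
    """Return the k highest-scoring (i, j) residue pairs, best first.

    scores[i][j] is a hypothetical predictor score for residue i of the
    first protein being in contact with residue j of the second.
    """
    candidates = (
        (score, i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
    )
    best = heapq.nlargest(k, candidates, key=lambda t: t[0])
    return [(i, j) for _, i, j in best]


def has_hit(scores, true_contacts, k=100):
    """True if at least one of the top-k predicted pairs is a real contact."""
    return any(pair in true_contacts for pair in top_k_pairs(scores, k))
```

Under these assumptions, a prediction counts as useful when has_hit returns True for a small k, since any correct pair in that set can then be used to discard candidate poses that do not place the two residues near each other.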
Using structural and evolutionary information about the interacting proteins, a machine
learning predictor can be developed for this task. The evolutionary measures are computed
over a substantial number of homologous sequences, which can be filtered and
ordered in many different ways. Accordingly, a machine learning solution was developed
that focuses on measuring the effect that different homolog arrangements have on
the final prediction.