Predicting protein-membrane interfaces of peripheral membrane proteins using ensemble machine learning.

Abstract

Abnormal protein-membrane attachment is involved in deregulated cellular pathways and in disease. Therefore, the possibility to modulate protein-membrane interactions represents a new promising therapeutic strategy for peripheral membrane proteins that have been considered so far undruggable. A major obstacle in this drug design strategy is that the membrane-binding domains of peripheral membrane proteins are usually unknown. The development of fast and efficient algorithms predicting the protein-membrane interface would shed light into the accessibility of membrane-protein interfaces by drug-like molecules. Herein, we describe an ensemble machine learning methodology and algorithm for predicting membrane-penetrating amino acids. We utilize available experimental data from the literature for training 21 machine learning classifiers and meta-classifiers. Evaluation of the best ensemble classifier model accuracy yields a macro-averaged F1 score = 0.92 and a Matthews correlation coefficient = 0.84 for predicting correctly membrane-penetrating amino acids on unknown proteins of a validation set. The python code for predicting protein-membrane interfaces of peripheral membrane proteins is available at https://github.com/zoecournia/DREAMM

    Similar works