Protein microenvironments for topology analysis

Abstract

Previously held under moratorium from 1st December 2016 until 1st December 2021.

Amino acid residues are often the focus of research on protein structures. However, in a folded protein, each residue finds itself in an environment that is defined by the properties of its surrounding residues. The term microenvironment is used herein to refer to these local ensembles. Microenvironments have not only chemical properties but also topological properties, which quantify concepts such as density, boundaries between domains and junction complexity. These quantifications are used to project a protein’s backbone structure into a series of scores. The hypothesis was that these sequences of scores can be used to discover protein domains and motifs, and to align and compare groups of 3D protein structures. This research sought to implement a system that could compute microenvironments efficiently enough for them to be applied routinely to large datasets. The computation of the microenvironments was the most challenging aspect in terms of performance, and the optimisations required are described. Methods of scoring microenvironments were developed to enable the extraction of domain and motif data without 3D alignment. The problem of allosteric site detection was addressed with a classifier that achieved high allosteric site detection rates. Overall, this work describes the development of a system that scales well with increasing dataset sizes. It builds on existing techniques to detect domain boundaries automatically, and it demonstrates the ability to process large datasets by application to allosteric site detection, a problem that has not previously been adequately solved.
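To make the idea of projecting a backbone into a series of scores concrete, the following is a minimal sketch of one possible topological score, not the method used in this work. It assumes a C-alpha-only representation of the backbone, a hypothetical 8 Å cutoff defining each residue's microenvironment, and a simple neighbour count as the "density" score; the function name `density_profile` is likewise hypothetical. To avoid comparing every pair of residues, it bins atoms into a uniform grid (a standard cell-list technique), which is only one way the performance problem described above might be approached.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def density_profile(ca_coords: List[Point], cutoff: float = 8.0) -> List[int]:
    """Hypothetical density score: for each residue, count the other
    C-alpha atoms within `cutoff` angstroms of its own C-alpha."""
    # Bin each C-alpha into a cubic cell whose side equals the cutoff.
    # Any neighbour within the cutoff then lies in the same or an
    # adjacent cell, so only 27 cells need to be searched per residue.
    grid: Dict[Tuple[int, int, int], List[int]] = defaultdict(list)
    cells: List[Tuple[int, int, int]] = []
    for i, (x, y, z) in enumerate(ca_coords):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells.append(cell)
        grid[cell].append(i)

    cutoff_sq = cutoff * cutoff
    scores: List[int] = []
    for i, (cx, cy, cz) in enumerate(cells):
        xi, yi, zi = ca_coords[i]
        count = 0
        # Scan residue i's own cell and its 26 neighbouring cells.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    # .get avoids inserting empty cells into the defaultdict.
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        if j == i:
                            continue
                        xj, yj, zj = ca_coords[j]
                        # Compare squared distances to avoid a sqrt per pair.
                        if (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 <= cutoff_sq:
                            count += 1
        scores.append(count)
    return scores
```

For a toy straight chain of five C-alpha atoms spaced 3.8 Å apart, `density_profile([(i * 3.8, 0.0, 0.0) for i in range(5)])` returns `[2, 3, 4, 3, 2]`, so the central residue has the densest microenvironment. Because each residue is compared only against atoms in 27 nearby cells, the cost grows roughly linearly with the number of residues when packing density is roughly uniform, rather than quadratically as in an all-pairs comparison.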
