Previously held under moratorium from 1st December 2016 until 1st December 2021

Amino acid residues are often the focus of research on protein structures. However, in a folded protein, each residue finds itself in an environment that is defined
by the properties of its surrounding residues. The term microenvironment is used
herein to refer to these local ensembles. Microenvironments have not only chemical properties but also topological properties, which quantify concepts such as density,
boundaries between domains and junction complexity. These quantifications are
used to project a protein’s backbone structure into a series of scores.
The hypothesis was that these sequences of scores can be used to discover protein
domains and motifs and that they can be used to align and compare groups of
3D protein structures.
This research sought to implement a system that could compute microenvironments efficiently enough for them to be applied routinely to large datasets. The
computation of the microenvironments was the most challenging aspect in terms
of performance, and the optimisations required are described.
Methods of scoring microenvironments were developed to enable the extraction
of domain and motif data without 3D alignment. Allosteric site detection was
addressed with a classifier that achieved high detection rates.
Overall, this work describes the development of a system that scales well with
increasing dataset sizes. It builds on existing techniques to detect the boundaries of domains automatically, and demonstrates the ability to process
large datasets by application to allosteric site detection, a problem that has not
previously been adequately solved.