
    Exploring Protein Interactions Through the Development of in silico Methodologies and Genetic Analysis

    Significant advances in sequencing technologies over the past twenty years have drastically changed scientific approaches to solving biological problems. A shift towards big-data-driven research and computational prediction has been made possible by vast sequence and structural databases, populated by sequences determined by modern sequencing methods. This thesis uses these resources to predict protein function in the form of ligand-binding sites, to determine positions in proteins that are pivotal in virus pathogenesis, and to elucidate amino acids in myosin proteins that are adapted to the mass of mammalian species. Firstly, a tool for predicting protein-ligand interactions is introduced. It uses up-to-date data resources, tools, and machine learning algorithms to identify, and assign confidence to, candidate ligand-binding positions in proteins. This work builds on 3DLigandSite and is its first major update since release. A new webserver is presented that lets users query their protein sequences for probable ligand-binding residues. Secondly, sequence analysis of conventional myosins identified positions that are associated with increased body mass but not with clade. In beta-cardiac myosin, nine positions were found to adjust contraction velocity when mutated, highlighting the predictive power of alignments combined with a measurable phenotype. This further shows the time- and cost-saving benefits of combining computational and experimental methods. Finally, differentially conserved positions (DCPs) were identified between SARS-CoV and SARS-CoV-2, pinpointing variants that are probable sources of the phenotypic differences between these viruses. A webserver was developed that enables other researchers to query coronavirus sequences for these variants. Our analysis included all sequences available from GISAID at the time of writing. Some of the DCPs identified using these methods are located in the ACE2 binding site of the spike protein, and in close vicinity to serine protease binding sites, which are essential for virus entry into the host.