thesis

Two-component regulation: modelling, predicting & identifying protein-protein interactions & assessing signalling networks of bacteria

Abstract

Two-component signalling systems (TCSs) are found in most prokaryotic genomes. They typically comprise two proteins, a histidine (or sensor) kinase (HK) and an associated response regulator (RR), containing transmitter and receiver domains respectively, which interact to achieve transfer of a phosphoryl group from a histidine residue (of the transmitter domain in the HK) to an aspartate residue (of the partner RR’s receiver domain). An automated analysis pipeline using the NCBI’s RPS-BLAST tool was developed to identify and classify all TCS genes from completed prokaryotic genomes using the PFAM and CDD protein domain databases. A large proportion of TCS genes were found to be simple hybrid kinases (HYs) containing both a transmitter domain and a receiver domain within a single protein, presumably the result of the fusion or combination of separate HK and RR genes. This propensity to consolidate functionality into a single protein was found to be limited in the presence of either a transmembrane sensory/input domain or a DNA-binding domain – two spatially separated functions. While HK and RR genes are usually found together in the genome, in some species a large proportion of TCS domains are found as part of complex hybrid kinases (genes containing multiple TCS domains), in isolated or orphaned genes, or in complex gene clusters. In such organisms the lack of paired HK and RR genes makes it difficult to define genome-encoded signalling networks. Identifying paired transmitter and receiver domains from a pan-genomic survey of prokaryotes gives a database of amino acid sequences for thousands of interacting protein-protein complexes. Covariation between columns of multiple sequence alignments (MSAs) identifies particular pairs of residues representing interactions within the docked complex. Using numerical scores, these amino acid pairs were successfully used as explanatory variables in a generalised linear model (GLM) to predict the probabilities of interaction between transmitter and receiver domains.
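
The covariation step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the covariation measure, so mutual information between alignment columns is used here purely as an assumption. The function names `mutual_information` and `covariation_matrix` are hypothetical, and the input is assumed to be two aligned sets of equal-length sequences in which row i of the transmitter alignment is the known partner of row i of the receiver alignment.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b hold one residue per paired sequence, in the same order.
    This is an assumed covariation measure, not necessarily the thesis's own.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def covariation_matrix(transmitters, receivers):
    """Score every transmitter column against every receiver column.

    Returns a nested list where entry [i][j] is the mutual information
    between transmitter column i and receiver column j.
    """
    if len(transmitters) != len(receivers):
        raise ValueError("alignments must contain the same number of pairs")
    t_cols = list(zip(*transmitters))  # transpose rows into columns
    r_cols = list(zip(*receivers))
    return [[mutual_information(t, r) for r in r_cols] for t in t_cols]
```

High-scoring column pairs are candidates for residue contacts in the docked complex. Per-pair scores of this kind could then serve as explanatory variables in a logistic-type GLM, although the scoring scheme and model specification used in the thesis are not given in the abstract.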
