The use of phylogenetic reconstruction as a predictive tool to functionally identify raffinose family oligosaccharide (RFO) producing glycosyltransferases

Abstract

Thesis (MScAgric)--Stellenbosch University, 2022.

ENGLISH ABSTRACT: Carbohydrate active enzymes (CAZymes) are a numerous and diverse group of enzymes involved in the transport, synthesis, and catalysis of carbohydrates. All known and predicted CAZymes are housed in the CAZy database (www.cazy.org). Two classes of CAZymes, the glycosyltransferases (GTs) and glycosyl hydrolases (GHs), are important in the biosynthesis of a group of galacto-oligosaccharides termed the raffinose family of oligosaccharides (RFOs). The RFOs are the most widespread D-galactose (Gal) containing oligosaccharides in higher plants, where they serve a number of vital natural functions, including carbon transport and storage and the amelioration of both abiotic and biotic stresses. Recently, they have also emerged as powerful prebiotic agents, as they provide usable carbon that stimulates the growth of health-beneficial gut microbes. Their biosynthesis occurs through a distinct series of enzymatic reactions that begins with the biosynthesis of galactinol (Gol), catalysed by galactinol synthase (GolS, GT8, EC 2.4.1.123). Gol then serves as the galactosyl donor in the biosynthesis of raffinose (Raf) and stachyose (Sta). These reactions are catalysed by the GHs raffinose synthase (RafS, GH36, EC 2.4.1.82) and stachyose synthase (StaS, GH36, EC 2.4.1.67), respectively. Numerous entries in genome databases and the CAZy repository lack a functional biochemical description and are only putatively annotated according to sequence similarity to orthologous gene sequences. The use of orthologous genes to putatively annotate proteins, specifically RFO synthesising enzymes, has led to inaccuracies in the functional enzyme annotations of database records, with many RFO related CAZymes putatively annotated as being similar to both GTs (involved in synthesis) and GHs (involved in hydrolysis).
Consequently, functional characterisations of RafSs and StaSs are historically underrepresented in the literature, as these enzymes are difficult to identify despite the extensive genome resource databases available for numerous plant models. The emerging repurposing of phylogenetic reconstructions has shown increased accuracy when annotating putative enzymes. Online resources such as SIFTER (https://sifter.berkeley.edu/) and PhyloGenes (http://www.phylogenes.org/) use phylogenetic trees to accurately identify groupings of proteins that share functional identities. In this study, we sought to use phylogenetic reconstruction as a predictive tool toward function, to identify RFO biosynthetic genes (RafS and StaS) from publicly available genome resource databases where their functional annotations are either putative or unclear. We focused largely on the newly established legume genome databases, using the known orthologues from Arabidopsis, RafS (AtRS5, At5G40390) and StaS (AtRS4, At4G01970), in BLASTn and BLASTp searches to identify candidate genes. To carefully curate the candidate genes, we subsequently focused on key signatures in their amino acid sequences, including a hallmark 80 amino acid signature that represents a potential functional discriminator between RafS and StaS proteins. We then generated Maximum Likelihood and Bayesian Inference trees, rooting them against Arabidopsis ATSIP2 (At3G56590), a known Raf hydrolysing alkaline α-galactosidase (α-Gal, EC 3.2.1.22). Based on the outcomes of the trees, we selected two legume RafS candidates, one from barrel medic (Medicago truncatula) and one from chickpea (Cicer arietinum). The coding sequences of these genes were isolated, cloned into a bacterial expression vector and heterologously expressed in E. coli. Using crude protein extracts, we then sought to determine whether they could produce Raf when incubated in vitro in the presence of sucrose and galactinol.
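The curation step based on the 80 amino acid signature can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the signature appears as a contiguous stretch present in one protein and absent (gapped) in the other within a pairwise alignment; the abstract does not state the form or position of the signature, and the function names and the 80-column threshold parameter are hypothetical.

```python
from typing import Tuple


def longest_insertion(aligned_query: str, aligned_ref: str) -> Tuple[int, int]:
    """Return (start_column, length) of the longest run of alignment columns
    in which the query has a residue and the reference has a gap ('-').

    Returns (-1, 0) if no such column exists.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")

    best_start, best_len = -1, 0
    run_start, run_len = -1, 0
    for col, (q, r) in enumerate(zip(aligned_query, aligned_ref)):
        if q != "-" and r == "-":
            if run_len == 0:
                run_start = col  # a new insertion run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # run broken by a match, mismatch, or query gap
    return best_start, best_len


def has_long_insertion(aligned_query: str, aligned_ref: str,
                       min_len: int = 80) -> bool:
    """Assumption: a candidate carries the signature if it has a contiguous
    insertion of at least min_len residues relative to the reference."""
    return longest_insertion(aligned_query, aligned_ref)[1] >= min_len


if __name__ == "__main__":
    # Toy sequences, not real RafS or StaS proteins.
    ref = "MKT" + "-" * 80 + "GLW"
    query = "MKT" + "A" * 80 + "GLW"
    print(longest_insertion(query, ref))
    print(has_long_insertion(query, ref))
```

A check of this kind could only pre-screen candidates; the functional assignment in the study rests on the phylogenetic trees and the in vitro assays.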
Using quantitative tandem mass spectrometry (LC-MS/MS), we were not able to identify a distinct Raf producing capacity for either gene candidate, nor was a recombinant protein produced when using the bacterial expression vector pSF-OXB20 (constitutive promoter). However, when the candidate RafS gene from M. truncatula was subsequently cloned into the pDEST17™ bacterial expression vector (arabinose inducible promoter), we could identify Raf synthesis capacity in crude protein extracts. This provides some evidence for the validity of our phylogenetic reconstruction, as this RafS gene candidate has an unclear functional annotation in the genome resource databases for M. truncatula.

AFRIKAANS SUMMARY: No summary available.
