429 research outputs found

    Experimental and computational exploration of enzyme sequence space

    Millions of enzymes with desirable features or new exciting activities can be found in organisms occupying diverse niches all around the earth. However, enzyme studies tend to be biased towards characterisation of representatives from eukaryotes, model organisms, or disease-causing bacteria. As such, a large number of enzymes remain underexplored. The so-called sequence space of proteins, meaning all possible protein sequences, is even greater when we include not only natural sequences but also those designed by humans or artificial intelligence. This thesis explores various reasons, approaches, and outcomes of investigating large enzymatic sequence spaces. In the first part of my work, I focused on investigating a natural sequence space of oxidases using a high-throughput activity profiling platform. A functional screen of an industrially important class of enzymes, S-2-hydroxyacid oxidases (EC 1.1.3.15), revealed that nearly 80% of the class is misannotated. Further exploration of annotations in public databases indicated that similar annotation errors can be found in other enzyme classes. A broader activity profiling of 1.1.3.x oxidases resulted in the discovery of two novel microbial enzymes: N-acetyl-hexosamine oxidase, and a novel type of long-chain alcohol oxidase. Natural enzymes often need to be improved in order to be industrially applied, for example to become more stable, or accept non-natural substrates. A novel, and constantly developing, approach for enzyme design involves the use of machine learning (ML) tools. The second part of my work focused on screening an enzyme sequence space designed by generative adversarial networks. Our work proved that ML methods can generate fully functional enzymes that mimic sequences present in nature. Enzyme assays are necessary to get a full understanding of how enzymes work. 
Traditional kinetic assays are time- and reagent-consuming and as a result a limited number of variants and conditions are being tested for each target. In my final work I described a novel approach for enzyme kinetic studies, by adaptation of a microfluidic qPCR device

    Investigation of Narrow Spectrum Targets in Antibacterial Drug Discovery

    Background: Significant concerns are associated with the use of broad-spectrum antibacterial agents, including collateral eradication of beneficial bacteria from the human microbiome, the onset of antibacterial-associated infections, and continued emergence of antibacterial drug resistance. As such, a critical need for novel and selective antibacterial targets exists. The investigation of two such targets, each pertaining to the highly concerning infections caused by streptococcal species and Clostridioides difficile, is presented herein. Bacterial topoisomerase I represents a potentially promising narrow-spectrum target, as studies have demonstrated its essentiality in bacterial species lacking the only other type IA topoisomerase (topoisomerase III). Additionally, recent studies demonstrating the essentiality of the fabK gene expressing enoyl-ACP reductase II (FabK) in C. difficile indicate its significant potential as a narrow-spectrum target. Presented here are data characterizing and validating both the TopoI and FabK enzymes as novel antibacterial targets via the implementation of an array of drug discovery techniques, including structural studies, biochemical assay development and application, and inhibitor screening and testing. Methods: An assortment of drug discovery techniques were employed for the targeting of SmTopoI and CdFabK, including different protein expression and purification techniques; X-ray crystallography; various biophysical and biochemical techniques for target characterization, validation, and drug screening; and different lead development and optimization studies. Results: The respective genes for SmTopoI and CdFabK have been cloned, and the expression and purification of various constructs of each target have been carried out and optimized for further analysis. The crystal structure of SmTopoI_N65 has been determined to 2.06 Å and diffracting CdFabK crystals (3.5 Å) have been attained. 
A high-throughput plate-based biochemical fluorescence kinetic assay has been optimized for screening against the CdFabK enzyme. Furthermore, activity and modality of inhibition assessment of small-molecule inhibitors of the CdFabK enzyme have been conducted, including phenylimidazole and benzothiazole compounds. Phenylimidazole analogues have been found to display micromolar inhibitory activity against CdFabK, and a benzothiazole analogue has been found to display nanomolar inhibitory activity against the target. Conclusions: The SmTopoI and CdFabK enzymes present potentially novel, narrow-spectrum antibacterial drug targets, and substantial progress has been made toward the rational targeting of these two enzymes. Of particular note, the first structure of a Topo I fragment from a gram-positive organism, S. mutans, has been determined. Enzymology and inhibitor studies have been conducted supporting the druggability of CdFabK and indicating the potential for selective inhibition of CdFabK

    Binding determinants for substrates and inhibitors of trehalose-6-phosphate phosphatase

    Trehalose is a sugar commonly found in archaea, bacteria, fungi, plants, and invertebrates. It is utilized as an energy source and upregulated during stress conditions such as thermal fluctuations and oxidative stress. As mammals do not synthesize trehalose, trehalose biosynthetic pathways have become therapeutic targets for infectious diseases. The enzyme trehalose-6-phosphate phosphatase (T6PP) catalyzes the dephosphorylation of trehalose 6-phosphate to form trehalose. In its absence, the viability and virulence of bacteria, fungi, plants and nematodes are decreased. Hence T6PP is the focus of this study as a target for therapeutics of the diseases tuberculosis and lymphatic filariasis. T6PP is a phosphohydrolase in the haloalkanoic acid dehalogenase superfamily. To identify the determinants for substrate specificity needed to guide structure-aided inhibitor design for therapeutics, atomic-resolution crystallographic information on the Michaelis complex is of great importance. Toward this goal, the structure of T6PP from Mycobacterium marinum was determined via X-ray crystallography in an unliganded form and the structure of T6PP from Salmonella typhimurium (St) was determined in the apo form, bound to the substrate analog, trehalose 6-phosphate, the product, trehalose, and the inhibitor, 4-n-octylphenyl α-D-glucopyranoside 6-sulfate. The enzyme confers specificity via hydrogen bonding to the phosphate and glucosyl group proximal to the phosphate. Specifically, the conserved residues Glu123, Lys125 and Glu167 form hydrogen bonds to the hydroxyl groups of the proximal glucose. However, the distal glucose binding sub-site can tolerate new chemotypes. To further aid inhibitor design, the two inhibitors of Brugia malayi T6PP discovered via screening the Johns Hopkins library of FDA-approved drugs, Cephalosporin C and Closantel, were computationally docked into StT6PP. 
The Cephalosporin C scaffold was optimized to provide an inhibitor with a KI of 20 µM that comprises a 5,6-indole scaffold to afford hydrogen bonds to the Glu/Lys/Glu motif and a computationally discovered phosphate mimic tetrazole. Closantel acts as a slow-binding inhibitor and a series of analogs were synthesized to increase potency. Two analogs show enhanced efficacy relative to Closantel with IC50 values near 60 µM. Future efforts will aim to optimize these scaffolds for inhibition of T6PP to develop therapeutics for tuberculosis and lymphatic filariasis

    Structural Characterization of Lysogeny-promoting Transcription Factors of Bacteriophage 186

    Bacteriophage 186, like bacteriophage λ, is a UV-inducible temperate bacteriophage of Escherichia coli. At the DNA sequence level, these two phages show little similarity. The arrangements of promoters, genes and regulatory elements in the genome of the two phages are different, yet show similarity in the regulatory network topology. In fact, proteins with equivalent functions can be pinpointed and previous studies have shown that these proteins act as key effectors in the lytic-lysogenic decision. Studies described in this thesis were aimed at investigating these bacteriophage 186 proteins to understand their structure-function relationship. These proteins regulate transcriptional behavior and may be useful as parts for the construction of synthetic circuits. In 186, the CII protein (analogous to λ CII) is responsible for establishment of lysogeny. It is a transcriptional activator that upon binding to a pair of half sites at the 186 pE promoter, activates it to generate the initial levels of immunity repressor to allow the phage to enter lysogeny. Its unusual property of binding to half sites two DNA turns apart and its potent ability to activate pE over 400-fold, suggests it may possess a novel mechanism of promoter activation. The X-ray crystal structure of the 186 CII protein was solved. It revealed the protein adopts a tetrameric quaternary structure, consisting of rigid dimeric subunits, which further dimerize to form a tetramer. Through mass-spectrometry measurements and mutagenesis studies, it was demonstrated that this tetrameric state is necessary for its function. Molecular modelling with the crystal structure provides insight into how CII may bind DNA and interact with the host RNA polymerase to activate its promoter. The role of CII is to generate initial pools of 186 CI (analogous to λ CI). 186 CI is responsible for repressing the lytic functions of 186 to maintain lysogeny. 
Previous studies have shown that CI dimers multimerize into a higher order structure that interacts with DNA. High resolution structures of a CI dimer and higher order CI CTD oligomer, but no high-resolution structure of the full complex, are available. We used a combination of X-ray crystallography, small angle X-ray scattering and mass-spectrometry to demonstrate that 186 CI forms a wheel-like dodecameric structure that could act as a scaffold for DNA wrapping, giving insight into the mechanism of transcriptional regulation provided by 186 CI. Following the structural characterization of 186 CI, a temperature sensitive mutant of 186 CI was characterized to show that it can be used for temperature-based induction of gene expression. In addition, a temperature-sensitive chimeric repressor containing domains from 186 CI and λ CI was developed. This success illustrates the ability of bacteriophage regulatory proteins to be used as a source of biological parts. X-ray crystallography was extensively used in this PhD project to structurally characterize proteins. When solving novel protein structures, the generation of diffraction quality crystals and the derivatization of protein crystals with heavy atoms for experimental phasing are two commonly encountered bottlenecks. In the final section of this thesis, a technique to simultaneously optimize crystallization and derivatization of a protein is presented. Crystalline precipitate of a protein was crushed to form a microseed stock, which was used to promote crystallization by adding it to a sparse matrix crystallization screen of the protein. An iodinated compound, I3C, was added to the crystallization screen to allow incorporation of I3C into the crystal during the crystallization process. This technique was employed to solve the crystal structures of hen egg-white lysozyme and a putative lysin domain, Orf11, from bacteriophage P68 and should be applicable to many other crystallization targets. Thesis (Ph.D.) 
-- University of Adelaide, School of Biological Sciences, 202

    Flavonol Glucosylation: A Structural Investigation of the Flavonol Specific 3-O Glucosyltransferase Cp3GT

    Flavonoid glycosyltransferases (GTs), enzymes integral to plant ecological responses and human pharmacology, necessitate rigorous structural elucidation to decipher their mechanistic function and substrate specificity, particularly given their role in the biotransformation of diverse pharmacological agents and natural products. This investigation delved into a comprehensive exploration of the flavonol 3-O GT from Citrus paradisi (Cp3GT), scrutinizing the impact of a c-terminal c-myc/6x histidine tag on its enzymatic activity and substrate specificity, and successfully achieving its purification to apparent homogeneity. This established a strong foundation for potential future crystallographic and other structure/function analyses. Through the strategic implementation of site-directed mutagenesis, a thrombin cleavage site was incorporated proximal to the tag, followed by cloning in Pichia pastoris, methanol-induced expression, and cobalt-affinity chromatography for initial purification stages. Notably, the recombinant tags did not exhibit a discernible influence on Cp3GT kinetics, substrate preference, pH optima, or metal interactions, maintaining its specificity towards flavonols at the 3-OH position and favoring glucosylation of quercetin and kaempferol. Subsequent purification steps, including MonoQ anion exchange and size-exclusion chromatography, yielded Cp3GT with ≥95% homogeneity. In silico molecular models of Cp3GT and its truncated variants, Cp3GTΔ80 and Cp3GTΔ10, were constructed using D-I-TASSER and COFACTOR to assess binding interactions with quercetin and kaempferol. Results indicated minimal interference of c-myc/6x-his tags with the native Cp3GT structure. 
This study not only lays a foundation for impending crystallographic studies, aiming to solidify the understanding of Cp3GT's stringent 3-O flavonol specificity, but also accentuates the potential of microbial expression platforms and plant metabolic engineering in producing beneficial compounds. To this end, a thorough review of four pivotal classes of plant secondary metabolites, flavonoids, alkaloids, betalains, and glucosinolates, was conducted. This will open avenues for further research and applications in biotechnological, medical, and agricultural domains

    2016 IMSAloquium, Student Investigation Showcase

    Welcome to the twenty-eighth year of the Student Inquiry and Research Program (SIR)! This is a program that is as old as IMSA. The SIR program represents our unending dedication to enabling our students to learn what it is to be an innovator and to make contributions to what is known on Earth.

    Crystal structure solution of hydrogen bonded systems : a validation and an investigation using historical methodologies followed by a review of crystal structure prediction methodologies to date

    There are many chemicals that crystallize into more than one form. This phenomenon is called polymorphism. In each form or polymorph, inter and intra-molecular binding differ to varying degrees. As a result of this structural variation, the physical properties of the solid phases may also differ. Even the smallest of changes at the molecular level can result in a significant change in the final adopted crystal structure. Polymorphism in crystal structures allows studies of structure-property relationships since it is only the packing motifs that differ between polymorphs. In this thesis, a ‘computationally assisted’ approach to crystal structure solution was taken. X-ray powder diffraction was used to generate unit cell dimensions and space groups while historical in-house molecular modelling methods were used to generate possible trial structures that would be the starting point for refinement. Finally, a review of the latest methodologies for crystal structure prediction and consideration of polymorphism within the pharmaceutical industry completes this work

    Ancestral sequence reconstruction as an accessible tool for the engineering of biocatalyst stability

    Synthetic biology is the engineering of life to imbue non-natural functionality. As such, synthetic biology has considerable commercial potential, where synthetic metabolic pathways are utilised to convert low value substrates into high value products. High temperature biocatalysis offers several system-level benefits to synthetic biology, including increased dilution of substrate, increased reaction rates and decreased contamination risk. However, the current gamut of tools available for the engineering of thermostable proteins are either expensive, unreliable, or poorly understood, meaning their adoption into synthetic biology workflows is treacherous. This thesis focuses on the development of an accessible tool for the engineering of protein thermostability, based on the evolutionary biology tool ancestral sequence reconstruction (ASR). ASR allows researchers to walk back in time along the branches of a phylogeny and predict the most likely representation of a protein family’s ancestral state. It also has simple input requirements, and its output proteins are often observed to be thermostable, making ASR tractable to protein engineering. Chapter 2 explores the applicability of multiple ASR methods to the engineering of a carboxylic acid reductase (CAR) biocatalyst. Despite the family emerging only 500 million years ago, ancestors presented considerable improvements in thermostability over their modern counterparts. We proceed to thoroughly characterise the ancestral enzymes for their inclusion into the CAR biocatalytic toolbox. Chapter 3 explores why ASR derived proteins may be thermostable despite a mesophilic history. An in silico toolbox for tracking models of protein stability over simulated evolutionary time at the sequence, protein and population level is built. We provide considerable evidence that the sequence alignments of simulated protein families that evolved at marginal stability are saturated with stabilising residues. 
ASR therefore derives sequences from a dataset biased toward stabilisation. Importantly, while ASR is accessible, it still requires a steep learning curve based on its requirements of phylogenetic expertise. In chapter 4, we utilise the evolutionary model produced in chapter 3 to develop a highly simplified and accessible ASR protocol. This protocol was then applied to engineer CAR enzymes that displayed dramatic increases in thermostability compared to both modern CARs and the thermostable AncCARs presented in chapter 2

    Information recovery in the biological sciences : protein structure determination by constraint satisfaction, simulation and automated image processing

    Regardless of the field of study or particular problem, any experimental science always poses the same question: “What object or phenomenon generated the data that we see, given what is known?” In the field of 2D electron crystallography, data is collected from a series of two-dimensional images, formed either as a result of diffraction mode imaging or TEM mode real imaging. The resulting dataset is acquired strictly in the Fourier domain as either coupled Amplitudes and Phases (as in TEM mode) or Amplitudes alone (in diffraction mode). In either case, data is received from the microscope in a series of CCD or scanned negatives of images which generally require a significant amount of pre-processing in order to be useful. Traditionally, processing of the large volume of data collected from the microscope was the time limiting factor in protein structure determination by electron microscopy. Data must be initially collected from the microscope either on film-negatives, which in turn must be developed and scanned, or from CCDs of sizes typically no larger than 2096x2096 (though larger models are in operation). In either case, data are finally ready for processing as 8-bit, 16-bit or (in principle) 32-bit grey-scale images. Regardless of data source, the foundation of all crystallographic methods is the presence of a regular Fourier lattice. Two dimensional cryo-electron microscopy of proteins introduces special challenges as multiple crystals may be present in the same image, producing in some cases several independent lattices. Additionally, scanned negatives typically have a rectangular region marking the film number and other details of image acquisition that must be removed prior to processing. If the edges of the images are not down-tapered, vertical and horizontal “streaks” will be present in the Fourier transform of the image, arising from the high-resolution discontinuities between the opposite edges of the image. 
These streaks can overlap with lattice points which fall close to the vertical and horizontal axes and disrupt both the information they contain and the ability to detect them. Lastly, SpotScanning (Downing, 1991) is a commonly used process whereby circular discs are individually scanned in an image. The large-scale regularity of the scanning pattern produces a low frequency lattice which can interfere and overlap with any protein crystal lattices. We introduce a series of methods packaged into 2dx (Gipson, et al., 2007) which simultaneously address these problems, automatically detecting accurate crystal lattice parameters for a majority of images. Further, a template is described for the automation of all subsequent image processing steps on the road to a fully processed dataset. The broader picture of image processing is one of reproducibility. The lattice parameters, for instance, are only one of hundreds of parameters which must be determined or provided and subsequently stored and accessed in a regular way during image processing. Numerous steps, from correct CTF and tilt-geometry determination to the final stages of symmetrization and optimal image recovery, must be performed sequentially and repeatedly for hundreds of images. The goal in such a project is then to automatically process as significant a portion of the data as possible and to reduce unnecessary, repetitive data entry by the user. Here also, 2dx (Gipson, et al., 2007), the image processing package designed to automatically process individual 2D TEM images, is introduced. This package focuses on reliability, ease of use and automation to produce finished results necessary for full three-dimensional reconstruction of the protein in question. Once individual 2D images have been processed, they contribute to a larger project-wide 3-dimensional dataset. Several challenges exist in processing this dataset, besides simply the organization of results and project-wide parameters. 
In particular, though tilt-geometry, relative amplitude scaling and absolute orientation are in principle known (or obtainable from an individual image), errors, uncertainties and heterogeneous data-types provide for a 3D-dataset with many parameters to be optimized. 2dx_merge (Gipson, et al., 2007) is the follow-up to the first release of 2dx which had originally processed only individual images. Based on the guiding principles of the earlier release, 2dx_merge focuses on ease of use and automation. The result is a fully qualified 3D structure determination package capable of turning hundreds of electron micrograph images, nearly completely automatically, into a full 3D structure. Most of the processing performed in the 2dx package is based on the excellent suite of programs termed collectively as the MRC package (Crowther, et al., 1996). Extensions to this suite and alternative algorithms continue to play an essential role in image processing as computers become faster and as advancements are made in the mathematics of signal processing. In this capacity, an alternative procedure to generate a 3D structure from processed 2D images is presented. This algorithm, entitled “Projective Constraint Optimization” (PCO), leverages prior known information, such as symmetry and the fact that the protein is bound in a membrane, to extend the normal boundaries of resolution. In particular, traditional methods (Agard, 1983) make no attempt to account for the “missing cone”, a vast, unsampled region in 3D Fourier space arising from specimen tilt limitations in the microscope. Provided sufficient data, PCO simultaneously refines the dataset, accounting for error, as well as attempting to fill this missing cone. Though PCO provides a near-optimal 3D reconstruction based on data, depending on initial data quality and amount of prior knowledge, there may be a host of solutions, and more importantly pseudo-solutions, which are more-or-less consistent with the provided dataset. 
Trying to find a global best-fit for known information and data can be a daunting challenge mathematically; to this end, the use of meta-heuristics is addressed. Specifically, in the case of many pseudo-solutions, so long as a suitably defined error metric can be found, quasi-evolutionary swarm algorithms can be used that search solution space, sharing data as they go. Given sufficient computational power, such algorithms can dramatically reduce the search time for global optimums for a given dataset. Once the structure of a protein has been determined, many questions often remain about its function. Questions about the dynamics of a protein, for instance, are not often readily interpretable from structure alone. To this end an investigation into computationally optimized structural dynamics is described. Here, in order to find the most likely path a protein might take through “conformation space” between two conformations, a graphics processing unit (GPU) optimized program and set of libraries is written to speed up the calculation of this process 30x. The tools and methods developed here serve as a conceptual template as to how GPU coding was applied to other aspects of the work presented here as well as GPU programming generally. The final portion of the thesis takes an apparent step in reverse, presenting a dramatic, yet highly predictive, simplification of a complex biological process. Kinetic Monte Carlo simulations idealize thousands of proteins as interacting agents by a set of simple rules (i.e. react/dissociate), offering highly-accurate insights into the large-scale cooperative behavior of proteins. This work demonstrates that, for many applications, structure, dynamics or even general knowledge of a protein may not be necessary for a meaningful biological story to emerge. 
Additionally, even in cases where structure and function are known, such simulations can help to answer the biological question in its entirety from structure, to dynamics, to ultimate function
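
The abstract above notes that images whose edges are not down-tapered show vertical and horizontal streaks in their Fourier transform. The following is a minimal illustrative sketch of that effect in Python with NumPy. It is not the 2dx implementation: the raised-cosine mask, the synthetic test image, the function names `edge_taper` and `streak_fraction`, and the cutoff `kmin` are all assumptions made for illustration.

```python
import numpy as np

def edge_taper(shape, width):
    """Hypothetical helper: 2D mask that is 1 in the interior and falls
    smoothly (raised cosine) towards 0 over `width` pixels at each edge."""
    def ramp(n):
        w = np.ones(n)
        t = 0.5 * (1.0 - np.cos(np.pi * (np.arange(width) + 0.5) / width))
        w[:width] = t          # rising edge
        w[-width:] = t[::-1]   # falling edge
        return w
    return np.outer(ramp(shape[0]), ramp(shape[1]))

def streak_fraction(img, kmin):
    """Fraction of non-DC Fourier power lying on the ky = 0 and kx = 0
    axes at |k| >= kmin, i.e. in the axis-aligned streaks."""
    power = np.abs(np.fft.fft2(img)) ** 2
    power[0, 0] = 0.0  # ignore the DC term
    ky = np.abs(np.rint(np.fft.fftfreq(img.shape[0]) * img.shape[0]))
    kx = np.abs(np.rint(np.fft.fftfreq(img.shape[1]) * img.shape[1]))
    on_axes = power[0, kx >= kmin].sum() + power[ky >= kmin, 0].sum()
    return on_axes / power.sum()

# Synthetic test image (an assumption, not real micrograph data):
# a periodic "lattice" plus a brightness gradient, so that opposite
# edges of the image do not match.
n = 128
y, x = np.mgrid[0:n, 0:n]
lattice = np.cos(2 * np.pi * (5.3 * x + 7.7 * y) / n)
gradient = 2.0 * x / n + 1.0 * y / n
img = lattice + gradient

mask = edge_taper(img.shape, width=16)
# Subtract the mean first so the taper pulls the edges towards the mean level.
tapered = (img - img.mean()) * mask

raw_streaks = streak_fraction(img, kmin=20)
tapered_streaks = streak_fraction(tapered, kmin=20)
print(f"streak power fraction, raw:     {raw_streaks:.3e}")
print(f"streak power fraction, tapered: {tapered_streaks:.3e}")
```

In this sketch the tapered image should carry far less power in the axis-aligned streaks than the raw one, because the mask removes the mismatch between opposite edges that the discrete Fourier transform otherwise treats as a sharp discontinuity. The price is that signal close to the image border is attenuated, so the taper width is a trade-off.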