    STRIDE: Structure-guided Generation for Inverse Design of Molecules

    Machine learning, and deep learning in particular, has had an increasing impact on molecule and materials design. Results for drug discovery have been promising, helped by growing access to abundant, high-quality small-molecule data for generative modeling. However, such large datasets are not available for many important classes of materials, such as catalysts, antioxidants, and metal-organic frameworks. Families of molecules with few samples and high structural similarity are especially prevalent in industrial applications, and, as is well known, retraining or even fine-tuning a model on such small datasets is challenging. Novel, practically applicable molecules are most often derivatives of well-known molecules, which suggests an approach to addressing data scarcity. To address this problem, we introduce STRIDE, a generative molecule workflow that produces novel molecules with an unconditional generative model guided by known molecules, without any retraining. We generate molecules outside of the training data from a highly specialized set of antioxidant molecules. Our generated molecules have on average 21.7% lower synthetic accessibility scores, and guiding also reduces the ionization potential of the generated molecules by 5.9%.
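The abstract does not specify how STRIDE's guidance works. As a rough illustration of the general idea of steering an unconditional generator toward known molecules without retraining, the sketch below reranks generated samples by their Tanimoto similarity to the closest reference molecule. It is not STRIDE's method: representing a molecule as a set of fingerprint bits, and the names `tanimoto`, `guidance_score`, and `guided_select`, are hypothetical, and reranking after sampling is a simpler stand-in for guidance applied during generation.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical representation: a molecule as the set of "on" fingerprint bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def guidance_score(candidate: Fingerprint, references: Sequence[Fingerprint]) -> float:
    """Hypothetical guidance signal: similarity to the closest known reference molecule."""
    if not references:
        raise ValueError("at least one reference molecule is required")
    return max(tanimoto(candidate, ref) for ref in references)


def guided_select(candidates: Sequence[Fingerprint],
                  references: Sequence[Fingerprint],
                  k: int) -> List[Fingerprint]:
    """Keep the k samples (from an unconditional generator) closest to the references.

    Only the selection step uses the references; no model weights are changed,
    which mirrors the 'no retraining' constraint described in the abstract.
    """
    ranked = sorted(candidates,
                    key=lambda c: guidance_score(c, references),
                    reverse=True)
    return ranked[:k]
```

For example, with the single reference {1, 2, 3, 4}, the candidate {1, 2, 3, 4, 5} (similarity 0.8) ranks ahead of {1, 2, 3, 9} (similarity 0.6), and {7, 8} (similarity 0.0) is dropped when k = 2.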

    Identification of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests

    The recent availability of whole-genome-scale data sets that investigate complementary and diverse aspects of transcriptional regulation has created an increased need for new and effective computational approaches to analyze and integrate these large-scale assays. Here, we propose a novel algorithm, based on random forest methodology, that relates gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and to transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: the cell cycle, sporulation, and various stress conditions. Our technique performs excellently at identifying known regulatory motifs, including high-order interactions. In addition, we present evidence for an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we uncovered elaborate mechanisms for refining transcriptional regulation, involving the PAC and mRRPE motifs, that govern essential rRNA processing. These include intriguing instances of differing motif dosage and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism across physiological processes.
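The abstract does not give the splitting rule of the multivariate random forest. A common choice for multivariate regression trees, assumed here, scores a node by the total squared distance of its response vectors (for example, one gene's expression at several time points) to the node's mean vector, and picks the split that reduces this total the most. The sketch below is a hypothetical illustration of that criterion only; the function names are invented, and a full forest would additionally grow each tree on a bootstrap sample of genes.

```python
import numpy as np


def node_impurity(Y: np.ndarray) -> float:
    """Sum of squared Euclidean distances of each response vector to the node mean."""
    if len(Y) == 0:
        return 0.0
    return float(((Y - Y.mean(axis=0)) ** 2).sum())


def best_split(x: np.ndarray, Y: np.ndarray):
    """Best threshold on one feature x for a multivariate response Y.

    x: shape (n,), e.g. a motif score per gene (assumed feature encoding)
    Y: shape (n, T), e.g. expression at T time points per gene
    Returns (threshold, impurity_decrease), or (None, 0.0) if no split helps.
    """
    parent = node_impurity(Y)
    best_thr, best_gain = None, 0.0
    values = np.unique(x)  # sorted distinct feature values
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values[:-1], values[1:]):
        thr = (lo + hi) / 2.0
        left = x <= thr
        gain = parent - node_impurity(Y[left]) - node_impurity(Y[~left])
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


def best_split_random_features(X: np.ndarray, Y: np.ndarray,
                               n_try: int, rng: np.random.Generator):
    """Random-forest-style node split: search only a random subset of n_try features."""
    features = rng.choice(X.shape[1], size=n_try, replace=False)
    best = (None, None, 0.0)  # (feature index, threshold, impurity decrease)
    for j in features:
        thr, gain = best_split(X[:, j], Y)
        if gain > best[2]:
            best = (int(j), thr, gain)
    return best
```

Summing the squared deviations over all time points lets a single split separate genes by the shape of their whole expression profile rather than by one time point at a time.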

    Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations

    Aligning multiple biological sequences, such as protein or DNA/RNA sequences, is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to them. Unfortunately, multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it very hard both to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates the stereochemical properties of sequence residues and their substitution probabilities in a tree-structured scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of these algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The dynamic weighted tree allows errors made in the early alignment stages to be corrected in subsequent stages. The other two algorithms use sequence knowledge bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Our analysis indicates that this parallel algorithm is the fastest progressive multiple sequence alignment algorithm.
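To make the scoring problem concrete, the sketch below computes the standard sum-of-pairs (SP) score of a finished alignment: every pair of rows is compared column by column and the pair scores are added up. It is not the dissertation's tree-structured scoring scheme. The match, mismatch, and gap values are hypothetical placeholders standing in for a substitution matrix, and gap-gap pairs are scored 0 by convention.

```python
from itertools import combinations
from typing import Sequence


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of residues in the same column (placeholder values)."""
    if a == "-" and b == "-":
        return 0  # gap aligned to gap contributes nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment: Sequence[str], **scores: int) -> int:
    """Sum-of-pairs score of an MSA given as equal-length gapped rows."""
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for col in range(length):
        # Compare every unordered pair of sequences in this column.
        for i, j in combinations(range(len(alignment)), 2):
            total += pair_score(alignment[i][col], alignment[j][col], **scores)
    return total
```

For the three-row alignment AC-, ACG, A-G, the columns score 3, -3 and -3, for a total of -3. A residue-aware scheme, such as one that encodes stereochemical properties, would replace the fixed values in `pair_score` with a lookup over residue pairs.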

    Functional architectures of polyketide synthases

    Microbial polyketide synthases (PKS) are biological factories for the production of potent natural products, including clinically relevant antibiotics, anti-cancer drugs, statins and more. The exceptional chemical diversity generated by PKSs is encoded in a modular architecture for precursor extension. The domains required for one step of precursor elongation and modification are combined into a functional polypeptide module, which is segregated into a mandatory condensing region for elongation and an optional, variable part for intermediate modification. PKS modules contain integral acyl carrier protein (ACP) domains, flanked by flexible peptide regions. ACPs load substrates and tether intermediates throughout ongoing synthesis by linking them as thioesters to a covalently attached phosphopantetheine cofactor. PKS modules can act either iteratively (iPKS) or in a linearly organized assembly line of multiple modules (modPKS), in which the nascent polyketide is handed over from one module to the next. The collinearity between synthesis and protein sequence in modPKS holds promise for rational re-engineering to produce novel bioactive compounds. Despite their cyclic mode of action, iPKS may follow specific reaction programs that introduce different substitutions in each iteration through the selective use of individual catalytic domains. When this thesis began, the architecture of PKS modules, the basis for their modular organization and programmed biosynthesis, was unknown. This thesis focused on structural studies of the architecture of PKS modules, intramodular crosstalk and functional programming. Chapter one provides a comprehensive introduction to the molecular biology of PKS function. Chapter two provides a hybrid crystallographic model of an iPKS module and demonstrates its relevance for modPKS as well. 
Overlapping crystal structures of a condensing region and a complete modifying region provided the first atomic model of a PKS module, comprising a total of 10 catalytic domains. Multiple crystallographically independent copies observed in the 3.75 Å structure of the dimeric modifying region provided snapshots of a variable, linker-based architecture, with implications for PKS evolution and for the conformational coupling of reaction steps in the dimeric synthase. Comparative small-angle X-ray scattering demonstrates that the iPKS architecture is also representative of the modPKSs tested. Chapter three reports the crystal structure of a programming C-methyltransferase (CMeT) domain at 1.65 Å resolution. The structure reveals a novel N-terminal fold and a substrate-binding cavity that accommodates intermediates of various lengths during iterative biosynthesis. Structural and phylogenetic analysis demonstrates the conservation of CMeT domains in PKS as well as homology to an inactive pseudo-CMeT (ΨCMeT) remnant in mammalian fatty acid synthase (mFAS). The data suggest that the core elongating ketosynthase (KS) domain is involved in PKS programming. Chapter four provides a visualization of substrate loading in iPKS. A 2.8 Å resolution crystal structure provided detailed insights into the intertwined, linker-mediated integration of substrate-loading starter-unit acyltransferase (SAT) domains into an iPKS condensing region. The post-loading state was trapped by mechanism-based crosslinking. Visualization by cryo-electron microscopy at 7.1 Å resolution revealed asymmetric ACP-KS interactions and depicted conformational coupling across the dimeric PKS for coordinated synthesis. Chapter five places the results in the current structural and biological context and discusses current opinions and future perspectives in the field. The results of this thesis highlight the relevance of linker-based connections, rather than stable domain-domain interfaces, for PKS architecture. 
This work also highlights mechanisms of conformational coupling during synthesis and of substrate channeling in dimeric, but asymmetric, PKS. These insights will support the re-engineering of iPKS and modPKS assembly lines for the production of novel bioactive compounds, in particular for drug discovery.

    Structural and biochemical characterization of biotechnologically relevant enzymes

    Climate change, antibiotic resistance and environmental pollution are growing threats. Finding alternatives to fossil resources and discovering new pharmaceuticals therefore grow more important every day. Natural compounds and their in vivo production pathways offer a possible solution to these problems. Optimized microbial hosts can serve as sustainable production platforms for various compounds, as has been done for penicillin for many years. The first research topic of this thesis is borneol dehydrogenases, enzymes that convert borneol to camphor. Enantiomerically pure camphor has numerous applications in the cosmetic, pharmaceutical, and chemical industries. Enantioselective borneol dehydrogenases would therefore be attractive candidates for producing enantiomerically pure camphor. To better understand the differences between enantioselective and unselective borneol dehydrogenases, we solved the structures of two selective borneol dehydrogenases from Salvia rosmarinus and Salvia officinalis using X-ray crystallography and cryo-electron microscopy. The obtained structures were compared to the previously solved structure of the unselective borneol dehydrogenase from Pseudomonas sp. TCU-HL1. The second focus of this thesis is terpene synthases, a class of enzymes responsible for the cyclization of linear terpene precursors. The products of terpene synthases are interesting candidates for the chemical and pharmaceutical industries because of their diverse characteristics and properties. Recent advances in genome sequencing have enabled the discovery of many new and diverse terpene synthases from various organisms. We report the discovery of two terpene synthases from Coniophora puteana, Copu5 and Copu9, that not only have identical product profiles but also show high yields in an optimized Escherichia coli strain. 
The main product of both enzymes is (+)-δ-cadinol, which has been shown to have a cytotoxic effect on MCF7 cells and could be used as a new and sustainable anti-tumor drug. To investigate their properties and gain a deeper understanding of their function, we attempted to crystallize and biochemically characterize Copu5 and Copu9.

    Structural biology of carbohydrate transfer and modification in natural product biosynthesis

    During periods of limited resources, certain organisms can adapt their metabolism to enable the biosynthesis of secondary metabolites, compounds that increase their competitiveness and chances of survival. The subjects of this thesis are enzymes acting on carbohydrate substrates during secondary metabolism. The enzymatic attachment of carbohydrate moieties onto precursors of polyketide antibiotics such as anthracyclines, which is required for their biological activity, is performed by glycosyltransferases (GTs). The anthracycline nogalamycin contains two carbohydrates: a nogalose moiety attached via an O-glycosidic bond to C7, and a nogalamine moiety attached via an O-glycosidic bond to C1 and via an unusual carbon-carbon bond between C2 and C5″ of the sugar. Genetic and functional data presented in this thesis established the roles of SnogE as the GT performing the C7 O-glycosyl transfer of the nogalose moiety and of SnogD as the O-GT attaching the nogalamine moiety at C1. The activity of SnogD was verified in vitro using recombinant protein, after a transglycosylation-like assay had been established. The three-dimensional structure of the homodimeric SnogD was determined at 2.6 Å resolution and adopts a GT-B fold. Mutagenesis of two active-site residues, His25 and His301, evaluated in vitro and in vivo, suggested that His25 is the catalytic base, activating the acceptor substrate by abstracting a proton from the C1 hydroxyl group. His301 provides a positive charge that stabilises the negative charge formed close to the diphosphate of the leaving group during glycosyl transfer. Together, the genetic, functional and structural data suggest that an additional or altogether different enzyme is involved in forming the C-C bond. 
The bifunctional enzyme aldos-2-ulose dehydratase (AUDH) from Phanerochaete chrysosporium catalyses the dehydration and isomerisation of the secondary metabolites glucosone and 1,5-anhydro-D-fructose (AF) into the antimicrobial compounds cortalcerone and microthecin (Mic), respectively. The three-dimensional structure of the dimeric AUDH was determined at 2.0 Å resolution. The enzyme consists of a seven-bladed β-propeller, two cupin folds and a lectin-like domain, in a novel combination. Two structural metal ions, Mg²⁺ and Zn²⁺, are bound in loop regions. Two additional zinc ions are present at the base of two putative active sites, located in the β-propeller and the second cupin fold. Specific removal of these zinc ions eliminated catalytic activity, demonstrating that the overall reaction is metal-dependent. The structures of AUDH in complex with the reaction intermediate ascopyrone M bound at both putative active sites, and of zinc-depleted enzyme with AF bound in the cupin fold, were determined by X-ray crystallography at 2.6 and 2.8 Å resolution, respectively. These observations support the presence of two distinct active sites located 60 Å apart, partly connected by an intra-dimeric channel. The dehydration reaction most likely follows an elimination mechanism, with the zinc ion acting as a Lewis acid to polarise the C2 keto group of AF. Abstraction of the C3 proton by the suitably positioned residue His155 would generate an enol intermediate, which is stabilised by the zinc ion. Return of the proton to the C4 hydroxyl group would generate a favourable leaving group.