
    A conformational analysis of signal peptides

    A thesis submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Doctor of Philosophy. Johannesburg, 1998. Conformational analysis of portions of functionally active and functionally inactive signal peptides (including the wild type and mutants thereof) has been performed using a variety of computational prediction techniques based on both statistics and molecular mechanics. Molecular mechanics conformational studies are generally plagued by the problem of combinatorial explosion. This problem was addressed with a systematic search procedure and with a recently developed genetic algorithm, both using the ECEPP/3 force field. The genetic algorithm, in combination with a gradient minimiser, proved successful in finding low-energy conformations for each peptide sequence studied. Analysis was performed in both simulated hydrophobic and hydrophilic environments, under distance constraints. The molecular mechanics results and statistical predictions generated in the study were compared with existing experimental observations. The reliability of the statistical predictions depended on the prediction method; the more consistent predictions were produced by methods based on membrane proteins rather than by those based on globular proteins. The hydrophobicity of signal peptide sequences, explored in these statistical predictions, was determined to be an important factor in relating sequence to functional activity. Molecular mechanics calculations produced either interrupted or uninterrupted α-helical secondary structures for both functionally efficient and functionally inefficient signal peptides, indicating that α-helix formation alone cannot be correlated with protein export competence. It was concluded from the overall results that both α-helicity and hydrophobicity are required for the efficient functioning of signal peptides.
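    The pairing of a genetic algorithm with a gradient minimiser can be illustrated with a minimal sketch. It makes several assumptions. A toy periodic torsional energy (a threefold term per angle plus a weak neighbour coupling) stands in for the ECEPP/3 force field. Fixed-step gradient descent serves as the minimiser. The names `energy`, `minimise` and `genetic_search` are hypothetical. This is not the thesis's implementation. In each generation, every individual is locally minimised and the minimised angles are written back into the population before selection, crossover and mutation.

```python
import math
import random

# Toy periodic "torsional" energy (an assumption for illustration, NOT ECEPP/3):
# a threefold torsion term per angle plus a weak coupling between neighbours.
def energy(x):
    e = sum(1.0 + math.cos(3.0 * xi) for xi in x)
    e += sum(0.5 * (1.0 - math.cos(x[i] - x[i + 1])) for i in range(len(x) - 1))
    return e

def gradient(x):
    # Analytic derivative of energy() with respect to each angle.
    g = [-3.0 * math.sin(3.0 * xi) for xi in x]
    for i in range(len(x) - 1):
        s = 0.5 * math.sin(x[i] - x[i + 1])
        g[i] += s
        g[i + 1] -= s
    return g

def minimise(x, step=0.05, iters=100):
    # Plain fixed-step gradient descent as the local "gradient minimiser".
    x = list(x)
    for _ in range(iters):
        x = [xi - step * gi for xi, gi in zip(x, gradient(x))]
    return x

def genetic_search(n_torsions=6, pop_size=30, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Minimise every individual and keep the minimised angles (Lamarckian step),
        # then sort so the lowest-energy conformations come first.
        pop = sorted((minimise(ind) for ind in pop), key=energy)
        parents = pop[: pop_size // 2]          # truncation selection, elites survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_torsions)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally reset an angle to a random value.
            child = [rng.uniform(-math.pi, math.pi) if rng.random() < 1.0 / n_torsions else xi
                     for xi in child]
            children.append(child)
        pop = parents + children
    return min((minimise(ind) for ind in pop), key=energy)

best = genetic_search()
print(f"best energy: {energy(best):.3f}")
```

Writing the minimised angles back into the genome lets the genetic operators work on local minima rather than on raw random conformations. For the fixed-step minimiser, the step size should stay roughly below 2 divided by the largest curvature of the energy to keep the descent stable.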

    Optimization in bioinformatics

    In this work, we present novel optimization approaches for important bioinformatics problems. The first part deals mainly with the local optimization of molecular structures and its application to molecular docking, while the second part discusses discrete global optimization. In the first part, we present a novel algorithm for an old task: finding the next local optimum in a given direction on a molecular potential energy function (line search). Replacing a standard line search method with the new algorithm reduced the number of function/gradient evaluations in our test runs to as little as 47.7% (85% on average). We then incorporate this method into our novel approach for locally optimizing flexible ligands in the presence of their receptors, which we describe in detail. This approach avoids the singularity problem of orientational parameters. We extend it to a full ligand-receptor docking program using a Lamarckian genetic algorithm. Our validation runs show an up to tenfold speedup compared with the other tested methods. Next, we incorporate side-chain flexibility of the receptor into our approach. We also introduce limited backbone flexibility by interpolating between known extremal conformations using spherical linear interpolation. Our results show that this approach is very promising for flexible ligand-receptor docking. Its drawback, however, is that known extremal backbone conformations are needed for the interpolation. In the last section of the first part, we allow a loop region to be fully flexible. We present a new method that uses the Go-Scheraga ring closure equations and interval arithmetic to find all possible conformations. Our results show that this algorithm reliably finds alternative conformations and is able to identify promising loop/ligand complexes for the studied example. In the second part of this work, we describe the bond order assignment problem for molecular structures.
    We present our novel linear 0-1 programming formulation for the very efficient computation of all optimal and suboptimal bond order assignments. Considering all optimal results on our test set, our approach outperforms not only the original heuristic approach of Wang et al. but also commonly used software for determining bond orders. This test set consists of 761 thoroughly prepared drug-like molecules that were originally used to validate the Merck Molecular Force Field. We then present our filter method for feature subset selection, which is based on mutual information and uses second-order information. We give our mathematically well-motivated criterion and, in contrast to other methods, solve the resulting optimization problem exactly by quadratic 0-1 programming. In the validation runs, our method achieved the best classification accuracy in 18 of 21 test scenarios. In the last section, we give our integer linear programming formulation for detecting deregulated subgraphs in regulatory networks using expression profiles. Our approach identifies the subnetwork of a given size within the regulatory network that has the highest sum of node scores. To demonstrate the capabilities of our algorithm, we analyzed expression profiles from two sources: nonmalignant primary mammary epithelial cells derived from BRCA1 mutation carriers, and epithelial cells without a BRCA1 mutation. Our results suggest that oxidative stress plays an important role in epithelial cells with BRCA1 mutations and may contribute to the later development of breast cancer. Applying our algorithm to already published data can yield new insights. Expression data and network data continue to grow. Methods such as our algorithm will therefore be valuable for detecting deregulated subgraphs under different conditions and will help contribute to a better understanding of diseases and their progression.
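    A small sketch makes the bond order assignment problem described above concrete. The thesis solves it with a linear 0-1 programming formulation, but this example instead enumerates every assignment exhaustively. Its penalty (the absolute deviation of each atom's bond-order sum from an assumed target valence) is a simplification, not the penalty scheme of Wang et al. The function `assign_bond_orders` is hypothetical. Like the described approach, it returns all optimal assignments rather than a single one.

```python
from itertools import product

def assign_bond_orders(n_atoms, bonds, target_valence, max_order=3):
    """Hypothetical helper: return (best_penalty, all assignments with that penalty).

    penalty = sum over atoms of |sum of incident bond orders - target valence|.
    Exhaustive enumeration is exponential in len(bonds), so only for tiny inputs.
    """
    best, solutions = None, []
    for orders in product(range(1, max_order + 1), repeat=len(bonds)):
        totals = [0] * n_atoms
        for (a, b), k in zip(bonds, orders):
            totals[a] += k
            totals[b] += k
        penalty = sum(abs(t - v) for t, v in zip(totals, target_valence))
        if best is None or penalty < best:
            best, solutions = penalty, [orders]   # new optimum found
        elif penalty == best:
            solutions.append(orders)              # another co-optimal assignment
    return best, solutions

# Benzene carbon ring: each C has valence 4, one of which is used by its H,
# so the two ring bonds at every carbon must sum to 3.
ring = [(i, (i + 1) % 6) for i in range(6)]
penalty, kekule = assign_bond_orders(6, ring, [3] * 6)
print(penalty, kekule)  # 0 [(1, 2, 1, 2, 1, 2), (2, 1, 2, 1, 2, 1)]
```

For benzene, the two optimal solutions are the two Kekulé structures. Because the number of candidates grows as `max_order ** len(bonds)`, brute force quickly becomes infeasible, which is one reason an integer programming formulation is preferable for realistic drug-like molecules.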

    Developing Computational Tools for the Study and Design of Amyloid Materials

    The self-assembly of short peptides into amyloid structures is linked to several diseases but has also been exploited for the design of novel functional amyloid-based materials. Such materials are potentially biocompatible and biodegradable, while their unique molecular organization gives them remarkable mechanical properties. Amyloid fibrils are among the stiffest biological materials and exhibit a high resistance to breakage. Beyond these properties, they are particularly attractive because they are easy to synthesize and can be redesigned through mutations at the sequence level, which can introduce new functionality. Previous studies have reported the rational design of functional amyloid materials, guided primarily by scientists' intuition. These studies also reported applications in several fields, for example as tissue-engineering agents, antimicrobial and antibacterial agents, drug carriers, and materials for separation applications. The current work builds on previously reported protocols for two tasks: computationally elucidating the structures of amyloids that form amyloid materials, and investigating the functional properties of rationally designed self-assembling peptides. It then introduces a new approach, based on engineering and biophysical principles, for the computational design of functional amyloid materials. In summary, we developed a computational protocol in which an optimization-based design model introduces mutations at the non-β-sheet residue positions of a designable amyloid scaffold (an amyloid with non-β-sheet-forming residues at its termini). The designed amino acids are introduced into the scaffold so that they mimic how amino acids bind to particular ions/compounds of interest in experimentally resolved structures (which we define as materialphore models). At the same time, they aim to energetically stabilize the bound conformation of the binding pockets.
    The optimal designs are computationally validated using a series of simulations and structural analysis techniques. This validation selects the top designed peptides, which are predicted to form fibrils with specific ion/compound binding properties, for experimental testing. The protocol was first applied to the design of amyloid materials (i) binding cesium ions. It was subsequently applied to amyloid materials (ii) serving as potential AD drug carriers, (iii) promoting cell penetration and possessing DNA-binding properties, and (iv) incorporating potential cell-adhesion, calcium-binding and strontium-binding properties. The computational protocol is also presented here as a step toward a generalized computational approach for designing functional amyloid materials that bind an ion/compound of interest. This work can serve as a stepping stone toward the functionalization of peptide/protein-based materials for a range of future applications.

    A treatment of stereochemistry in computer aided organic synthesis

    This thesis describes the author’s contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and to generate appropriate precursor molecules. The module uses evidence-based rules generated from a large database of literature reactions. Chapter 1 provides an introduction to, and critical review of, the published body of work on computer-aided synthesis design. It discusses the role of computer perception of key structural features (rings, functional groups, etc.) and the construction and use of reaction transforms for generating precursors. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled the development of a new generation of retrosynthesis design programs that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central to solving the stereochemical constraints in a variety of substructure matching problems addressed in Chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in molecules, reactions and rules.
    A novel symmetry perception algorithm, based on a constraint satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph‐free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms are described for classifying asymmetric, pseudo‐asymmetric and symmetric stereocentres; meso, centro, and C2-symmetric molecules; and the stereotopicity of trigonal (sp2) centres. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described. It is used to detect important electronic features such as electron‐withdrawing or electron‐donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design stereoselective and substrate-controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraint annotations that concisely describe the keying retrons. The application of the transforms for collating evidence-based scoring parameters from published reaction examples is described. A survey of available reaction databases and of techniques for mining stereoselective reactions is presented. A data mining tool was developed for finding the most reputable stereoselective reaction types for coding as transforms. For various reasons, it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one‐step retrosynthesis module to test the developed transforms.
    The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested using a small set of selected target molecules. The generated routes were ranked using a series of measured parameters, including stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition, a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set and graph theory operations and notations. Appendix A lists the set theory symbols and their meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis.
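    The constraint-satisfaction view of symmetry perception described above can be sketched as a backtracking search. Each atom is a variable whose value is the atom it maps to. Values are restricted by element and degree, and every bonded and non-bonded pair must be preserved. This is a simplified illustration built on several assumptions. It uses plain backtracking rather than a dedicated CSP solver. It perceives only topological (graph) symmetry, ignoring the stereo descriptors that the thesis's algorithms handle. The names `automorphisms` and `symmetry_classes` are hypothetical.

```python
def automorphisms(elements, bonds):
    """Find all atom permutations that preserve element labels and bonds,
    by backtracking over a CSP in which variable i is the image of atom i."""
    n = len(elements)
    adj = [set() for _ in range(n)]
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    # Node consistency: an atom may only map to an atom with the same element and degree.
    domains = [[j for j in range(n)
                if elements[j] == elements[i] and len(adj[j]) == len(adj[i])]
               for i in range(n)]
    image, used, found = [None] * n, set(), []

    def consistent(i, j):
        # Every already-assigned atom k must keep its bonded/non-bonded relation to i.
        return all((k in adj[i]) == (image[k] in adj[j]) for k in range(i))

    def search(i):
        if i == n:
            found.append(tuple(image))
            return
        for j in domains[i]:
            if j not in used and consistent(i, j):
                image[i] = j
                used.add(j)
                search(i + 1)
                used.remove(j)          # backtrack
                image[i] = None

    search(0)
    return found

def symmetry_classes(elements, bonds):
    """Group atoms into topologically equivalent classes (automorphism orbits)."""
    autos = automorphisms(elements, bonds)
    orbits = {frozenset(a[i] for a in autos) for i in range(len(elements))}
    return sorted(sorted(o) for o in orbits)

ring = [(i, (i + 1) % 6) for i in range(6)]
print(len(automorphisms(["C"] * 6, ring)))                 # 12
print(symmetry_classes(["C", "C", "C"], [(0, 1), (1, 2)]))  # [[0, 2], [1]]
```

Atoms in the same class are interchangeable under some symmetry of the graph. A substructure matcher could use such classes to discard matches that differ only by symmetry.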

    Eighth Biennial Report: April 2005 – March 2007


    Machine Learning in Discrete Molecular Spaces

    The past decade has seen an explosion of machine learning in chemistry. Whether in property prediction, synthesis, molecular design, or any other subdivision, machine learning seems poised to become an integral, if not dominant, component of future research efforts. This extraordinary capacity rests on the interaction between machine learning models and the underlying chemical data landscape, commonly referred to as chemical space. Chemical space has multiple incarnations but is generally considered to be the space of all possible molecules. In this sense, it is one example of a molecular set: an arbitrary collection of molecules. This thesis is devoted to precisely these objects, and particularly to how they interact with machine learning models. This work is predicated on the idea that by better understanding the relationship between molecular sets and the models trained on them, we can improve models, achieve greater interpretability, and further break down the walls between data-driven and human-centric chemistry. The hope is that this enables the full predictive power of machine learning to be leveraged while continuing to build our understanding of chemistry. The first three chapters of this thesis introduce and review the necessary machine learning theory, particularly the tools that have been specially designed for chemical problems. This is followed by an extensive literature review exploring the contributions of machine learning to multiple facets of chemistry over the last two decades. Chapters 4-7 present the research conducted during this PhD.
    Here we explore four questions. First, how can we meaningfully describe the properties of an arbitrary set of molecules through information theory? Second, how can we determine the most informative data points in a set of molecules? Third, how can graph signal processing be used to understand the relationship between the chosen molecular representation, the property, and the machine learning model? Finally, how can this approach be brought to bear on protein space? Each of these sub-projects briefly explores the necessary mathematical theory before leveraging it to provide approaches that resolve the posed problems. We conclude with a summary of the contributions of this work and outline fruitful avenues for further exploration.

    Applications Development for the Computational Grid


    Integrated Chemical Processes in Liquid Multiphase Systems

    The essential principles of green chemistry are the use of renewable raw materials, highly efficient catalysts and green solvents, combined with energy efficiency and real-time process optimization. Experts from different fields show how to examine all levels, from the molecular elementary steps up to the design and operation of an entire plant, in order to develop novel and efficient production processes.