
    Enhancing reliability with Latin Square redundancy on desktop grids

    Computational grids are some of the largest computer systems in existence today. Unfortunately, they are also, in many cases, the least reliable. This research examines the use of redundancy with permutation as a method of improving reliability in computational grid applications. Three primary avenues are explored: development of a new redundancy model, the Replication and Permutation Paradigm (RPP), for computational grids; development of grid simulation software for testing RPP against other redundancy methods; and running a program on a live grid using RPP. An important part of RPP involves distributing data and tasks across the grid in Latin Square fashion. Two theorems and subsequent proofs regarding Latin Squares are developed. The theorems describe the changing position of symbols between the rows of a standard Latin Square. When a symbol is missing because a column has been removed, the theorems provide a basis for determining the next row and column where the missing symbol can be found. Interesting in their own right, the theorems also have implications for redundancy: they allow one to state the maximum makespan in the face of missing computational hosts when using Latin Square redundancy. The simulation software was developed and used to compare different data and task distribution schemes on a simulated grid. It clearly showed the advantage of RPP, which resulted in faster completion times in the face of computational host failures. The Latin Square method also fails gracefully: jobs still complete under massive node failure, at the cost of increased makespan. Finally, an Inductive Logic Programming (ILP) application for pharmacophore search was executed, using Latin Square redundancy, on a Condor grid in the Dahlem Lab at the University of Louisville Speed School of Engineering. All jobs completed, even in the face of large numbers of randomly generated computational host failures.
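    As an illustration of the distribution scheme, here is a minimal sketch (hypothetical, not the thesis's RPP code) that builds a cyclic "standard" Latin square, assigns data blocks to hosts by L[i][j] = (i + j) mod n, and locates the surviving replicas of every block when a host (column) fails. In such a square, a symbol lost from row i at column c reappears in row (i + 1) mod n at column (c - 1) mod n, which is the kind of recovery property the theorems formalize.

```python
# Toy model of Latin Square redundancy: rows = rounds, columns = hosts,
# symbols = data/task blocks. Cyclic construction L[i][j] = (i + j) % n.
def latin_square(n):
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def surviving_replicas(square, symbol, failed_cols):
    """All (row, col) positions still holding `symbol` on live hosts."""
    hits = []
    for i, row in enumerate(square):
        j = row.index(symbol)        # each symbol occurs once per row
        if j not in failed_cols:
            hits.append((i, j))
    return hits

n = 5
square = latin_square(n)
failed = {2}                         # host (column) 2 goes down
for s in range(n):
    print(f"block {s}: reachable at {surviving_replicas(square, s, failed)}")
```

    Because each symbol occurs exactly once per column, a single host failure costs at most one replica of each block, so every block remains reachable and only the makespan grows.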

    The pharmacophore kernel for virtual screening with support vector machines

    We introduce a family of positive definite kernels specifically optimized for the manipulation of 3D structures of molecules with kernel methods. The kernels are based on the comparison of the three-point pharmacophores present in the 3D structures of molecules, a set of molecular features known to be particularly relevant for virtual screening applications. We present a computationally demanding exact implementation of these kernels, as well as fast approximations related to the classical fingerprint-based approaches. Experimental results suggest that this new approach outperforms state-of-the-art algorithms based on the 2D structure of molecules for the detection of inhibitors of several drug targets.
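    A rough sketch of the idea, assuming molecules are given as lists of (pharmacophore type, 3D position) features: every pair of feature triplets, one triangle from each molecule, contributes when the feature types match, weighted by a Gaussian on the difference in triangle edge lengths. This is a simplified illustration (canonical sorting of types and edges rather than proper vertex correspondence), not the paper's exact algorithm.

```python
import numpy as np
from itertools import combinations

def triplets(mol):
    """Yield (sorted feature types, sorted edge lengths) for every
    three-point pharmacophore in a molecule given as a list of
    (feature_type, xyz) pairs. Crude canonicalization for the sketch."""
    for a, b, c in combinations(mol, 3):
        types = tuple(sorted(f[0] for f in (a, b, c)))
        p = [f[1] for f in (a, b, c)]
        edges = tuple(sorted(np.linalg.norm(p[i] - p[j])
                             for i, j in ((0, 1), (0, 2), (1, 2))))
        yield types, edges

def pharmacophore_kernel(mol1, mol2, gamma=1.0):
    """Sum a Gaussian similarity over all pairs of type-matched triangles."""
    k = 0.0
    for t1, e1 in triplets(mol1):
        for t2, e2 in triplets(mol2):
            if t1 == t2:
                d2 = sum((x - y) ** 2 for x, y in zip(e1, e2))
                k += np.exp(-gamma * d2)
    return k

# Tiny hypothetical molecules: (pharmacophore type, 3D position).
mol_a = [("donor", np.array([0.0, 0.0, 0.0])),
         ("acceptor", np.array([3.0, 0.0, 0.0])),
         ("aromatic", np.array([0.0, 4.0, 0.0])),
         ("donor", np.array([1.0, 1.0, 1.0]))]
mol_b = [("donor", np.array([0.1, 0.0, 0.0])),
         ("acceptor", np.array([3.1, 0.0, 0.0])),
         ("aromatic", np.array([0.0, 3.9, 0.0]))]
print(pharmacophore_kernel(mol_a, mol_b))
```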

    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning, and artificial intelligence. QSAR modeling relies on three main steps: codification of molecular structure into molecular descriptors, selection of relevant variables in the context of the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods to build QSAR models, current evolutionary feature selection methods and applications in QSAR, and future trends in joint or multi-task feature selection methods.
    Funding: Instituto de Salud Carlos III (PIO52048; RD07/0067/0005); Ministerio de Industria, Comercio y Turismo (TSI-020110-2009-53); Galicia, Consellería de Economía e Industria (10SIN105004P).
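    As a concrete illustration of the evolutionary feature selection step, the sketch below runs a minimal genetic algorithm over boolean descriptor masks, scoring each mask by the cross-validated error of a least-squares model. All data and GA parameters are hypothetical stand-ins, not taken from the review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a QSAR table: 100 molecules x 30 descriptors, with
# the activity driven by descriptors 0-4 only.
X = rng.normal(size=(100, 30))
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

def cv_mse(mask, folds=5):
    """5-fold cross-validated MSE of a least-squares model restricted
    to the descriptor subset encoded by the boolean mask."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return np.inf
    order = rng.permutation(len(y))
    errs = []
    for f in range(folds):
        test = order[f::folds]
        train = np.setdiff1d(order, test)
        w, *_ = np.linalg.lstsq(X[train][:, idx], y[train], rcond=None)
        errs.append(np.mean((X[test][:, idx] @ w - y[test]) ** 2))
    return float(np.mean(errs))

def tournament(pop, scores):
    """Binary tournament selection: better of two random individuals."""
    i, j = rng.integers(len(pop), size=2)
    return pop[i] if scores[i] < scores[j] else pop[j]

pop = rng.random((40, X.shape[1])) < 0.2              # initial masks
for gen in range(30):
    scores = np.array([cv_mse(m) for m in pop])
    children = [pop[scores.argmin()].copy()]          # elitism
    while len(children) < len(pop):
        a, b = tournament(pop, scores), tournament(pop, scores)
        child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(X.shape[1]) < 0.02                # bit-flip mutation
        children.append(child)
    pop = np.array(children)

scores = np.array([cv_mse(m) for m in pop])
print("selected descriptors:", np.flatnonzero(pop[scores.argmin()]))
```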

    Open Babel: An open chemical toolbox

    Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses, both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license.
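    A minimal interconversion example using the toolkit's Python bindings (pybel). The module path shown is that of the current Open Babel 3.x series; in the 2.x series described here, pybel was a top-level module (`import pybel`).

```python
# Parse a SMILES string, generate 3D coordinates, and re-emit the
# molecule in other formats via Open Babel's pybel bindings.
from openbabel import pybel

mol = pybel.readstring("smi", "CCO")             # ethanol from SMILES
mol.make3D()                                     # coordinates + hydrogens
print(mol.write("inchi"))                        # convert to InChI
mol.write("sdf", "ethanol.sdf", overwrite=True)  # or write an SD file
```

    The `obabel` command-line tool exposes the same conversions for batch work without any programming.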

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 "High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)" project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore required to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Kernel-Based Learning Methods for Virtual Screening (Kern-basierte Lernverfahren für das virtuelle Screening)

    We investigate the utility of modern kernel-based machine learning methods for ligand-based virtual screening. In particular, we introduce a new graph kernel based on iterative graph similarity and optimal assignments, apply kernel principal component analysis to projection error-based novelty detection, and discover a new selective agonist of the peroxisome proliferator-activated receptor gamma using Gaussian process regression. Virtual screening, the computational ranking of compounds with respect to a predicted property, is a cheminformatics problem relevant to the hit-generation phase of drug development. Its ligand-based variant relies on the similarity principle, which states that (structurally) similar compounds tend to have similar properties. We describe the kernel-based machine learning approach to ligand-based virtual screening; in this, we stress the role of molecular representations, including the (dis)similarity measures defined on them, investigate effects in high-dimensional chemical descriptor spaces and their consequences for similarity-based approaches, review literature recommendations on retrospective virtual screening, and present an example workflow.

    Graph kernels are formal similarity measures that are defined directly on graphs, such as the annotated molecular structure graph, and correspond to inner products. We review graph kernels, in particular those based on random walks, subgraphs, and optimal vertex assignments. Combining the latter with an iterative graph similarity scheme, we develop the iterative similarity optimal assignment graph kernel, give an iterative algorithm for its computation, prove convergence of the algorithm and the uniqueness of the solution, and provide an upper bound on the number of iterations necessary to achieve a desired precision. In a retrospective virtual screening study, our kernel consistently improved performance over chemical descriptors as well as other optimal assignment graph kernels.

    Chemical data sets often lie on manifolds of lower dimensionality than the embedding chemical descriptor space. Dimensionality reduction methods try to identify these manifolds, effectively providing descriptive models of the data. For spectral methods based on kernel principal component analysis, the projection error is a quantitative measure of how well new samples are described by such models. This can be used for the identification of compounds structurally dissimilar to the training samples, leading to projection error-based novelty detection for virtual screening using only positive samples. We provide proof of principle by using principal component analysis to learn the concept of fatty acids.

    The peroxisome proliferator-activated receptor (PPAR) is a nuclear transcription factor that regulates lipid and glucose metabolism, playing a crucial role in the development of type 2 diabetes and dyslipidemia. We establish a Gaussian process regression model for PPAR gamma agonists using a combination of chemical descriptors and the iterative similarity optimal assignment kernel via multiple kernel learning. Screening of a vendor library and subsequent testing of 15 selected compounds in a cell-based transactivation assay resulted in 4 active compounds. One compound, a natural product with a cyclobutane scaffold, is a full selective PPAR gamma agonist (EC50 = 10 ± 0.2 µM, inactive on PPAR alpha and PPAR beta/delta at 10 µM). The study delivered a novel PPAR gamma agonist, de-orphanized a natural bioactive product, and hints at the natural-product origins of pharmacophore patterns in synthetic ligands.
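    A compact sketch of the projection error idea on hypothetical toy data: center the training kernel matrix, eigendecompose it, and score new points by the squared feature-space distance to the span of the leading kernel PCA components. Large errors flag samples the learned model does not describe, i.e., novelty. This is a generic illustration of the technique, not the thesis code.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Training data near a 1D manifold in 2D (stand-in for known compounds).
t = rng.uniform(0, 3, 80)
X = np.c_[t, np.sin(t)] + 0.05 * rng.normal(size=(80, 2))

# Kernel PCA: double-center the kernel matrix, then eigendecompose.
n = len(X)
K = rbf(X, X)
J = np.full((n, n), 1.0 / n)
Kc = K - J @ K - K @ J + J @ K @ J
w, V = np.linalg.eigh(Kc)                   # ascending eigenvalues
keep = w > 1e-10
alpha = V[:, keep] / np.sqrt(w[keep])       # unit-norm feature-space PCs

def projection_error(Z, q=10):
    """Squared distance of phi(z) to the span of the top-q components."""
    kz = rbf(Z, X)
    kz_c = kz - kz.mean(1, keepdims=True) - K.mean(0) + K.mean()
    self_c = 1.0 - 2 * kz.mean(1) + K.mean()   # k(z, z) = 1 for RBF
    proj = kz_c @ alpha[:, -q:]                # coordinates on top-q PCs
    return self_c - (proj ** 2).sum(1)

print(projection_error(np.array([[1.5, np.sin(1.5)]])))  # small: on manifold
print(projection_error(np.array([[1.5, 2.5]])))          # large: novel
```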

    Structure- and Ligand-Based Design of Novel Antimicrobial Agents

    The use of computer-based techniques in the design of novel therapeutic agents is a rapidly emerging field. Although the drug-design techniques utilized by computational medicinal chemists vary greatly, they can roughly be classified into structure-based and ligand-based approaches. Structure-based methods utilize a solved structure of the design target, protein or DNA, usually obtained by X-ray or NMR methods, to design or improve compounds with activity against the target. Ligand-based methods use active compounds with known affinity for a target whose structure may not yet be solved. These methods include pharmacophore-based searching for novel active compounds and Quantitative Structure-Activity Relationship (QSAR) studies. The research presented here utilized both structure- and ligand-based methods against two bacterial targets: Bacillus anthracis and Mycobacterium tuberculosis. The first part of this thesis details our efforts to design novel inhibitors of the enzyme dihydropteroate synthase from B. anthracis using crystal structures with known inhibitors bound. The second part describes a QSAR study that was performed using a series of novel nitrofuranyl compounds with known whole-cell inhibitory activity against M. tuberculosis. Dihydropteroate synthase (DHPS) catalyzes the addition of p-aminobenzoic acid (pABA) to dihydropterin pyrophosphate (DHPP) to form pteroic acid as a key step in bacterial folate biosynthesis. It is the traditional target of the sulfonamide class of antibiotics. Unfortunately, bacterial resistance and adverse effects have limited the clinical utility of the sulfonamide antibiotics. Although six bacterial crystal structures are available, the flexible loop regions that enclose pABA during binding and contain key sulfonamide resistance sites have yet to be visualized in their functional conformation. To gain a new understanding of the structural basis of sulfonamide resistance and the molecular mechanism of DHPS action, and to generate a screening structure for high-throughput virtual screening, molecular dynamics simulations were applied to model the conformations of the unresolved loops in the active site. Several series of molecular dynamics simulations were designed and performed utilizing enzyme substrates and inhibitors, a transition-state analog, and a pterin-sulfamethoxazole adduct. The positions of key mutation sites conserved across several bacterial species were closely monitored during these analyses, and these residues were shown to interact closely with the sulfonamide binding site. The simulations provided new understanding of the positions of the flexible loops during inhibitor binding, which allowed the development of a DHPS structural model suitable for high-throughput virtual screening (HTVS). Additionally, insights gained on the location and possible function of key mutation sites on the flexible loops will facilitate the design of new, potent inhibitors of DHPS that can bypass the resistance mutations that render sulfonamides inactive. Prior to performing high-throughput virtual screening, the docking and scoring functions to be used were validated using established techniques against the B. anthracis DHPS target. In this validation study, five commonly used docking programs, FlexX, Surflex, Glide, GOLD, and DOCK, as well as nine scoring functions, were evaluated for their utility in virtual screening against the novel pterin binding site.

    Their performance in ligand docking and virtual screening against this target was examined by their ability to reproduce a known inhibitor conformation and to correctly detect known active compounds seeded into three separate decoy sets. Enrichment was quantified by enrichment factors at 1% and Receiver Operating Characteristic (ROC) curves. The effectiveness of post-docking relaxation prior to rescoring and of consensus scoring was also evaluated. Of the docking and scoring functions evaluated, Surflex with SurflexScore and Glide with GlideScore performed best overall for virtual screening against the DHPS target. The next phase of the DHPS structure-based drug design project involved high-throughput virtual screening against the DHPS structural model previously developed, using the docking methodology validated against this target. Two general virtual screening methods were employed. First, large virtual libraries were pre-filtered by 3D pharmacophore and modified Rule-of-Three fragment constraints. Nearly 5 million compounds from the ZINC databases were screened, generating 3,104 unique, fragment-like hits that were subsequently docked and ranked by score. Second, fragment docking without pharmacophore filtering was performed on almost 285,000 fragment-like compounds obtained from databases of commercial vendors. Hits from both virtual screens with high predicted affinity for the pterin binding pocket, as determined by docking score, were selected for in vitro testing. Structure-activity relationships of the active fragment compounds were developed. Several compounds with micromolar activity were identified and taken to crystallographic trials. Finally, in our ligand-based research into M. tuberculosis-active agents, a series of nitrofuranylamide and related aromatic compounds displaying potent activity was investigated utilizing three-dimensional Quantitative Structure-Activity Relationship (3D-QSAR) techniques. Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA) methods were used to produce 3D-QSAR models that correlated the Minimum Inhibitory Concentration (MIC) values against M. tuberculosis with the molecular structures of the active compounds. A training set of 95 active compounds was used to develop the models, which were then evaluated by a series of internal and external cross-validation techniques; a test set of 15 compounds was used for the external validation. Different alignment and ionization rules were investigated, as well as the effect on model predictivity of global molecular descriptors, including lipophilicity (cLogP, LogD), Polar Surface Area (PSA), and steric bulk (CMR). Models with greater than 70% predictive ability, as determined by external validation, and high internal validity (cross-validated r² > 0.5) were developed. Incorporation of lipophilicity descriptors into the models had negligible effects on model predictivity. The models developed will be used to predict the activity of proposed new structures and to advance the development of next-generation nitrofuranyl and related nitroaromatic anti-tuberculosis agents.
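    Both enrichment measures used in the docking validation are straightforward to compute from a ranked score list. The sketch below does so on hypothetical docking scores (lower = better) with actives seeded into decoys; the counts and score distributions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical screen: 50 actives seeded into 5,000 decoys, with actives
# scoring somewhat better (more negative) on average.
scores = np.r_[rng.normal(-7.5, 1.0, 50), rng.normal(-6.0, 1.0, 5000)]
labels = np.r_[np.ones(50), np.zeros(5000)]

def enrichment_factor(scores, labels, frac=0.01):
    """Fraction of actives in the top `frac` of the ranking, relative
    to the fraction expected from random selection."""
    order = np.argsort(scores)                    # best scores first
    top = labels[order][: max(1, int(frac * len(labels)))]
    return top.mean() / labels.mean()

def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) identity; scores are
    negated so that better docking scores receive higher ranks."""
    ranks = (-scores).argsort().argsort() + 1.0
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

print(f"EF@1%   = {enrichment_factor(scores, labels):.1f}")
print(f"ROC AUC = {roc_auc(scores, labels):.3f}")
```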