Search CORE

104,737 research outputs found

Computational methods for small molecule identification

Author: Dührkop Kai
Publication venue
Publication date: 01/01/2018

Identification of small molecules remains a central question in analytical chemistry, in particular for natural product research, metabolomics, environmental research, and biomarker discovery. Mass spectrometry is the predominant technique for high-throughput analysis of small molecules. However, it reveals information only about the mass of molecules and, by using tandem mass spectrometry, about the mass of molecular fragments. Automated interpretation of mass spectra is often limited to searching in spectral libraries, such that we can only dereplicate molecules for which we have already recorded reference mass spectra. In this thesis we present methods for answering two central questions: What is the molecular formula of the measured ion, and what is its molecular structure? SIRIUS is a combinatorial optimization method for annotating a spectrum and identifying the ion's molecular formula by computing hypothetical fragmentation trees. We present a new scoring for computing fragmentation trees, transforming the combinatorial optimization into a maximum a posteriori estimator. This allows us to learn parameters and hyperparameters of the scoring directly from data. We demonstrate that the statistical model, which was fitted on a small dataset, generalises well across many different datasets and mass spectrometry instruments. In addition to tandem mass spectra, isotope patterns can be used for identifying the molecular formula of the precursor ion. We present a novel scoring for comparing isotope patterns based on maximum likelihood. We describe how to integrate the isotope pattern analysis into the fragmentation tree optimisation problem to analyse data where fragment peaks and isotope peaks occur within the same spectrum. We demonstrate that the new scorings significantly improve molecular formula assignment. We evaluate SIRIUS on several datasets and show that it outperforms all other methods for molecular formula annotation by a large margin.
We also present CSI:FingerID, a method for predicting a molecular fingerprint from a tandem mass spectrum using kernel support vector machines. The predicted fingerprint can be searched in a structure database to identify the molecular structure. CSI:FingerID is based on FingerID, which uses probability product kernels on mass spectra for this task. We describe several novel kernels for comparing fragmentation trees instead of spectra. These kernels are combined using multiple kernel learning. We present a new scoring based on posterior probabilities and extend the method to use additional molecular fingerprints. We demonstrate on several datasets that CSI:FingerID identifies more molecules than its predecessor FingerID and outperforms all other methods for this task. We analyse how each of the methodological improvements of CSI:FingerID contributes to its identification performance and make suggestions for future improvements of the method. Both methods, SIRIUS and CSI:FingerID, are available as a command-line tool and as a user interface. The molecular fingerprint prediction is implemented as a web service and receives over one million requests per month.
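
The database search step mentioned in the abstract (a predicted fingerprint is compared against candidate structures) can be illustrated with a minimal sketch. This is not the CSI:FingerID scoring, which according to the abstract is based on posterior probabilities; plain Tanimoto similarity is swapped in here as a simple stand-in. Fingerprints are modelled as sets of "on" bit indices, and the candidate names and bit values are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as bit-index sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention.
    return len(a & b) / union if union else 1.0


def rank_candidates(predicted, database):
    """Return (score, name) pairs, best match first; ties are broken by name."""
    scored = [(tanimoto(predicted, fp), name) for name, fp in database.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical predicted fingerprint and a tiny, made-up structure database.
predicted = {1, 4, 7, 9}
database = {
    "candidate_A": {1, 4, 7, 9, 12},
    "candidate_B": {1, 4},
    "candidate_C": {2, 3, 5},
}

for score, name in rank_candidates(predicted, database):
    print(f"{name}: {score:.2f}")
```

A real predictor outputs per-bit probabilities rather than hard bits, which is one reason a probabilistic score can be preferable to a set-overlap measure like this one.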

Digitale Bibliothek Thüringen

Quantum Chemistry Calculations for Metabolomics

Author: Borges Ricardo M.
Colby Sean M.
Das Susanta K.
Edison Arthur S.
Fiehn Oliver
Kind Tobias
Lee Jesi
Merrill Amy T.
Merz Kenneth M., Jr.
Metz Thomas O.
Nunez Jamie R.
Renslow Ryan S.
Tantillo Dean J.
Wang Lee-Ping
Wang Shunyang
Publication venue: Digital Commons @ Kettering University
Publication date: 12/05/2021

A primary goal of metabolomics studies is to fully characterize the small-molecule composition of complex biological and environmental samples. However, despite advances in analytical technologies over the past two decades, the majority of small molecules in complex samples are not readily identifiable due to the immense structural and chemical diversity present within the metabolome. Current gold-standard identification methods rely on reference libraries built using authentic chemical materials (“standards”), which are not available for most molecules. Computational quantum chemistry methods, which can be used to calculate chemical properties that are then measured by analytical platforms, offer an alternative route for building reference libraries, i.e., in silico libraries for “standards-free” identification. In this review, we cover the major roadblocks currently facing metabolomics and discuss applications where quantum chemistry calculations offer a solution. Several successful examples for nuclear magnetic resonance spectroscopy, ion mobility spectrometry, infrared spectroscopy, and mass spectrometry methods are reviewed. Finally, we consider current best practices, sources of error, and provide an outlook for quantum chemistry calculations in metabolomics studies. We expect this review will inspire researchers in the field of small-molecule identification to accelerate adoption of in silico methods for generation of reference libraries and to add quantum chemistry calculations as another tool at their disposal to characterize complex samples.

Kettering University

Hot-spot analysis for drug discovery targeting protein-protein interactions

Author: Fernández-Recio Juan
Rosell Mireia
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018

Introduction: Protein-protein interactions are important for biological processes and pathological situations, and are attractive targets for drug discovery. However, rational drug design targeting protein-protein interactions is still highly challenging. Hot-spot residues are seen as the best option to target such interactions, but their identification requires detailed structural and energetic characterization, which is only available for a tiny fraction of protein interactions. Areas covered: In this review, the authors cover a variety of computational methods that have been reported for the energetic analysis of protein-protein interfaces in search of hot-spots, and the structural modeling of protein-protein complexes by docking. This can help to rationalize the discovery of small-molecule inhibitors of protein-protein interfaces of therapeutic interest. Computational analysis and docking can help to locate the interface, molecular dynamics can be used to find suitable cavities, and hot-spot predictions can focus the search for inhibitors of protein-protein interactions. Expert opinion: A major difficulty for applying rational drug design methods to protein-protein interactions is that in the majority of cases the complex structure is not available. Fortunately, computational docking can complement experimental data. An interesting aspect to explore in the future is the integration of these strategies for targeting PPIs with large-scale mutational analysis. This work has been funded by grants BIO2016-79930-R and SEV-2015-0493 from the Spanish Ministry of Economy, Industry and Competitiveness, and grant EFA086/15 from EU Interreg V POCTEFA. M Rosell is supported by an FPI fellowship from the Severo Ochoa program. The authors are grateful for the support of the Joint BSC-CRG-IRB Programme in Computational Biology. Peer reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC


Improvements in Molecular Mechanics Sampling and Energy Models

Author: Bylund Joseph
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2014

The process of bringing drugs to market continues to be a slow and expensive affair. Despite recent advances in technology, the cost, both in monetary terms and in terms of time between target identification and arrival of a new drug on the market, continues to increase. High throughput screening is a first step towards testing a large number of possible bioactive compounds very quickly. However, the space of possible small molecules is limitless, and high throughput screening is limited both by the size of available libraries and the cost of running such a large number of experiments. Therefore, advancements in computational drug screening are necessary in order to maintain the current rate of progress in modern medicine. Computational drug design, or computer-assisted drug design, offers a possible way of addressing some of the shortfalls of conventional high throughput screening. Using computational methods, it is possible to estimate parameters such as binding affinity of any small molecule, even those not currently present in any small molecule library, without having to first invest in the often slow and expensive process of finding a synthetic pathway. Computational methods can be used to screen similar molecules, or mutations in small molecule space, seeking to increase binding affinity to the protein target, and thereby efficacy, while simultaneously minimizing binding affinity to other proteins, decreasing cross reactivity, and reducing toxicity and harmful side effects. Computational biology methods of drug research can be broadly classified in a number of different ways. However, one of the most common classifications is according to the methods used to identify possible drug compounds and later optimize those leads. The first broad category is informatics or artificial intelligence based approaches.
In these approaches, artificial intelligence methods such as neural networks, support vector machines, and quantitative structure-activity relationships (QSAR) are used to identify chemical or structural properties that contribute heavily to binding affinity. The next category, ligand based approaches, is very useful when there are a large number of known binders for a specific family of proteins. In this approach, the ligands are clustered using a metric of chemical similarity, and new compounds which occupy a similar chemical space are likely to also bind strongly with the protein of interest. The final class of methods of computational drug design, and the method explored in this thesis, is the diverse class known as structural methods. These approaches in the most general sense make use of a sampling method to sample a number of protein or protein-small-molecule interaction conformations, and an energy model or scoring function to measure dimensions which would be very difficult and/or expensive to measure experimentally. In this thesis, a number of different sampling methods that are applicable to different questions in computational biology are presented. Additionally, an improved algorithm for evaluating implicit solvent effects is presented, and a number of improvements in performance, reliability and utility of the molecular mechanics program used are discussed.

Columbia University Academic Commons

Ion mobility spectrometry-mass spectrometry (IMS-MS) of small molecules: separating and assigning structures to ions

Publication venue: 'Wiley'
Publication date: 01/01/2013

The phenomenon of ion mobility (IM), the movement/transport of charged particles under the influence of an electric field, was first observed in the early 20th Century and harnessed later in ion mobility spectrometry (IMS). There have been rapid advances in instrumental design, experimental methods, and theory together with contributions from computational chemistry and gas-phase ion chemistry, which have diversified the range of potential applications of contemporary IMS techniques. Whilst IMS-mass spectrometry (IMS-MS) has recently been recognized for having significant research/applied industrial potential and encompasses multi-/cross-disciplinary areas of science, the applications and impact from decades of research are only now beginning to be utilized for "small molecule" species. This review focuses on the application of IMS-MS to "small molecule" species typically used in drug discovery (100-500 Da) including an assessment of the limitations and possibilities of the technique. Potential future developments in instrumental design, experimental methods, and applications are addressed. The typical application of IMS-MS in relation to small molecules has been to separate species in fairly uniform molecular classes such as mixture analysis, including metabolites. Separation of similar species has historically been challenging using IMS as the resolving power, R, has been low (3-100) and the differences in collision cross-sections that could be measured have been relatively small, so instrument and method development has often focused on increasing resolving power. However, IMS-MS has a range of other potential applications that are examined in this review where it displays unique advantages, including: determination of small molecule structure from drift time, "small molecule" separation in achiral and chiral mixtures, improvement in selectivity, identification of carbohydrate isomers, metabonomics, and for understanding the size and shape of small molecules. 
This review provides a broad but selective overview of current literature, concentrating on IMS-MS, not solely IMS, and small molecule applications. © 2012 Wiley Periodicals, Inc.

Crossref

Greenwich Academic Literature Archive

Dynamic-Backbone Protein-Ligand Structure Prediction with Multiscale Generative Diffusion Models

Author: Anandkumar Anima
Miller III Thomas F.
Nie Weili
Qiao Zhuoran
Vahdat Arash
Publication venue
Publication date: 29/09/2022

Molecular complexes formed by proteins and small-molecule ligands are ubiquitous, and predicting their 3D structures can facilitate both biological discoveries and the design of novel enzymes or drug molecules. Here we propose NeuralPLexer, a deep generative model framework to rapidly predict protein-ligand complex structures and their fluctuations using protein backbone template and molecular graph inputs. NeuralPLexer jointly samples protein and small-molecule 3D coordinates at an atomistic resolution through a generative model that incorporates biophysical constraints and inferred proximity information into a time-truncated diffusion process. The reverse-time generative diffusion process is learned by a novel stereochemistry-aware equivariant graph transformer that enables efficient, concurrent gradient field prediction for all heavy atoms in the protein-ligand complex. NeuralPLexer outperforms existing physics-based and learning-based methods on benchmarking problems including fixed-backbone blind protein-ligand docking and ligand-coupled binding site repacking. Moreover, we identify preliminary evidence that NeuralPLexer enriches bound-state-like protein structures when applied to systems where protein folding landscapes are significantly altered by the presence of ligands. Our results reveal that a data-driven approach can capture the structural cooperativity among protein and small-molecule entities, showing promise for the computational identification of novel drug targets and the end-to-end differentiable design of functional small molecules and ligand-binding proteins.

arXiv.org e-Print Archive

Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

Author: Blaženović Ivana
Fiehn Oliver
Ji Jian
Kind Tobias
Publication venue: eScholarship, University of California
Publication date: 01/05/2018

The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (Critical Assessment of Small Molecule Identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

eScholarship - University of California

The benefits of in silico modeling to identify possible small-molecule drugs and their off-target interactions

Author: Blomberg N
Choi SH
Hastings J
Hirschey J
Mire Zloh
Stewart B Kirton
Wang X
Publication venue: 'Future Science Ltd'
Publication date: 30/01/2019

Accepted for publication in a future issue of Future Medicinal Chemistry. The research into the use of small molecules as drugs continues to be a key driver in the development of molecular databases, computer-aided drug design software and collaborative platforms. The evolution of computational approaches is driven by the essential criteria that a drug molecule has to fulfill, from the affinity to targets to minimal side effects while having adequate absorption, distribution, metabolism, and excretion (ADME) properties. A combination of ligand- and structure-based drug development approaches is already used to obtain consensus predictions of small molecule activities and their off-target interactions. Further integration of these methods into easy-to-use workflows informed by systems biology could realize the full potential of available data in drug discovery and reduce the attrition of drug candidates. Peer reviewed.

Crossref

University of Hertfordshire Research Archive

Molecular dynamics in arbitrary geometries : parallel evaluation of pair forces

Author: Allen M.P.
Fischer K.
Gellert W.
Graham B. Macpherson
Jason M. Reese
Macpherson G.B.
Rapaport D.C.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2007

A new algorithm for calculating intermolecular pair forces in molecular dynamics (MD) simulations on a distributed parallel computer is presented. The arbitrary interacting cells algorithm (AICA) is designed to operate on geometrical domains defined by an unstructured, arbitrary polyhedral mesh that has been spatially decomposed into irregular portions for parallelisation. It is intended for nanoscale fluid mechanics simulation by MD in complex geometries, and to provide the MD component of a hybrid MD/continuum simulation. The spatial relationship of the cells of the mesh is calculated at the start of the simulation, and only the molecules contained in cells that have part of their surface closer than the cut-off radius of the intermolecular pair potential are required to interact. AICA has been implemented in the open source C++ code OpenFOAM, and its accuracy has been indirectly verified against a published MD code. The same system simulated in serial and in parallel on 12 and 32 processors gives the same results. Performance tests show that there is an optimal number of cells in a mesh for maximum speed of calculating intermolecular forces, and that having a large number of empty cells in the mesh does not add a significant computational overhead.
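
The core idea in this abstract (decide once, up front, which cells are close enough to interact, then evaluate pair forces only between molecules in those cells) can be sketched in a few lines. This is a serial toy illustration, not the AICA implementation in OpenFOAM: cells are assumed to be axis-aligned boxes instead of arbitrary polyhedra, the pair potential is assumed to be Lennard-Jones in reduced units, and all function names and sample values are hypothetical.

```python
import itertools
import math


def box_gap(a, b):
    """Shortest distance between two axis-aligned boxes given as (lo, hi) corners."""
    sq = 0.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[0], a[1], b[0], b[1]):
        d = max(lo_b - hi_a, lo_a - hi_b, 0.0)
        sq += d * d
    return math.sqrt(sq)


def interacting_cell_pairs(cells, r_cut):
    """Done once at start-up: pairs of cells whose surfaces are within the cut-off."""
    return [(i, j)
            for i, j in itertools.combinations(range(len(cells)), 2)
            if box_gap(cells[i], cells[j]) <= r_cut]


def lj_force(ri, rj, r_cut, eps=1.0, sigma=1.0):
    """Lennard-Jones force on molecule i due to molecule j (zero beyond r_cut)."""
    d = [a - b for a, b in zip(ri, rj)]
    r2 = sum(x * x for x in d)
    if r2 == 0.0 or r2 > r_cut * r_cut:
        return (0.0, 0.0, 0.0)
    s6 = (sigma * sigma / r2) ** 3
    scale = 24.0 * eps * (2.0 * s6 * s6 - s6) / r2
    return tuple(scale * x for x in d)


def pair_forces(positions, cell_of, cell_pairs, r_cut):
    """Total force on each molecule, visiting only same-cell and interacting-cell pairs."""
    members = {}
    for mol, cell in enumerate(cell_of):
        members.setdefault(cell, []).append(mol)
    forces = [[0.0, 0.0, 0.0] for _ in positions]

    def accumulate(i, j):
        f = lj_force(positions[i], positions[j], r_cut)
        for k in range(3):
            forces[i][k] += f[k]
            forces[j][k] -= f[k]  # Newton's third law

    for mols in members.values():
        for i, j in itertools.combinations(mols, 2):
            accumulate(i, j)
    for a, b in cell_pairs:
        for i in members.get(a, []):
            for j in members.get(b, []):
                accumulate(i, j)
    return forces


# Hypothetical mesh: three unit boxes along x, with a gap before the third one.
cells = [((0, 0, 0), (1, 1, 1)), ((1, 0, 0), (2, 1, 1)), ((3, 0, 0), (4, 1, 1))]
r_cut = 1.2
positions = [(0.1, 0.5, 0.5), (0.95, 0.5, 0.5), (1.9, 0.5, 0.5), (3.05, 0.5, 0.5)]
cell_of = [0, 0, 1, 2]

cell_pairs = interacting_cell_pairs(cells, r_cut)  # computed once, reused every step
forces = pair_forces(positions, cell_of, cell_pairs, r_cut)
```

Because the cell relationships depend only on the mesh and the cut-off radius, `cell_pairs` can be reused at every time step; cells 0 and 2 in this example are never paired because their surfaces are farther apart than `r_cut`.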

Crossref

University of Strathclyde Institutional Repository

Edinburgh Research Explorer