
    Machine Learning for Small Molecule Identification

    Metabolites are small molecules involved in the biological processes of organisms. For example, ethylene serves as a plant hormone that stimulates or regulates the opening of flowers, the ripening of fruit, and the shedding of leaves. Metabolite identification means determining the molecular structure of a metabolite contained in a biological sample, and it is considered a major bottleneck in metabolomics. The backbone analytical technology for metabolite identification is tandem mass spectrometry. It consists of two rounds of mass spectrometry: in the first round, all the metabolites in a sample are measured, and one particular metabolite of interest is selected and fragmented by a dissociation process. In the second round, the fragments and their abundances are measured. The resulting tandem mass spectra contain information on the structure and composition of the molecules. This thesis aims to solve the problem of identifying the molecular structures that produce the tandem mass spectra observed from a biological sample. Traditional methods are mostly based on matching the observed tandem mass spectra to reference spectra in a database. However, these methods can fail if there are no reference spectra for the molecules in the sample, which is not uncommon: according to a recent study, only 220,000 spectra representing 20,000 molecules have been measured and annotated, while the compound database PubChem records more than 60 million molecules. To alleviate this problem, many recent works have focused on so-called in silico fragmentation, in which fragmentation is first simulated computationally for the molecules in a molecular database, and the simulated fragments are then compared to the measured tandem mass spectra.
The main contribution of this thesis is to open a novel direction that bridges the gap between the limited spectral databases and the vast molecular databases with the help of molecular fingerprints. Molecular fingerprints are binary representations that encode the structures or properties of a molecule. Kernel-based machine learning methods are used to predict molecular fingerprints from tandem mass spectra. The predicted fingerprints are then matched against the fingerprints of molecules in a molecular database to derive an identification. Multiple kernel learning is also proposed to combine different views of tandem mass spectra. Finally, a one-step approach based on input output kernel regression is applied to this problem; it becomes the new state of the art, as demonstrated in several benchmarks including the recent Critical Assessment of Small Molecule Identification (CASMI) 2016 challenge.
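
The fingerprint-matching step described in this abstract can be illustrated with a minimal sketch. It assumes the predicted fingerprint is already available as a list of 0/1 bits and scores candidates with Tanimoto similarity, a commonly used fingerprint similarity that is not necessarily the scoring function used in the thesis. The function names, candidate names, and bit vectors are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Convention chosen here (an assumption): two all-zero fingerprints score 0.0.
    return both / either if either else 0.0


def rank_candidates(predicted, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprint predicted from a tandem mass spectrum.
predicted = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical candidate structures from a molecular database.
candidates = {
    "candidate_A": [1, 0, 1, 1, 0, 0, 1, 0],
    "candidate_B": [1, 0, 1, 0, 0, 0, 1, 1],
    "candidate_C": [0, 1, 0, 0, 1, 1, 0, 0],
}

ranking = rank_candidates(predicted, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The top-ranked candidate is taken as the proposed identification. A real system would typically work with predicted bit probabilities rather than hard 0/1 values.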

    Phenotypic Screening Combined with Machine Learning for Efficient Identification of Breast Cancer-Selective Therapeutic Targets

    The lack of functional understanding of most mutations in cancer, combined with the non-druggability of most proteins, challenges genomics-based identification of oncology drug targets. We implemented a machine-learning-based approach (idTRAX), which relates cell-based screening of small-molecule compounds to their kinase inhibition data, to directly identify effective and readily druggable targets. We applied idTRAX to triple-negative breast cancer cell lines and efficiently identified cancer-selective targets. For example, we found that inhibiting AKT selectively kills MFM-223 and CAL148 cells, while inhibiting FGFR2 only kills MFM-223. Since the effects of catalytically inhibiting a protein can diverge from those of reducing its levels, targets identified by idTRAX frequently differ from those identified through gene knockout/knockdown methods. This is critical if the purpose is to identify targets specifically for small-molecule drug development, whereby idTRAX may produce fewer false positives. The rapid nature of the approach suggests that it may be applicable in personalizing therapy.

    Collision Cross Section Prediction with Molecular Fingerprint Using Machine Learning

    High-resolution mass spectrometry is a promising technique in non-target screening (NTS) to monitor contaminants of emerging concern in complex samples. Current chemical identification strategies in NTS experiments typically depend on spectral libraries, chemical databases, and in silico fragmentation tools. However, small molecule identification remains challenging due to the lack of orthogonal sources of information (e.g., unique fragments). Collision cross section (CCS) values measured by ion mobility spectrometry (IMS) offer an additional identification dimension to increase the confidence level. Thanks to advances in analytical instrumentation, increasing application of IMS hybridized with high-resolution mass spectrometry (HRMS) in NTS has been reported in recent decades. Several CCS prediction tools have been developed. However, few CCS prediction methods have been based on a broad range of chemical classes and cross-platform CCS measurements. We successfully developed two prediction models using a random forest machine learning algorithm. One of the approaches was based on chemicals’ super classes; the other model was direct CCS prediction using molecular fingerprints. Over 13,324 CCS values from six different laboratories and PubChem, obtained using a variety of ion-mobility separation techniques, were used for training and testing the models. The test accuracy for all the prediction models was over 0.85, and the median relative residual was around 2.2%. The models can be applied to different IMS platforms to eliminate false positives in small molecule identification.
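
As a rough illustration of the error metric quoted in this abstract, the sketch below computes a median relative residual between measured and predicted CCS values. It assumes the residual is the absolute difference divided by the measured value, expressed in percent; the abstract does not spell out the exact definition, and the function name and CCS values are hypothetical.

```python
from statistics import median


def median_relative_residual(measured, predicted):
    """Median of |predicted - measured| / measured, in percent (assumed definition)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [abs(p - m) / m * 100.0 for m, p in zip(measured, predicted)]
    return median(residuals)


# Hypothetical CCS values in square angstroms, for illustration only.
measured_ccs = [150.0, 200.0, 250.0]
predicted_ccs = [153.0, 196.0, 250.0]

print(f"{median_relative_residual(measured_ccs, predicted_ccs):.1f}%")
```

The median is less sensitive to a few badly predicted compounds than the mean, which may be why it is the statistic reported.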

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.

    The benefits of in silico modeling to identify possible small-molecule drugs and their off-target interactions

    Accepted for publication in a future issue of Future Medicinal Chemistry. The research into the use of small molecules as drugs continues to be a key driver in the development of molecular databases, computer-aided drug design software and collaborative platforms. The evolution of computational approaches is driven by the essential criteria that a drug molecule has to fulfill, from affinity to targets to minimal side effects, while having adequate absorption, distribution, metabolism, and excretion (ADME) properties. A combination of ligand- and structure-based drug development approaches is already used to obtain consensus predictions of small molecule activities and their off-target interactions. Further integration of these methods into easy-to-use workflows informed by systems biology could realize the full potential of available data in drug discovery and reduce the attrition of drug candidates.