35 research outputs found

    The great descriptor melting pot: mixing descriptors for the common good of QSAR models

    LeadOp+R: Structure-Based Lead Optimization With Synthetic Accessibility

    We previously described “LeadOp,” a structure-based fragment-hopping method for lead optimization that uses a pre-docked fragment database. It conceptually replaces “bad” fragments of a ligand with “good” fragments while leaving the core of the ligand intact, thus improving the compound's activity. LeadOp was shown to optimize the query molecules and to systematically develop improved analogs for each of our example systems. However, even when compounds are designed from fragments of common building blocks, synthesizing them remains a challenge. In this work, “LeadOp+R” was developed based on 198 classical chemical reactions to take synthetic accessibility into account while optimizing leads. LeadOp+R first allows the user to identify a preserved space, defined by the volume occupied by the fragment of the query molecule that is to be preserved. LeadOp+R then searches for building blocks that occupy the same preserved space to serve as initial reactants and grows molecules toward the preferred receptor-ligand interactions according to the reaction rules in the LeadOp+R reaction database. Multiple conformers of each intermediate product were considered and evaluated at each step. The conformer with the best group efficiency score was selected as the initial conformer for the next building block, until the program had finished optimizing all selected receptor-ligand interactions. The LeadOp+R method was tested with two biomolecular systems: Tie-2 kinase and human 5-lipoxygenase. The LeadOp+R methodology was able to optimize the query molecules and systematically developed improved analogs for each of our example systems. The suggested synthetic routes for compounds proposed by LeadOp+R were the same as the published synthetic routes devised by the synthetic/organic chemists.

    G.A.M.E.: GPU-Accelerated Mixture Elucidator

    This release archives the Python script and datasets of G.A.M.E. G.A.M.E. is a dynamic programming-based algorithm with GPU acceleration that accepts input mass data with up to 5 decimal digits.
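
    The role of a fixed number of decimal digits in a dynamic programming approach can be illustrated with a minimal sketch. This is not the G.A.M.E. script: the function names, the building-block masses, and the exact-sum counting problem are assumptions chosen for illustration. The idea shown is only that decimal masses can be scaled to integers so that a table indexed by mass becomes possible.

```python
from decimal import Decimal


def to_units(mass: str, decimals: int) -> int:
    """Convert a decimal mass string to an integer number of 10**-decimals units."""
    return int((Decimal(mass) * (10 ** decimals)).to_integral_value())


def count_compositions(block_masses, target, decimals=2):
    """Count the multisets of building blocks whose masses sum exactly to target.

    Hypothetical helper: a classic coin-change style dynamic program over
    integer-scaled masses. All block masses are assumed to be positive.
    """
    units = [to_units(m, decimals) for m in block_masses]
    goal = to_units(target, decimals)
    ways = [0] * (goal + 1)
    ways[0] = 1  # one way to reach mass 0: use no blocks
    for u in units:  # outer loop over blocks so each multiset is counted once
        for total in range(u, goal + 1):
            ways[total] += ways[total - u]
    return ways[goal]


if __name__ == "__main__":
    # Made-up block masses, two decimal digits to keep the table small.
    print(count_compositions(["1.50", "2.25"], "6.00"))
```

    The table length grows by a factor of ten with every extra decimal digit, so five digits give a very large table for realistic masses. That cost is one plausible motivation for GPU acceleration, although the release note does not say so.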

    Ion Trace Detection Algorithm to Extract Pure Ion Chromatograms to Improve Untargeted Peak Detection Quality for Liquid Chromatography/Time-of-Flight Mass Spectrometry-Based Metabolomics Data

    Able to detect known and unknown metabolites, untargeted metabolomics has shown great potential in identifying novel biomarkers. However, elucidating all possible liquid chromatography/time-of-flight mass spectrometry (LC/TOF-MS) ion signals in a complex biological sample remains challenging, since many ions are not the products of metabolites. Methods that reduce the number of ions unrelated to metabolites, or that directly detect metabolite-related (pure) ions, are therefore important. In this work, we describe PITracer, a novel algorithm that accurately detects the pure ions of an LC/TOF-MS profile in order to extract pure ion chromatograms and detect chromatographic peaks. PITracer estimates the relative mass difference tolerance of ions and calibrates the mass-to-charge (m/z) values for peak detection algorithms, with an additional option for further mass correction with respect to a user-specified metabolite. PITracer was evaluated using two data sets: one containing 373 human metabolite standards, including 5 saturated standards considered to produce split peaks as a result of large m/z fluctuations, and one consisting of 12 urine samples spiked with 50 forensic drugs of varying concentrations. Analysis of these data sets shows that PITracer outperformed the existing state-of-the-art algorithm, extracted the pure ion chromatograms of the 5 saturated standards without generating split peaks, and detected the forensic drugs with high recall, precision, and F-score and with small mass error.
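
    A relative mass difference tolerance is commonly expressed in parts per million (ppm). The sketch below is not PITracer: it assumes a fixed, user-supplied tolerance instead of one estimated from the data, and the function names are hypothetical. It only shows how such a tolerance can decide whether neighboring m/z values belong to the same ion trace.

```python
def ppm_diff(mz_a: float, mz_b: float) -> float:
    """Relative difference between two m/z values, in parts per million of mz_b."""
    return abs(mz_a - mz_b) / mz_b * 1e6


def group_by_ppm(mz_values, tol_ppm: float):
    """Group m/z values so that sorted neighbors within tol_ppm share a group.

    Hypothetical helper: a simple greedy pass over the sorted values.
    """
    groups = []
    for mz in sorted(mz_values):
        if groups and ppm_diff(mz, groups[-1][-1]) <= tol_ppm:
            groups[-1].append(mz)  # close enough to the previous value
        else:
            groups.append([mz])    # start a new candidate ion trace
    return groups


if __name__ == "__main__":
    # Made-up m/z readings and an assumed 10 ppm tolerance.
    for trace in group_by_ppm([200.0000, 200.0010, 200.0500, 200.0505], 10.0):
        print(trace)
```

    A relative tolerance scales with m/z, so the same ppm value allows a wider absolute window for heavier ions than for lighter ones.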

    Template-Based de Novo Design for Type II Kinase Inhibitors and Its Extented Application to Acetylcholinesterase Inhibitors

    There is a compelling need to discover type II inhibitors targeting the unique DFG-out inactive kinase conformation, since they are likely to possess greater potency and selectivity relative to traditional type I inhibitors. Using a known inhibitor, such as a currently available, approved drug, as a template to design new drugs via computational de novo design is helpful when working with known ligand-receptor interactions. This study proposes a new template-based de novo design protocol to discover new inhibitors that preserve and also optimize the binding interactions of the type II kinase template. First, sorafenib (Nexavar®) and nilotinib (Tasigna®), two type II inhibitors with different ligand-receptor interactions, were selected as the template compounds. The five-step protocol can reassemble each drug from a large fragment library. Our procedure demonstrates that the selected template compounds can be successfully reassembled while the key ligand-receptor interactions are preserved. Furthermore, to demonstrate that the algorithm is able to construct more potent compounds, we considered kinase inhibitors and a data set for another protein, acetylcholinesterase (AChE) inhibitors. The de novo optimization was initiated using a template compound possessing less than optimal activity from a series of aminoisoquinoline and TAK-285 inhibiting type II kinases, and E2020 derivatives inhibiting AChE, respectively. Three compounds with greater potency than the template compound were discovered that were also included in the original congeneric series. This template-based lead optimization protocol with the fragment library can help to automatically design compounds with the preferred binding interactions of known inhibitors and to further optimize the compounds in the binding pockets.

    Oversampling to Overcome Overfitting: Exploring the Relationship between Data Set Composition, Molecular Descriptors, and Predictive Modeling Methods

    The traditional biological assay is very time-consuming, and thus the ability to quickly screen large numbers of compounds against a specific biological target is appealing. To speed up the biological evaluation of compounds, high-throughput screening is widely used in the fields of biomedical research, biological information, and drug discovery. The research presented in this study focuses on the use of support vector machines (a machine learning method), various classes of molecular descriptors, and different sampling techniques to overcome overfitting when classifying compounds for cytotoxicity with respect to the Jurkat cell line. The cell cytotoxicity data set is imbalanced (a few active compounds and very many inactive compounds), and the performance of predictive modeling methods is adversely affected in these situations. Models built on imbalanced data sets are commonly overfit with respect to the dominant class; in this study the models routinely overfit toward inactive (noncytotoxic) compounds when the imbalance was substantial. Support vector machine (SVM) models were used to probe the proficiency of different classes of molecular descriptors and oversampling ratios. The SVM models were constructed from 4D-FPs, MOE (1D, 2D, and 2½D), noNP+MOE, and CATS2D trial descriptor pools and compared to the predictive abilities of CATS2D-based random forest models. Compared to previous results in the literature, the SVM models built from oversampled data sets exhibited better predictive abilities for the training and external test sets.
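
    The effect of an oversampling ratio on an imbalanced data set can be sketched as follows. The abstract does not state which oversampling technique or ratios were used, so this example assumes plain random duplication of minority-class samples, and the function name and parameters are hypothetical.

```python
import math
import random


def oversample_minority(samples, labels, minority_label, ratio, seed=0):
    """Duplicate random minority samples until minority/majority >= ratio.

    Hypothetical helper: random oversampling with replacement. Returns new
    lists and leaves the inputs unchanged.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples carry the minority label")
    n_majority = len(labels) - len(minority)
    # Number of extra minority copies needed to reach the requested ratio.
    n_needed = max(0, math.ceil(ratio * n_majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_needed)]
    return list(samples) + extra, list(labels) + [minority_label] * n_needed


if __name__ == "__main__":
    # Made-up data: 2 active (1) and 8 inactive (0) compounds.
    compounds = [f"cmpd_{i}" for i in range(10)]
    activity = [1, 1] + [0] * 8
    new_x, new_y = oversample_minority(compounds, activity, 1, ratio=1.0)
    print(len(new_x), new_y.count(1), new_y.count(0))
```

    Oversampling should be applied to the training set only. Duplicating compounds before splitting would leak copies of training samples into the external test set.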