
    Study of ligand-based virtual screening tools in computer-aided drug design

    Virtual screening is a central technique in drug discovery today. Millions of molecules can be tested in silico with the aim of selecting only the most promising for experimental testing. The topic of this thesis is ligand-based virtual screening tools, which take existing active molecules as a starting point for finding new drug candidates. One goal of this thesis was to build a model that gives the probability that two molecules are biologically similar as a function of one or more chemical similarity scores. Another important goal was to evaluate how well different ligand-based virtual screening tools are able to distinguish active molecules from inactives. One more criterion set for the virtual screening tools was their applicability in scaffold-hopping, i.e. finding new active chemotypes. In the first part of the work, a link was defined between the abstract chemical similarity score given by a screening tool and the probability that the two molecules are biologically similar. These results help to decide objectively which virtual screening hits to test experimentally. The work also resulted in a new type of data fusion method for use with two or more tools. In the second part, five ligand-based virtual screening tools were evaluated and their performance was found to be generally poor. Three reasons for this were proposed: false negatives in the benchmark sets, active molecules that do not share the binding mode, and activity cliffs. In the third part of the study, a novel visualization and quantification method is presented for evaluation of the scaffold-hopping ability of virtual screening tools.

    Identification of Novel Antimalarial Chemotypes via Chemoinformatic Compound Selection Methods for a High-Throughput Screening Program against the Novel Malarial Target, PfNDH2: Increasing Hit Rate via Virtual Screening Methods

    Malaria is responsible for approximately 1 million deaths annually; thus, continued efforts to discover new antimalarials are required. A high-throughput screen (HTS) was established to identify novel inhibitors of the parasite's mitochondrial enzyme NADH:quinone oxidoreductase (PfNDH2). On the basis of only one known inhibitor of this enzyme, the challenge was to discover novel inhibitors of PfNDH2 with diverse chemical scaffolds. To this end, using a range of ligand-based chemoinformatics methods, ~17000 compounds were selected from a commercial library of ~750000 compounds. Forty-eight compounds were identified with PfNDH2 enzyme inhibition IC50 values ranging from 100 nM to 40 μM; these compounds also displayed exciting whole-cell antimalarial activity. These novel inhibitors were identified through sampling 16% of the available chemical space, while only screening 2% of the library. This study confirms the added value of using multiple ligand-based chemoinformatic approaches and has successfully identified novel distinct chemotypes primed for development as new agents against malaria.

    Optimal assignment methods for ligand-based virtual screening

    Background: Ligand-based virtual screening experiments are an important task in the early drug discovery stage. An ambitious aim in each experiment is to disclose active structures based on new scaffolds. To perform these "scaffold-hoppings" for individual problems and targets, a plethora of different similarity methods based on diverse techniques have been published in recent years. The optimal assignment approach on molecular graphs, a successful method in the field of quantitative structure-activity relationships, has not been tested as a ligand-based virtual screening method so far.
    Results: We evaluated two already published and two new optimal assignment methods on various data sets. To emphasize the "scaffold-hopping" ability, we used the information of chemotype clustering analyses in our evaluation metrics. Comparisons with literature results show an improved early recognition performance and comparable results over the complete data set. A new method based on two different assignment steps shows an increased "scaffold-hopping" behavior together with a good early recognition performance.
    Conclusion: The presented methods show a good combination of chemotype discovery and enrichment of active structures. Additionally, the optimal assignment on molecular graphs has the advantage that the mappings can be inspected and interpreted, allowing precise modifications of internal parameters of the similarity measure for specific targets. All methods have low computation times that make them applicable to screening large data sets.

    Shaping a screening file for maximal lead discovery efficiency and effectiveness: elimination of molecular redundancy

    High Throughput Screening (HTS) is a successful strategy for finding hits and leads that have the opportunity to be converted into drugs. In this paper we highlight novel computational methods used to select compounds to build a new screening file at Pfizer and the analytical methods we used to assess their quality. We also introduce the novel concept of molecular redundancy to help decide on the density of compounds required in any region of chemical space in order to be confident of running successful HTS campaigns.

    New similarity measures for ligand-based virtual screening

    The process of drug discovery using virtual screening techniques relies on the “molecular similarity principle”, which states that structurally similar molecules tend to have more similar physicochemical and biological properties than dissimilar molecules. Most of the existing virtual screening methods use similarity measures such as the standard Tanimoto coefficient. However, these conventional similarity measures are inadequate, and their results are not satisfactory to researchers. This research investigated new similarity measures. It developed a novel similarity measure and a molecule ranking method to retrieve molecules more efficiently. Firstly, a new similarity measure was derived from existing similarity measures, with a focus on preferred similarity concepts. Secondly, new similarity measures were developed by reweighting some bit-strings, where features present in both compared molecules and features absent from both compared molecules were given strong consideration. The final approach investigated ranking methods to develop a substitutional ranking method. The study compared the similarity measures and ranking methods with benchmark coefficients such as Tanimoto, Cosine, Dice, and Simple Matching (SM). The approaches were tested using standard data sets such as MDL Drug Data Report (MDDR), Directory of Useful Decoys (DUD) and Maximum Unbiased Validation (MUV). The overall results of this research showed that the new similarity measures and ranking methods outperformed the conventional industry-standard Tanimoto-based similarity search approach. The similarity measures are thus likely to support the lead optimization and lead identification processes better than methods based on Tanimoto coefficients.
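
    The four benchmark coefficients named in this abstract have standard textbook definitions on binary fingerprints. The following is a minimal sketch of those definitions only, not of the new measures developed in the study. It assumes each fingerprint is given as a set of on-bit indices; the function name `similarity_coefficients` and the parameter `n_bits` are illustrative choices, not taken from the source.

```python
from math import sqrt


def similarity_coefficients(a: set, b: set, n_bits: int) -> dict:
    """Standard binary similarity coefficients.

    a, b   -- sets of on-bit indices (assumed fingerprint representation)
    n_bits -- total fingerprint length, needed only for Simple Matching
    """
    c = len(a & b)                # bits set in both fingerprints
    na, nb = len(a), len(b)       # bits set in each fingerprint
    union = na + nb - c           # bits set in at least one fingerprint
    d = n_bits - union            # bits set in neither fingerprint
    return {
        "tanimoto": c / union if union else 0.0,
        "dice": 2 * c / (na + nb) if (na + nb) else 0.0,
        "cosine": c / sqrt(na * nb) if na and nb else 0.0,
        "simple_matching": (c + d) / n_bits,
    }


scores = similarity_coefficients({0, 1, 2, 3}, {2, 3, 4, 5}, n_bits=8)
```

    Of the four, only Simple Matching counts bits that are absent from both molecules, which is the kind of shared absence the abstract says the reweighted measures take into account.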

    Large scale study of multiple-molecule queries

    Background: In ligand-based screening, as well as in other chemoinformatics applications, one seeks to effectively search large repositories of molecules in order to retrieve molecules that are similar, typically, to a single lead molecule. However, in some cases, multiple molecules from the same family are available to seed the query and search for other members of the same family. Multiple-molecule query methods have been less studied than single-molecule query methods. Furthermore, the previous studies have relied on proprietary data and sometimes have not used proper cross-validation methods to assess the results. In contrast, here we develop and compare multiple-molecule query methods using several large publicly available data sets and background sets. We also create a framework based on a strict cross-validation protocol to allow unbiased benchmarking for direct comparison in future studies across several performance metrics.
    Results: Fourteen different multiple-molecule query methods were defined and benchmarked using: (1) 41 publicly available data sets of related molecules with similar biological activity; and (2) publicly available background data sets consisting of up to 175,000 molecules randomly extracted from the ChemDB database and other sources. Eight of the fourteen methods were parameter free, and six of them fit one or two free parameters to the data using a careful cross-validation protocol. All the methods were assessed and compared for their ability to retrieve members of the same family against the background data set by using several performance metrics including the Area Under the Accumulation Curve (AUAC), Area Under the Curve (AUC), F1-measure, and BEDROC metrics. Consistent with the previous literature, the best parameter-free methods are the MAX-SIM and MIN-RANK methods, which score a molecule to a family by the maximum similarity, or minimum ranking, obtained across the family. Two new parameterized methods introduced in this study and one previously defined method, the Exponential Tanimoto Discriminant (ETD), the Tanimoto Power Discriminant (TPD), and the Binary Kernel Discriminant (BKD), respectively, outperform most other methods but are more complex, requiring one or two parameters to be fit to the data.
    Conclusion: Fourteen methods for multiple-molecule querying of chemical databases, including the novel methods ETD and TPD, are validated using publicly available data sets, standard cross-validation protocols, and established metrics. The best results are obtained with ETD, TPD, BKD, MAX-SIM, and MIN-RANK. These results can be replicated and compared with the results of future studies using data freely downloadable from http://cdb.ics.uci.edu/.
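
    The two parameter-free methods described in this abstract, MAX-SIM and MIN-RANK, can be sketched as follows. This is an illustrative reading of the abstract's one-sentence description, not the authors' implementation: it assumes fingerprints are sets of on-bit indices, uses the Tanimoto coefficient as the underlying similarity, and breaks ties by database order. The function names are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient on sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_sim(family: list, database: list) -> list:
    """Score each database molecule by its maximum similarity to any
    member of the query family (higher is better)."""
    return [max(tanimoto(q, m) for q in family) for m in database]


def min_rank(family: list, database: list) -> list:
    """Score each database molecule by the best (lowest) rank it reaches
    in any single-query ranking (lower is better)."""
    best = [len(database)] * len(database)
    for q in family:
        sims = [tanimoto(q, m) for m in database]
        # Rank 1 is the most similar molecule; ties keep database order.
        order = sorted(range(len(database)), key=lambda i: -sims[i])
        for rank, i in enumerate(order, start=1):
            best[i] = min(best[i], rank)
    return best


family = [{1, 2, 3}, {7, 8, 9}]
database = [{1, 2, 3, 4}, {7, 8}, {20, 21}]
print(max_sim(family, database))
print(min_rank(family, database))
```

    In this toy case the first two database molecules each top the ranking for one family member, so MIN-RANK treats them as equally good even though their MAX-SIM scores differ.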

    CACHE (Critical Assessment of Computational Hit-finding Experiments): A public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding

    One aspirational goal of computational chemistry is to predict potent and drug-like binders for any protein, such that only those that bind are synthesized. In this Roadmap, we describe the launch of Critical Assessment of Computational Hit-finding Experiments (CACHE), a public benchmarking project to compare and improve small-molecule hit-finding algorithms through cycles of prediction and experimental testing. Participants will predict small-molecule binders for new and biologically relevant protein targets representing different prediction scenarios. Predicted compounds will be tested rigorously in an experimental hub, and all predicted binders as well as all experimental screening data, including the chemical structures of experimentally tested compounds, will be made publicly available and not subject to any intellectual property restrictions. The ability of a range of computational approaches to find novel binders will be evaluated, compared and openly published. CACHE will launch three new benchmarking exercises every year. The outcomes will be better prediction methods, new small-molecule binders for target proteins of importance for fundamental biology or drug discovery and a major technological step towards achieving the goal of Target 2035, a global initiative to identify pharmacological probes for all human proteins.

    PubChem3D: Similar conformers

    Background: PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, PubChem pre-computes "neighboring" relationships to relate similar chemical structures, which may have similar biological function. In this work, we introduce a "Similar Conformers" neighboring relationship to identify compounds with similar 3-D shape and similar 3-D orientation of functional groups typically used to define pharmacophore features.
    Results: The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other, using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship to the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine, one finds unique sets of related chemical structures, providing additional significant biological annotation. The PubChem 3-D neighboring relationship is also shown to be able to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity. In a study of 4,218 chemical structures of biomedical interest, consisting of many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the compound neighbor lists per conformer also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers. Perhaps surprisingly, the average count of conformer neighbors per conformer increases rather slowly as a function of diverse conformers considered, with only a 70% increase for a tenfold growth in conformers per compound (a 68-fold increase in the conformer pairs considered). Neighboring 3-D conformers on the scale performed, if implemented naively, is an intractable problem using a modest-sized compute cluster. Methodology developed in this work relies on a series of filters to prevent performing 3-D superposition optimization when it can be determined that two conformers cannot possibly be neighbors. Most filters are based on Tanimoto equation volume constraints, avoiding incompatible conformers; however, others consider preliminary superposition between conformers using reference shapes.
    Conclusion: The "Similar Conformers" 3-D neighboring relationship locates similar small molecules of biological interest that may go unnoticed when using traditional 2-D chemical structure graph-based methods, making it complementary to such methodologies. The computational cost of 3-D similarity methodology on a wide scale, such as PubChem contents, is a considerable issue to overcome. Using a series of efficient filters, an effective throughput rate of more than 150,000 conformers per second per processor core was achieved, more than two orders of magnitude faster than without filtering.
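
    The abstract does not spell out its "Tanimoto equation volume constraints", but one simple constraint of that kind follows directly from the shape Tanimoto definition ST = V_AB / (V_A + V_B - V_AB). The overlap volume V_AB can never exceed the smaller of the two conformer volumes, so ST is bounded above by min(V_A, V_B) / max(V_A, V_B), whatever the superposition. The sketch below illustrates that bound under this assumption; the function names are hypothetical, and the filters actually used in the work may differ.

```python
def shape_tanimoto_upper_bound(vol_a: float, vol_b: float) -> float:
    """Best shape Tanimoto any superposition could achieve.

    ST = V_AB / (V_A + V_B - V_AB) grows with the overlap V_AB, and
    V_AB <= min(V_A, V_B), so ST <= min(V_A, V_B) / max(V_A, V_B).
    """
    if vol_a <= 0 or vol_b <= 0:
        raise ValueError("conformer volumes must be positive")
    return min(vol_a, vol_b) / max(vol_a, vol_b)


def may_be_neighbors(vol_a: float, vol_b: float,
                     st_threshold: float = 0.8) -> bool:
    """Cheap pre-filter: False means the pair cannot reach the ST
    threshold, so the costly superposition step can be skipped."""
    return shape_tanimoto_upper_bound(vol_a, vol_b) >= st_threshold


# Volumes in arbitrary but consistent units (illustrative values).
print(may_be_neighbors(100.0, 70.0))
print(may_be_neighbors(100.0, 90.0))
```

    The default threshold of 0.8 matches the ST cutoff quoted in the abstract. A pair that passes this check still needs the full superposition to confirm its actual ST and CT values.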