90 research outputs found
Evaluation of a Bayesian inference network for ligand-based virtual screening
Background
Bayesian inference networks enable the computation of the probability that an event will occur. They have been used previously to rank textual documents in order of decreasing relevance to a user-defined query. Here, we modify the approach to enable a Bayesian inference network to be used for chemical similarity searching, where a database is ranked in order of decreasing probability of bioactivity.
Results
Bayesian inference networks were implemented using two different types of network and four different types of belief function. Experiments with the MDDR and WOMBAT databases show that a Bayesian inference network can be used to provide effective ligand-based screening, especially when the active molecules being sought have a high degree of structural homogeneity; in such cases, the network substantially out-performs a conventional, Tanimoto-based similarity searching system. However, the effectiveness of the network is much less when structurally heterogeneous sets of actives are being sought.
Conclusion
A Bayesian inference network provides an interesting alternative to existing tools for ligand-based virtual screening.
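For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of conventional Tanimoto-based similarity searching. It is not the authors' implementation: fingerprints are assumed to be stored as sets of on-bit indices, and the molecule names and bit values are hypothetical toy data.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: set[int], database: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints; real ones would come from a cheminformatics toolkit.
query = {1, 4, 7, 9}
database = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3, 5},
}
ranking = rank_by_similarity(query, database)
```

The database is ranked in order of decreasing score, which mirrors the ranking by decreasing probability of bioactivity described in the abstract, with the Tanimoto coefficient standing in for the network's belief value.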
Structural diversity of biologically interesting datasets: a scaffold analysis approach
ABSTRACT: The recent public availability of the human metabolome and natural product datasets has revitalized "metabolite-likeness" and "natural product-likeness" as drug design concepts for designing lead libraries that target specific pathways. Many reports have analyzed the physicochemical property space of biologically important datasets, but only a few have comprehensively characterized the scaffold diversity in public datasets of biological interest. With large collections of high-quality public data currently available, we carried out a comparative analysis of current-day leads with other biologically relevant datasets. In this study, we note a two-fold enrichment of metabolite scaffolds in the drug dataset (42%) compared to currently used lead libraries (23%). We also note that only a small percentage (5%) of natural product scaffold space is shared by the lead dataset. We have identified specific scaffolds that are present in metabolites and natural products, with close counterparts in the drugs, but are missing in the lead dataset. To determine the distribution of compounds in physicochemical property space, we analyzed the molecular polar surface area, the molecular solubility, the number of rings and the number of rotatable bonds in addition to the four well-known Lipinski properties. Here, we note that, with only a few exceptions, most of the drugs follow Lipinski's rule. The average values of the molecular polar surface area and the molecular solubility are highest in metabolites, while the number of rings is lowest. In addition, we note that natural products contain more rings and rotatable bonds than any other dataset under consideration. Currently used lead libraries make little use of the metabolite and natural product scaffold space.
We believe that metabolites and natural products are recognized by at least one protein in the biosphere; therefore, sampling the fragment and scaffold space of these compounds, along with knowledge of their distribution in physicochemical property space, can result in better lead libraries. Hence, we recommend greater use of metabolites and natural products when designing lead libraries. Nevertheless, metabolites have a limited distribution in chemical space, which limits their usage in library design.
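The Lipinski check referred to in this abstract can be sketched as follows. This assumes the common formulation of the rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with one violation tolerated); the abstract does not state which variant the authors used, and the property values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """The four Lipinski properties of a compound (values supplied by the caller)."""
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(p: Properties) -> list[str]:
    """List the rule-of-five criteria that the compound violates."""
    violations = []
    if p.molecular_weight > 500:
        violations.append("molecular weight > 500")
    if p.log_p > 5:
        violations.append("logP > 5")
    if p.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if p.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def follows_lipinski(p: Properties, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return len(lipinski_violations(p)) <= max_violations


# Hypothetical compounds, for illustration only.
compound_x = Properties(molecular_weight=350.0, log_p=2.5, h_bond_donors=2, h_bond_acceptors=5)
compound_y = Properties(molecular_weight=720.0, log_p=6.1, h_bond_donors=3, h_bond_acceptors=8)
```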
Semantic Similarity for Automatic Classification of Chemical Compounds
With the increasing amount of data made available in the chemical field, there is a strong need for systems capable of comparing and classifying chemical compounds in an efficient and effective way. The best approaches existing today are based on the structure-activity relationship premise, which states that the biological activity of a molecule is strongly related to its structural or physicochemical properties. This work presents a novel approach to the automatic classification of chemical compounds by integrating semantic similarity with existing structural comparison methods. Our approach was assessed using the Matthews Correlation Coefficient of its predictions, and achieved values of 0.810 for blood-brain barrier permeability, 0.694 for P-glycoprotein substrate, and 0.673 for estrogen receptor binding activity. These results represent a significant improvement over currently existing methods, whose best performances were 0.628, 0.591, and 0.647 respectively. It was demonstrated that the integration of semantic similarity is a feasible and effective way to improve existing chemical compound classification systems. Among other possible uses, this tool helps the study of the evolution of metabolic pathways, the study of the correlation of metabolic networks with properties of those networks, or the improvement of ontologies that represent chemical information.
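As a reminder of how the Matthews Correlation Coefficient reported above is computed for a binary classifier, here is a minimal sketch from confusion-matrix counts. The counts in the example are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for illustration.
mcc_example = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
mcc_perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over accuracy on imbalanced datasets.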
Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds
Background
Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years.
Results
From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The distribution showed a compounds-per-target average of 964 with a maximum of 42,869 (Factor Xa). The list includes non-targets, failed targets and cross-screening targets. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis.
Conclusions
The compounds-per-protein listing generated in this work (provided as a supplementary file) represents the major proportion of the human drug target landscape defined by published data. We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology. These showed significant differences and provide complementary assessments of chemical tractability.
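A statistic such as "the top-278 targets cover 90% of the compounds" can be derived along the following lines. This is only a sketch under stated assumptions: targets are ranked by their number of assayed compounds, coverage is counted over unique compound identifiers, and the target names and compound sets are hypothetical. The abstract does not specify the exact procedure.

```python
def targets_needed(target_to_compounds: dict[str, set[int]], fraction: float) -> int:
    """Number of top-ranked targets needed to cover `fraction` of all unique compounds."""
    all_compounds = set().union(*target_to_compounds.values())
    # Rank targets by how many compounds were assayed against them.
    ranked = sorted(target_to_compounds,
                    key=lambda t: len(target_to_compounds[t]),
                    reverse=True)
    covered: set[int] = set()
    for count, target in enumerate(ranked, start=1):
        covered |= target_to_compounds[target]
        if len(covered) >= fraction * len(all_compounds):
            return count
    return len(ranked)


# Hypothetical toy data: target name -> set of compound identifiers.
toy_targets = {
    "T1": {1, 2, 3, 4, 5, 6},
    "T2": {5, 6, 7, 8},
    "T3": {9},
    "T4": {10},
}
needed_for_three_quarters = targets_needed(toy_targets, 0.75)
needed_for_all = targets_needed(toy_targets, 1.0)
```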
Spatial chemical distance based on atomic property fields
Similarity of compound chemical structures often leads to close pharmacological profiles, including binding to the same protein targets. The opposite, however, is not always true, as distinct chemical scaffolds can exhibit similar pharmacology as well. Therefore, relying on chemical similarity to known binders in the search for novel chemicals targeting the same protein artificially narrows down the results and makes lead hopping impossible. In this study we attempt to design a compound similarity/distance measure that better captures structural aspects of their pharmacology and molecular interactions. The measure is based on our recently published method for compound spatial alignment with atomic property fields as a generalized 3D pharmacophoric potential. We optimized the contributions of different atomic properties, using Partial Least Squares regression, to better discriminate compound pairs with the same pharmacology from those with different pharmacology. Our proposed similarity measure was then tested for its ability to discriminate pharmacologically similar pairs from decoys on a large diverse dataset of 115 protein–ligand complexes. Compared to the 2D Tanimoto and Shape Tanimoto approaches, our new approach improved the area under the receiver operating characteristic curve (AUC) in 66% and 58% of domains respectively. The improvement was particularly high for the previously problematic cases (weak performance of the 2D Tanimoto and Shape Tanimoto measures) with original AUC values below 0.8. For these cases we obtained improvement in 86% of domains compared to the 2D Tanimoto measure and 85% compared to the Shape Tanimoto measure. The proposed spatial chemical distance measure can be used in virtual ligand screening.
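The area under the ROC curve used to compare the measures above can be computed directly from similarity scores. The sketch below uses the pairwise (rank-based) definition: the probability that a randomly chosen pharmacologically similar pair scores higher than a randomly chosen decoy pair, with ties counted as one half. The scores are hypothetical and the paper's own evaluation code is not shown in the abstract.

```python
def roc_auc(positive_scores: list[float], negative_scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical similarity scores for similar pairs and decoy pairs.
similar_pairs = [0.9, 0.8, 0.4]
decoy_pairs = [0.7, 0.3, 0.2, 0.4]
auc = roc_auc(similar_pairs, decoy_pairs)
```

The quadratic loop is fine for small illustrations; for large benchmarks a sort-based computation gives the same value more efficiently.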
In silico approach to screen compounds active against parasitic nematodes of major socio-economic importance
Infections due to parasitic nematodes are common causes of morbidity and fatality around the world, especially in developing nations. At present, however, there are only three major classes of drugs for treating human nematode infections. Additionally, the mechanism of action of these drugs and the reasons for resistance to them are poorly understood. Commercial incentives to design drugs for diseases that are endemic to developing countries are limited; therefore, virtual screening in academic settings can play a vital role in discovering novel drugs useful against neglected diseases. In this study we propose to build a robust machine learning model to classify and screen compounds active against parasitic nematodes. A set of compounds active against parasitic nematodes was collated from various literature sources including PubChem, while the inactive set was derived from the DrugBank database. The support vector machine (SVM) algorithm was used for model development, and stratified ten-fold cross validation was used to evaluate the performance of each classifier. The best results were obtained using the radial basis function kernel. The SVM method achieved an accuracy of 81.79% on an independent test set. Using the model developed above, we were able to identify novel compounds with potential anthelmintic activity. In this study, we successfully present the SVM approach for predicting compounds active against parasitic nematodes, which suggests the effectiveness of computational approaches for antiparasitic drug discovery. Although the accuracy obtained is lower than that previously reported in a similar study, we believe that our model is more robust because we intentionally employed stringent criteria to select the inactive dataset, thus making it difficult for the model to classify compounds. The method presents an alternative approach to the existing traditional methods and may be useful for predicting hitherto novel anthelmintic compounds.
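The stratified ten-fold cross validation mentioned above splits the data so that every fold keeps roughly the same ratio of actives to inactives. A minimal sketch of the fold assignment follows; it is deterministic and unshuffled for clarity (a real evaluation would normally shuffle first), the labels are hypothetical, and the SVM training step itself is left out.

```python
from collections import defaultdict


def stratified_folds(labels: list[int], k: int) -> list[list[int]]:
    """Assign sample indices to k folds, preserving class proportions per fold."""
    by_class: dict[int, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds: list[list[int]] = [[] for _ in range(k)]
    # Deal each class's samples round-robin across the folds.
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical dataset: 20 actives (label 1) and 30 inactives (label 0).
labels = [1] * 20 + [0] * 30
folds = stratified_folds(labels, k=10)
```

In each round, one fold would serve as the validation set and the remaining nine as training data for the classifier.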
The Impact of Oxygen on Metabolic Evolution: A Chemoinformatic Investigation
The appearance of planetary oxygen likely transformed the chemical and biochemical makeup of life and probably triggered episodes of organismal diversification. Here we use chemoinformatic methods to explore the impact of the rise of oxygen on metabolic evolution. We undertake a comprehensive comparative analysis of structures, chemical properties and chemical reactions of anaerobic and aerobic metabolites. The results indicate that aerobic metabolism has expanded the structural and chemical space of metabolites considerably, including the appearance of 130 novel molecular scaffolds. The molecular functions of these metabolites are mainly associated with derived aspects of cellular life, such as signal transfer, defense against biotic factors, and protection of organisms from oxidation. Moreover, aerobic metabolites are more hydrophobic and rigid than anaerobic compounds, suggesting they are better suited to modulating membrane functions and serving as transmembrane signaling factors. Since higher organisms depend largely on sophisticated membrane-enabled functions and intercellular signaling systems, the metabolic developments brought about by oxygen benefit the diversity of cellular makeup and the complexity of cellular organization as well. These findings enhance our understanding of the molecular link between oxygen and evolution. They also show the significance of chemoinformatics in addressing basic biological questions.
Drug discovery prospect from untapped species: Indications from approved natural product drugs
PLoS ONE 7(7). doi:10.1371/journal.pone.0039782
IDSS: deformation invariant signatures for molecular shape comparison
Background
Many molecules of interest are flexible and undergo significant shape deformation as part of their function, but most existing methods of molecular shape comparison (MSC) treat them as rigid bodies, which may lead to incorrect measures of the shape similarity of flexible molecules.
Results
To address this issue we introduce a new shape descriptor, called Inner Distance Shape Signature (IDSS), for describing the 3D shapes of flexible molecules. The inner distance is defined as the length of the shortest path between landmark points within the molecular shape, and it reflects the molecular structure and deformation well without explicit decomposition. Our IDSS is stored as a histogram, which is a probability distribution of inner distances between all sample point pairs on the molecular surface. We show that IDSS is insensitive to shape deformation of flexible molecules and more effective at capturing molecular structures than traditional shape descriptors. Our approach reduces the 3D shape comparison problem of flexible molecules to the comparison of IDSS histograms.
Conclusion
The proposed algorithm is robust and does not require any prior knowledge of the flexible regions. We demonstrate the effectiveness of IDSS within a molecular search engine application for a benchmark containing abundant conformational changes of molecules. Several thousand such comparisons can be carried out per second. The presented IDSS method can be considered an alternative and complementary tool to the existing methods for rigid MSC. The binary executable program for the Windows platform and the database are available from https://engineering.purdue.edu/PRECISE/IDSS.
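The final step described above, reducing shape comparison to histogram comparison, can be illustrated with a short sketch. It assumes the inner distances between sample point pairs have already been computed (the shortest-path computation is omitted), and it uses an L1 difference between histograms, which is an assumption made here for illustration: the abstract does not say which histogram metric IDSS uses. All numeric values are hypothetical.

```python
def distance_histogram(distances: list[float], num_bins: int, max_distance: float) -> list[float]:
    """Normalized histogram (a probability distribution) of pairwise distances."""
    if not distances:
        raise ValueError("at least one distance is required")
    counts = [0] * num_bins
    for d in distances:
        # Clamp so that d == max_distance falls into the last bin.
        bin_index = min(int(d / max_distance * num_bins), num_bins - 1)
        counts[bin_index] += 1
    total = len(distances)
    return [c / total for c in counts]


def histogram_l1(h1: list[float], h2: list[float]) -> float:
    """L1 difference between two histograms of equal length (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical precomputed inner distances for two molecular shapes.
shape_a = distance_histogram([0.5, 1.5, 1.5, 3.5], num_bins=4, max_distance=4.0)
shape_b = distance_histogram([0.5, 1.5, 2.5, 3.5], num_bins=4, max_distance=4.0)
dissimilarity = histogram_l1(shape_a, shape_b)
```

Because each signature is a fixed-length vector, comparing two molecules costs only a pass over the bins, which is consistent with the high comparison throughput reported in the abstract.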
An Expanded Set of Amino Acid Analogs for the Ribosomal Translation of Unnatural Peptides
BACKGROUND: The application of in vitro translation to the synthesis of unnatural peptides may allow the production of extremely large libraries of highly modified peptides, which are a potential source of lead compounds in the search for new pharmaceutical agents. The specificity of the translation apparatus, however, limits the diversity of unnatural amino acids that can be incorporated into peptides by ribosomal translation. We have previously shown that over 90 unnatural amino acids can be enzymatically loaded onto tRNA. METHODOLOGY/PRINCIPAL FINDINGS: We have now used a competition assay to assess the efficiency of tRNA-aminoacylation of these analogs. We have also used a series of peptide translation assays to measure the efficiency with which these analogs are incorporated into peptides. The translation apparatus tolerates most side chain derivatives and a few alpha,alpha-disubstituted, N-methyl and alpha-hydroxy derivatives, but no beta-amino acids. We show that over 50 unnatural amino acids can be incorporated into peptides by ribosomal translation. Using a set of analogs that are efficiently charged and translated, we were able to prepare individual peptides containing up to 13 different unnatural amino acids. CONCLUSIONS/SIGNIFICANCE: Our results demonstrate that a diverse array of unnatural building blocks can be translationally incorporated into peptides. These building blocks provide new opportunities for in vitro selections with highly modified drug-like peptides.