40 research outputs found

    Structural similarity assessment for drug sensitivity prediction in cancer

    Background: The ability to predict drug sensitivity in cancer is one of the exciting promises of pharmacogenomic research. Several groups have demonstrated the ability to predict drug sensitivity by integrating chemo-sensitivity data and associated gene expression measurements from large anti-cancer drug screens such as NCI-60. The general approach is based on comparing gene expression measurements from sensitive and resistant cancer cell lines and deriving drug sensitivity profiles, which consist of lists of genes whose expression is predictive of response to a drug. Importantly, it has been shown that such profiles are generic and can be applied to cancer cell lines that are not part of the anti-cancer screen. However, one limitation is that the profiles cannot be generated for untested drugs (i.e., drugs that are not part of an anti-cancer drug screen). In this work, we propose using an existing drug sensitivity profile for drug A as a substitute for an untested drug B, given high structural similarity between drugs A and B.

    Results: We first show that structural similarity between pairs of compounds in the NCI-60 dataset correlates highly with the similarity between their activities across the cancer cell lines. This result shows that structurally similar drugs can be expected to have a similar effect on cancer cell lines. We next set out to test our hypothesis that existing drug sensitivity profiles can be used as substitute profiles for untested drugs. In a cross-validation experiment, we found that the use of substitute profiles is possible without a significant loss of prediction accuracy if the substitute profile was generated from a compound with high structural similarity to the untested compound.

    Conclusion: Anti-cancer drug screens are a valuable resource for generating omics-based drug sensitivity profiles. We show that it is possible to extend the usefulness of existing screens to untested drugs by deriving substitute sensitivity profiles from structurally similar drugs that are part of the screen.
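The substitution idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or cutoff the authors used, so the Tanimoto coefficient on fingerprint bit sets, the `threshold` value, and the names `tanimoto` and `pick_substitute` are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_substitute(
    untested: FrozenSet[int],
    screened: Dict[str, FrozenSet[int]],
    threshold: float = 0.7,  # hypothetical cutoff, not taken from the paper
) -> Optional[Tuple[str, float]]:
    """Return (name, similarity) of the most similar screened drug,
    or None if no screened drug reaches the threshold."""
    best_name, best_sim = None, -1.0
    for name, bits in screened.items():
        sim = tanimoto(untested, bits)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is None or best_sim < threshold:
        return None
    return best_name, best_sim


# Toy fingerprints (sets of on-bit indices), purely illustrative.
screened = {
    "drug_A": frozenset({1, 2, 3, 4}),
    "drug_C": frozenset({7, 8, 9}),
}
untested_B = frozenset({1, 2, 3, 5})
print(pick_substitute(untested_B, screened, threshold=0.5))
```

In this toy setup the sensitivity profile of the returned drug would stand in for the untested one; when nothing is similar enough, the function returns `None` rather than forcing a poor substitute.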

    WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications

    Background: In recent years, there has been a huge increase in the amount of publicly available and proprietary information pertinent to drug discovery. However, there is a distinct lack of data mining tools available to harness this information, and in particular for knowledge discovery across multiple information sources. At Indiana University we have an ongoing project with Eli Lilly to develop web-service based tools for integrative mining of chemical and biological information. In this paper, we report on the first of these tools, called WENDI (Web Engine for Non-obvious Drug Information), which attempts to find non-obvious relationships between a query compound and scholarly publications, biological properties, genes and diseases using multiple information sources.

    Results: We have created an aggregate web service that takes a query compound as input, calls multiple web services for computation and database search, and returns an XML file that aggregates this information. We have also developed a client application that provides an easy-to-use interface to this web service. Both the service and client are publicly available.

    Conclusions: Initial testing indicates this tool is useful in identifying potential biological applications of compounds that are not obvious, and in identifying corroborating and conflicting information from multiple sources. We encourage feedback on the tool to help us refine it further. We are now developing further tools based on this model.

    Interpreting linear support vector machine models with heat map molecule coloring

    Background: Model-based virtual screening plays an important role in the early drug discovery stage. The outcomes of high-throughput screenings are a valuable source for machine learning algorithms to infer such models. Besides strong performance, the interpretability of a machine learning model is a desired property to guide the optimization of a compound in later drug discovery stages. Linear support vector machines have been shown to perform convincingly on large-scale data sets. The goal of this study is to present a heat map molecule coloring technique to interpret linear support vector machine models. Based on the weights of a linear model, the visualization approach colors each atom and bond of a compound according to its importance for activity.

    Results: We evaluated our approach on a toxicity data set, a chromosome aberration data set, and the maximum unbiased validation data sets. The experiments show that our method sensibly visualizes structure-property and structure-activity relationships of a linear support vector machine model. The coloring of ligands in the binding pocket of several crystal structures of a maximum unbiased validation data set target indicates that our approach helps to determine the correct ligand orientation in the binding pocket. Additionally, the heat map coloring enables the identification of substructures important for the binding of an inhibitor.

    Conclusions: In combination with heat map coloring, linear support vector machine models can help to guide the modification of a compound in later stages of drug discovery. In particular, substructures identified as important by our method might be a starting point for optimization of a lead compound. The heat map coloring should be considered complementary to structure-based modeling approaches. As such, it helps to gain a better understanding of the binding mode of an inhibitor.
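One plausible reading of "coloring each atom according to its importance" is to give every atom the sum of the linear-model weights of the substructure features it takes part in. The sketch below follows that reading; the feature encoding, the function names `atom_scores` and `normalize`, and the scaling to [-1, 1] are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def atom_scores(
    weights: Dict[str, float],
    feature_atoms: Dict[str, List[Tuple[int, ...]]],
    n_atoms: int,
) -> List[float]:
    """For each atom, sum the model weights of all feature occurrences
    that contain the atom. Features without a weight contribute 0."""
    scores = [0.0] * n_atoms
    for feature, occurrences in feature_atoms.items():
        w = weights.get(feature, 0.0)
        for atoms in occurrences:
            for atom in atoms:
                scores[atom] += w
    return scores


def normalize(scores: Sequence[float]) -> List[float]:
    """Scale scores to [-1, 1] so they can be fed to any diverging color map."""
    max_abs = max((abs(s) for s in scores), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in scores]
    return [s / max_abs for s in scores]


# Toy molecule with 3 atoms and two hypothetical substructure features.
weights = {"f1": 1.0, "f2": -0.5}                  # weights of a linear model
feature_atoms = {"f1": [(0, 1)], "f2": [(1, 2)]}   # atoms covered by each feature
print(normalize(atom_scores(weights, feature_atoms, n_atoms=3)))
```

Positive values would mark atoms that push the prediction toward the active class and negative values the opposite; which colors represent them is a presentation choice.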

    A constructive approach for discovering new drug leads: Using a kernel methodology for the inverse-QSAR problem

    Background: The inverse-QSAR problem seeks to find a new molecular descriptor from which one can recover the structure of a molecule that possesses a desired activity or property. Surprisingly, there are very few papers providing solutions to this problem. It is a difficult problem because the molecular descriptors involved with the inverse-QSAR algorithm must adequately address the forward QSAR problem for a given biological activity if the subsequent recovery phase is to be meaningful. In addition, one should be able to construct a feasible molecule from such a descriptor. The difficulty of recovering the molecule from its descriptor is the major limitation of most inverse-QSAR methods.

    Results: In this paper, we describe the reversibility of our previously reported descriptor, the vector space model molecular descriptor (VSMMD), which is based on a vector space model and is suitable for kernel studies in QSAR modeling. Our inverse-QSAR approach can be described using five steps: (1) generate the VSMMD for the compounds in the training set; (2) map the VSMMD in the input space to the kernel feature space using an appropriate kernel function; (3) design or generate a new point in the kernel feature space using a kernel feature space algorithm; (4) map the feature space point back to the input space of descriptors using a pre-image approximation algorithm; (5) build the molecular structure template using our VSMMD molecule recovery algorithm.

    Conclusion: The empirical results reported in this paper show that our strategy of using kernel methodology for an inverse quantitative structure-activity relationship (inverse-QSAR) is sufficiently powerful to find a meaningful solution for practical problems.
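Steps (2) to (4) can be made concrete with a small sketch. It assumes a Gaussian (RBF) kernel, takes the "new point" in feature space to be a weighted combination of training points, and approximates the pre-image by a brute-force search over candidate descriptors. None of these choices is stated in the abstract; the authors' kernel, feature-space algorithm, and pre-image method may differ, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def rbf(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def feature_space_sq_dist(
    z: Vector,
    train: Sequence[Vector],
    coeffs: Sequence[float],
    kernel: Callable[[Vector, Vector], float] = rbf,
) -> float:
    """Squared distance ||phi(z) - sum_i c_i phi(x_i)||^2, expanded with the
    kernel trick: k(z,z) - 2 sum_i c_i k(z,x_i) + sum_ij c_i c_j k(x_i,x_j)."""
    k_zz = kernel(z, z)
    cross = sum(c * kernel(z, x) for c, x in zip(coeffs, train))
    gram = sum(
        ci * cj * kernel(xi, xj)
        for ci, xi in zip(coeffs, train)
        for cj, xj in zip(coeffs, train)
    )
    return k_zz - 2.0 * cross + gram


def approximate_preimage(
    candidates: Sequence[Vector],
    train: Sequence[Vector],
    coeffs: Sequence[float],
    kernel: Callable[[Vector, Vector], float] = rbf,
) -> Vector:
    """Pick the candidate descriptor whose feature-space image lies closest
    to the designed point (a crude stand-in for a pre-image algorithm)."""
    return min(candidates, key=lambda z: feature_space_sq_dist(z, train, coeffs, kernel))


# Toy descriptors: the designed point is the feature-space midpoint of two
# training descriptors; the search runs over three candidate descriptors.
train = [(0.0, 0.0), (2.0, 0.0)]
candidates = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(approximate_preimage(candidates, train, coeffs=[0.5, 0.5]))
```

The recovered descriptor would then be handed to a structure-building step such as (5), which this sketch does not attempt.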

    Beyond new chemical entities


    On the Analysis of Compressed Chemical Fingerprints

    Chemical fingerprints are binary strings used to represent the distinctive features of molecules in order to efficiently support similarity search of chemical data. In large repositories, chemical fingerprints are conveniently stored in compressed format, although the lossy compression process may introduce a systematic error in similarity measures. Simple correction formulae have been proposed by Swamidass and Baldi in [13] to compensate for this error and thus improve similarity-based retrieval. The correction is based on deriving estimates for the weight (i.e., the number of bits set to 1) of fingerprints before compression from their compressed values. Although the proposed correction has been substantiated by satisfactory experimental results, the way in which such estimates have been derived and the approximations applied in [13] are not fully convincing and thus deserve further investigation. In this direction, the contribution of this work is to provide deeper insight into the fingerprint generation and compression process, which could constitute a more solid theoretical underpinning for the Swamidass and Baldi correction formulae.

    Fabio Grandi
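To make the weight-estimation idea concrete, here is a minimal sketch under two assumptions that the abstract does not spell out: compression is done by modulo-OR folding to length m, and the set bits of the original fingerprint land uniformly and independently in the m positions. Under those assumptions, a fingerprint of weight A has expected compressed weight m(1 - (1 - 1/m)^A), and inverting this gives an estimate of A from the observed compressed weight. The exact formulae and approximations in [13] may differ from this textbook form.

```python
import math
from typing import Iterable, Set


def fold(on_bits: Iterable[int], m: int) -> Set[int]:
    """Lossy compression by modulo-OR folding: original bit i sets bit i % m."""
    return {i % m for i in on_bits}


def expected_compressed_weight(weight: float, m: int) -> float:
    """Expected number of set bits after folding a fingerprint with `weight`
    set bits, assuming bits land uniformly and independently in m positions."""
    return m * (1.0 - (1.0 - 1.0 / m) ** weight)


def estimate_weight(compressed_weight: float, m: int) -> float:
    """Invert the expectation above to estimate the uncompressed weight."""
    if not 0 <= compressed_weight < m:
        raise ValueError("compressed weight must satisfy 0 <= a < m")
    return math.log(1.0 - compressed_weight / m) / math.log(1.0 - 1.0 / m)


# Collisions during folding make the compressed weight an underestimate.
original = {3, 19, 35, 40}            # weight 4
compressed = fold(original, 16)       # bits 3, 19 and 35 collide on position 3
print(len(original), len(compressed), estimate_weight(len(compressed), 16))
```

The estimate is undefined when every compressed bit is set (a = m), which is why the function rejects that case instead of returning infinity.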