
    Support vector machines for drug discovery

    Support vector machines (SVMs) have displayed good predictive accuracy on a wide range of classification tasks and are inherently adaptable to complex problem domains. Structure-property correlation (SPC) analysis is a vital part of the contemporary drug discovery process, in which several components of the search for novel molecular compounds with therapeutic potential may be performed by computer (in silico). Inferred relationships between molecular structure and biological properties of interest are used to eliminate compounds unsuitable for further development. In order to improve process efficiency without rejecting useful compounds, predictive accuracy of such relationships must remain high despite a paucity of data from which to infer them. This thesis describes the application of SVMs to SPC analysis and investigates methods with which to enhance performance and facilitate integration of the technique into present practice. Overviews of contemporary drug discovery and the role of machine learning place the investigation into context. Computational discrimination between compounds according to their structures and properties of interest is described in detail, as is the SVM algorithm. A framework for the assessment of supervised machine learning performance on SPC data is proposed and employed to assess SVM performance alongside state-of-the-art techniques for in silico SPC analysis on data provided by GlaxoSmithKline. SVM performance is competitive and the comparison prompts adaptations of both data treatment and algorithmic application to explore the effects of data paucity, class imbalance and outlying data. Subsequent work weights the SVM kernel matrix to recognise heavily populated regions of training data and suggests the incorporation of domain-specific clustering methods to assist the standard SVM algorithm. The notion that SVM kernel functions may incorporate existing domain-specific methods leads to kernel functions that employ existing pharmaceutical similarity measures to treat an abstract, binary representation of molecular structure that is not used widely for SPC analysis.
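    The idea of building a kernel from a pharmaceutical similarity measure over a binary structure representation can be illustrated with a minimal sketch. The abstract does not say which similarity measure the thesis uses; the Tanimoto coefficient is assumed here only because it is a common choice for binary fingerprints. The fingerprints and the function names `tanimoto` and `gram_matrix` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints.

    Each fingerprint is given as a set of the indices of its 'on' bits.
    (Assumed measure; the abstract does not name one.)
    """
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def gram_matrix(fingerprints):
    """Pairwise similarity matrix, usable as a precomputed kernel matrix."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


# Hypothetical fingerprints for three molecules.
fps = [frozenset({1, 4, 7, 9}), frozenset({1, 4, 8}), frozenset({2, 3})]
K = gram_matrix(fps)

for row in K:
    print(row)
```

    A matrix like `K` could then be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.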

    In silico approach to screen compounds active against parasitic nematodes of major socio-economic importance

    Infections due to parasitic nematodes are common causes of morbidity and fatality around the world, especially in developing nations. At present, however, there are only three major classes of drugs for treating human nematode infections. Additionally, the mechanisms of action of these drugs and the reasons for resistance to them are poorly understood. Commercial incentives to design drugs for diseases that are endemic to developing countries are limited; therefore, virtual screening in academic settings can play a vital role in discovering novel drugs useful against neglected diseases. In this study we propose to build a robust machine learning model to classify and screen compounds active against parasitic nematodes. A set of compounds active against parasitic nematodes was collated from various literature sources, including PubChem, while the inactive set was derived from the DrugBank database. The support vector machine (SVM) algorithm was used for model development, and stratified ten-fold cross-validation was used to evaluate the performance of each classifier. The best results were obtained using the radial basis function kernel. The SVM method achieved an accuracy of 81.79% on an independent test set. Using the model developed above, we were able to identify novel compounds with potential anthelmintic activity. In this study, we successfully present the SVM approach for predicting compounds active against parasitic nematodes, which suggests the effectiveness of computational approaches for antiparasitic drug discovery. Although the accuracy obtained is lower than that previously reported in a similar study, we believe that our model is more robust because we intentionally employed stringent criteria to select the inactive dataset, thus making it more difficult for the model to classify compounds. The method presents an alternative approach to the existing traditional methods and may be useful for predicting hitherto novel anthelmintic compounds. 12 page(s)
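    The stratified ten-fold cross-validation mentioned in the abstract can be sketched as follows. This is an illustration of the splitting step only, not the authors' code: the class sizes, the seed, and the function names `stratified_folds` and `train_test_splits` are hypothetical, and the classifier itself (an RBF-kernel SVM in the study) is left out.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled members of each class round-robin across folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train, test


# Hypothetical data set: 30 actives (1) and 70 inactives (0).
labels = [1] * 30 + [0] * 70
splits = list(train_test_splits(labels))

for train, test in splits:
    actives = sum(labels[i] for i in test)
    print(len(train), len(test), actives)
```

    Each held-out fold keeps the 30:70 ratio of the full set, so a model trained on the other nine folds is scored on a test fold with the same class balance. With class sizes that are not multiples of `k`, this simple round-robin scheme gives the first folds one extra sample per class.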