
    Evaluation of machine-learning methods for ligand-based virtual screening

    Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and here we discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that its screening performance differs little from that of a previously described kernel developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training set contains only a very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.
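Group fusion scores each database molecule by combining its similarities to every known active. The following is a minimal sketch, assuming binary fingerprints stored as sets of on-bit indices, Tanimoto similarity, and a MAX or MEAN fusion rule; the fingerprints, molecule names and the function `group_fusion_rank` are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a binary fingerprint as the set of its on-bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion_rank(
    references: Sequence[Fingerprint],
    database: Dict[str, Fingerprint],
    rule: str = "max",
) -> List[Tuple[str, float]]:
    """Rank database molecules by their fused similarity to all reference actives."""
    fusers = {
        "max": max,                            # the nearest reference active decides
        "mean": lambda xs: sum(xs) / len(xs),  # average over all reference actives
    }
    fuse = fusers[rule]
    scored = [
        (name, fuse([tanimoto(fp, ref) for ref in references]))
        for name, fp in database.items()
    ]
    # Highest fused similarity first: these would be the compounds to test first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    actives = [frozenset({1, 2, 3}), frozenset({4, 5})]
    library = {
        "x": frozenset({1, 2, 3}),
        "y": frozenset({4, 5, 6}),
        "z": frozenset({7, 8}),
    }
    for name, score in group_fusion_rank(actives, library):
        print(name, round(score, 3))
```

MAX fusion rewards a molecule that closely resembles any single active, whereas MEAN favours molecules that resemble many actives at once; with few, structurally varied actives the choice of rule can change the ranking noticeably.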

    The influence of negative training set size on machine learning-based virtual screening

    BACKGROUND: The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. RESULTS: The impact of this rather neglected aspect of applying machine learning methods was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of the dynamics of these variations allowed us to recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be largely insensitive to changes in the number of negative instances in the training set. CONCLUSIONS: The ratio of positive to negative training instances should be taken into account when preparing machine learning experiments, as it may significantly influence the performance of a particular classifier. Moreover, optimizing the size of the negative training set can be applied as a boosting-like approach in machine learning-based virtual screening.
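The evaluation parameters named above can be computed from a binary confusion matrix. The sketch below, assuming active/inactive labels, computes precision, recall (hit recall) and the Matthews correlation coefficient (MCC), and adds a hypothetical helper, `compose_training_set`, that builds a training set with a chosen number of negatives per positive by random sampling from a pool standing in for ZINC compounds. All counts and identifiers are illustrative, not results from the study.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def precision_recall_mcc(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, MCC); undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


def compose_training_set(
    positives: Sequence[T], negative_pool: Sequence[T], neg_per_pos: float, seed: int = 0
) -> List[Tuple[T, int]]:
    """Hypothetical helper: label positives 1 and add randomly sampled negatives labelled 0."""
    n_neg = min(len(negative_pool), round(len(positives) * neg_per_pos))
    negatives = random.Random(seed).sample(list(negative_pool), n_neg)
    return [(p, 1) for p in positives] + [(n, 0) for n in negatives]


if __name__ == "__main__":
    p, r, m = precision_recall_mcc(tp=8, fp=2, fn=2, tn=88)
    print(f"precision={p:.3f} recall={r:.3f} MCC={m:.3f}")
    pool = [f"zinc_{i}" for i in range(100)]  # placeholder identifiers
    train = compose_training_set(["act_1", "act_2"], pool, neg_per_pos=5)
    print(len(train), sum(label for _, label in train))
```

Recall depends only on tp and fn, whereas precision and MCC also involve fp (and MCC involves tn), so changing the number of negatives can move these metrics in different directions.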

    The influence of the inactives subset generation on the performance of machine learning methods

    Background: The application of machine learning methods in virtual screening, for both classification and regression tasks, has grown in popularity over the past few years. However, their effectiveness depends strongly on many different factors. Results: In this study, the influence of how the set of inactives is formed on the classification process was examined: random and diverse selection from the ZINC database, the MDDR database, and libraries generated according to the DUD methodology. All learning methods were tested in two modes: using a single test set, the same for every method of generating inactive molecules, and using test sets with inactives prepared in the same way as for training. The experiments were carried out for 5 different protein targets, 3 fingerprints for molecule representation and 7 classification algorithms with varying parameters. The way the inactive set was formed had a substantial impact on the performance of the machine learning methods. Conclusions: The degree to which chemical space was limited determined the ability of the tested classifiers to select potentially active molecules in virtual screening tasks; for example, DUD-generated sets (widely applied in docking experiments) did not enable proper selection of active molecules from databases with diverse structures. The study clearly showed that the inactive compounds forming the training set should be as representative as possible of the libraries that undergo screening.
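Random and diverse selection of inactives can be contrasted in a short sketch. Diverse selection is shown here with the greedy MaxMin algorithm on Tanimoto distances, which is one common choice; the abstract does not say which diversity method the study used, so MaxMin, the set-of-bits fingerprint representation and the function names `random_select` and `maxmin_select` are assumptions for illustration.

```python
import random
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # hypothetical: indices of on-bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def random_select(pool: Sequence[Fingerprint], k: int, seed: int = 0) -> List[int]:
    """Indices of k compounds drawn uniformly at random without replacement."""
    return random.Random(seed).sample(range(len(pool)), k)


def maxmin_select(pool: Sequence[Fingerprint], k: int, start: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly pick the compound farthest from everything chosen so far."""
    if not 0 < k <= len(pool):
        raise ValueError("k must be between 1 and len(pool)")
    selected = [start]
    # Distance (1 - Tanimoto) from each compound to its nearest already-selected compound.
    nearest = [1.0 - tanimoto(fp, pool[start]) for fp in pool]
    while len(selected) < k:
        chosen = set(selected)
        best = max((i for i in range(len(pool)) if i not in chosen), key=lambda i: nearest[i])
        selected.append(best)
        # Update nearest-selected distances with the newly added compound.
        for i, fp in enumerate(pool):
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fp, pool[best]))
    return selected


if __name__ == "__main__":
    pool = [frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({7, 8})]
    print("diverse:", maxmin_select(pool, 2))
    print("random:", random_select(pool, 2, seed=1))
```

Diverse selection spreads the inactives across chemical space, whereas random selection mirrors the density of the source database; the study's conclusion suggests that, whichever is used, the inactives should resemble the library to be screened.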

    Functional Group and Substructure Searching as a Tool in Metabolomics

    BACKGROUND: A direct link between the names and structures of compounds and the functional groups contained within them is important, not only because biochemists frequently rely on literature that uses a free-text format to describe functional groups, but also because metabolic models depend upon the connections between enzymes and substrates being known and appropriately stored in databases. METHODOLOGY: We have developed a database named "Biochemical Substructure Search Catalogue" (BiSSCat), which contains 489 functional groups, >200,000 compounds and >1,000,000 different computationally constructed substructures, to allow identification of chemical compounds of biological interest. CONCLUSIONS: This database and its associated web-based search program (http://bisscat.org/) can be used to find compounds containing selected combinations of substructures and functional groups. It can be used to determine possible additional substrates for known enzymes and for putative enzymes found in genome projects. Its applications to enzyme inhibitor design are also discussed.
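Finding compounds that contain a chosen combination of functional groups can be framed as set intersection over an inverted index (functional group to compounds). The sketch below is a hypothetical illustration of that idea only, not BiSSCat's implementation or web interface; the functions `build_index` and `search` and the small set of compound annotations are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(annotations: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert compound -> functional groups into functional group -> compounds."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for compound, groups in annotations.items():
        for group in groups:
            index[group].add(compound)
    return dict(index)


def search(
    index: Dict[str, Set[str]], required: Iterable[str], excluded: Iterable[str] = ()
) -> Set[str]:
    """Compounds containing every required group and none of the excluded groups."""
    required = list(required)
    if not required:
        return set()
    # set.intersection always returns a new set, so the index itself is never modified.
    hits = set.intersection(*(index.get(g, set()) for g in required))
    for group in excluded:
        hits -= index.get(group, set())
    return hits


if __name__ == "__main__":
    # Hand-made example annotations (illustrative only).
    annotations = {
        "lactate": {"carboxylic acid", "hydroxyl"},
        "pyruvate": {"carboxylic acid", "ketone"},
        "citrate": {"carboxylic acid", "hydroxyl"},
        "ethanol": {"hydroxyl"},
    }
    idx = build_index(annotations)
    print(sorted(search(idx, ["carboxylic acid", "hydroxyl"])))
    print(sorted(search(idx, ["carboxylic acid"], excluded=["hydroxyl"])))
```

A full substructure search would additionally need a subgraph-isomorphism check on the molecular graphs; an index like this only narrows the candidates when functional groups have been annotated in advance.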

    Database development and machine learning prediction of pharmaceutical agents

    Ph.D. (Doctor of Philosophy)

    Development Of Database And Computational Methods For Disease Detection And Drug Discovery

    Ph.D. (Doctor of Philosophy)

    Application and Development of Computational Methods for Ligand-Based Virtual Screening

    The detection of novel active compounds that are able to modulate the biological function of a target is the primary goal of drug discovery. Different screening methods are available to identify hit compounds with the desired bioactivity in a large collection of molecules. As a computational method, virtual screening (VS) is used to search compound libraries in silico and identify compounds that are likely to exhibit a specific activity. Ligand-based virtual screening (LBVS) is a subdiscipline that uses the information from one or more known active compounds to identify new hit compounds. Different LBVS methods exist, e.g. similarity searching and support vector machines (SVMs). To apply these computational approaches, compounds have to be described numerically. Fingerprints derived from the two-dimensional compound structure, called 2D fingerprints, are among the most popular molecular descriptors available. This thesis covers the use of 2D fingerprints in the context of LBVS. The first part focuses on a detailed analysis of 2D fingerprints. Their performance against a wide range of pharmaceutical targets is estimated globally through fingerprint-based similarity searching. Additionally, mechanisms by which fingerprints are capable of detecting structurally diverse active compounds are identified. For this purpose, two different feature selection methods are applied to find the fingerprint features that are most relevant for the active compounds and that distinguish them from other compounds. Then, 2D fingerprints are used in SVM calculations. The SVM methodology provides several opportunities to include additional information about the compounds in order to direct LBVS search calculations. First, a variant of the SVM approach is applied to the multi-class prediction problem involving compounds that are active against several related targets. 
SVM linear combination is used to recover compounds with desired activity profiles and to deprioritize compounds with other activities. Then, the SVM methodology is adapted for potency-directed VS. Compound potency is incorporated into the SVM approach through potency-oriented SVM linear combination and kernel function design, directing search calculations toward the preferential detection of potent hit compounds. Next, SVM calculations are applied to address an intrinsic limitation of similarity-based methods, i.e., the presence of similar compounds with large differences in potency. A specially designed SVM approach is introduced to predict compound pairs forming such activity cliffs. Finally, the impact of different training sets on the recall performance of SVM-based VS is analyzed and caveats are identified.
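Before a model can be trained to predict activity cliffs, cliff pairs have to be labelled from data. A minimal sketch, assuming Tanimoto similarity on set-of-bits fingerprints, a similarity threshold of 0.8 and a potency difference of at least two log units (100-fold): both thresholds, the compound data and the function `find_activity_cliffs` are illustrative assumptions, and the thesis's SVM approach for predicting such pairs is not reproduced here.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # hypothetical: indices of on-bits
# (identifier, fingerprint, potency in log units, e.g. pKi or pIC50)
Compound = Tuple[str, Fingerprint, float]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient for binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(
    compounds: Sequence[Compound],
    sim_threshold: float = 0.8,
    min_potency_gap: float = 2.0,
) -> List[Tuple[str, str, float, float]]:
    """Return (id_a, id_b, similarity, potency gap) for every pair that forms a cliff."""
    cliffs = []
    for (id_a, fp_a, p_a), (id_b, fp_b, p_b) in combinations(compounds, 2):
        similarity = tanimoto(fp_a, fp_b)
        gap = abs(p_a - p_b)
        # A cliff: a structurally similar pair with a large potency difference.
        if similarity >= sim_threshold and gap >= min_potency_gap:
            cliffs.append((id_a, id_b, similarity, gap))
    return cliffs


if __name__ == "__main__":
    data = [
        ("A", frozenset({1, 2, 3, 4, 5}), 9.0),
        ("B", frozenset({1, 2, 3, 4, 5, 6}), 6.5),
        ("C", frozenset({10, 11, 12}), 5.0),
    ]
    for a, b, sim, gap in find_activity_cliffs(data):
        print(a, b, round(sim, 3), gap)
```

Pairs like A and B are where a purely similarity-based search can mislead, because high structural similarity does not guarantee similar potency.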

    Machine Learning Methodologies for Interpretable Compound Activity Predictions

    Machine learning (ML) models have gained attention for mining the pharmaceutical data that are currently generated at unprecedented rates, and for their potential to accelerate the discovery of new drugs. The advent of deep learning (DL) has also raised expectations in pharmaceutical research. A central task in drug discovery is the initial search for compounds with desired biological activity. ML algorithms can find patterns in compound structures that are related to bioactivity, the so-called structure-activity relationships (SARs). ML-based predictions can complement biological testing to prioritize further experiments. Moreover, insights into model decisions are highly desired for further validation and for the identification of activity-relevant substructures. However, the interpretation of complex ML models remains essentially prohibitive. This thesis focuses on ML-based predictions of compound activity against multiple biological targets. Single-target and multi-target models are generated for relevant tasks, including the prediction of profiling matrices from screening data and the discrimination between weak and strong inhibitors for more than a hundred kinases. Moreover, the relative performance of distinct modeling strategies is systematically analyzed under varying training conditions, and practical guidelines are reported. Since explainable model decisions are a clear requirement for the utility of ML bioactivity models in pharmaceutical research, methods for the interpretation and intuitive visualization of activity predictions from any ML or DL model are introduced. Taken together, this dissertation presents contributions that advance the application and rationalization of ML models for biological activity and SAR predictions.
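One widely used model-agnostic way to attribute a single prediction to input features is the Shapley value from cooperative game theory, on which SHAP explanations are based. The abstract does not specify the thesis's method, so the following is only an illustrative sketch: exact Shapley values computed by enumerating all feature coalitions, where features outside a coalition take a baseline value (an assumption; other baselines are possible). The function `shapley_values` and the toy model are hypothetical, and enumeration is exponential in the number of features, so it suits toy examples rather than full fingerprints.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature of x for model(x), relative to model(baseline)."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition keep their values from x; the rest take baseline values.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


if __name__ == "__main__":
    # Toy "activity model" over three binary substructure features (hypothetical).
    def toy_model(z: Sequence[float]) -> float:
        return 2.0 * z[0] + 3.0 * z[1] + 1.5 * z[0] * z[2]

    phi = shapley_values(toy_model, [1, 1, 1], [0, 0, 0])
    print([round(v, 3) for v in phi])  # prints [2.75, 3.0, 0.75]
```

The attributions sum to model(x) minus model(baseline), and the interaction term is split equally between the two features involved; this additivity is what allows contributions to be mapped back onto individual substructures for visualization.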

    Phytochemical study and biological activities of diterpenes and derivatives from Plectranthus species

    Doctoral thesis, Pharmacy (Pharmaceutical and Therapeutic Chemistry), Universidade de Lisboa, Faculdade de Farmácia, 2011. This study focused on the search for new bioactive constituents in four species of the genus Plectranthus. Previous work on plants of the genus Plectranthus (Lamiaceæ) showed that some of their constituents possess interesting biological activities. The antimicrobial activity of the plant extracts and of the isolated metabolites was thoroughly investigated, and the antioxidant, anticholinesterase and anti-inflammatory properties of some compounds were also screened. The phytochemical study of the acetone extracts of Plectranthus ornatus Codd., P. ecklonii Benth., P. porcatus Winter & Van Jaarsv. and P. saccatus Benth. yielded several terpenoid constituents, mostly diterpenes. From P. ornatus, three new forskolin-like labdane diterpenes (6-O-acetylforskolin, 1,6-di-O-acetylforskolin and 1,6-di-O-acetyl-9-deoxyforskolin), a new diterpene with the rare halimane skeleton (11R*-acetoxyhalima-5,13E-dien-15-oic acid) and two known labdane diterpenes were isolated: rhinocerotinoic acid, found in Plectranthus species for the first time, and plectrornatin C. Six known triterpenoids were also identified as mixtures. The study of P. ecklonii led to the isolation of two known abietanes, sugiol and parvifloron D; sugiol was obtained from Plectranthus species for the first time. Four known triterpenoids were also identified as mixtures. P. porcatus, a plant not previously studied, yielded a new spiro-abietane diterpene [(13S,15S)-6β,7α,12α,19-tetrahydroxy-13β,16-cyclo-8-abietene-11,14-dione]. A new beyerane diterpene (ent-7α-acetoxy-15-beyeren-18-oic acid) was isolated from P. saccatus. To find novel bioactive prototypes, derivatives were prepared from the most potent antibacterial diterpenes that had been isolated in higher yields. Nine new derivatives were obtained from (11R*,13E)-11-acetoxyhalima-5,13-dien-15-oic acid (P. ornatus). 
A new 2β-(4-hydroxy)benzoyloxy derivative of microstegiol was prepared from parvifloron D (P. ecklonii). From 7α-acetoxy-6β-hydroxyroyleanone (previously isolated from P. grandidentatus), thirteen ester derivatives were synthesized, ten of which were new compounds. The structures of the pure compounds (natural products and derivatives) were unequivocally established from their spectroscopic (IR, MS, 1D and 2D NMR) and physicochemical data, as well as from literature information. Preliminary antimicrobial screening of all the isolated metabolites showed that several diterpenes inhibited the growth of the Gram-positive bacteria tested. In addition, the minimum inhibitory concentration (MIC) of the antibacterial metabolites and their synthesized derivatives was determined against standard strains and clinical isolates of antibiotic-sensitive and antibiotic-resistant Staphylococcus and Enterococcus. The (11R*,13E)-11-acetoxyhalima-5,13-dien-15-oic acid and its derivative (11R*,13E)-halima-5,13-diene-11,15-diol were the most active halimanes. Parvifloron D was less active than its derivative microstegiol 2β-(4-hydroxy)benzoate, but both showed more potent antibacterial activity than the halimane diterpenoids. The three 12-O-benzoyl ester derivatives of the prototype 7α-acetoxy-6β-hydroxyroyleanone proved to be more potent growth inhibitors of Staphylococcus and Enterococcus strains than the prototype itself. The 6β-propionyloxy-12-O-propionyl derivative was also more active against Enterococcus than the prototype. In general, the 12-esters and the 6,12-diesters were more active against Enterococcus than against Staphylococcus strains. Additional hydrophobic interactions with the bacterial targets seem to play an important role in the activity of the prepared royleanone derivatives. 
Based on IC50 values for DPPH radical scavenging, the isolated metabolite parvifloron D and 7α-acetoxy-6β-hydroxyroyleanone showed in vitro antioxidant activity. The in vitro acetylcholinesterase assay detected no activity for any of the newly isolated diterpenes or for 7α-acetoxy-6β-hydroxyroyleanone. A COX inhibitor screening assay was used to test the ability of 6-O-acetylforskolin, rhinocerotinoic acid, plectrornatin C, (11R*,13E)-halima-5,13-diene-11,15-diol, 11R*-acetoxyhalima-5,13E-dien-15-oic acid and its methyl ester to inhibit COX-2. The preliminary results encourage further, more robust studies to confirm and examine their potential anti-inflammatory activity.
The research work was performed mostly at the Faculdade de Farmácia da Universidade de Lisboa, in the Medicinal Chemistry Group (formerly Centro de Estudos de Ciências Farmacêuticas – CECF) of the Institute for Medicines and Pharmaceutical Sciences (iMed.UL). Funding for these research centres and a doctoral grant (SFRH/BD/19250/2004) were provided by the Fundação para a Ciência e a Tecnologia - Ministério da Ciência, Tecnologia e Ensino Superior (FCT-MCTES).