82 research outputs found

    Computational discovery of novel anthelmintic natural compounds from Agave brittoniana Trel. spp. Brachypus

    Helminth infections are a medical problem worldwide. This report used bond-based 2D quadratic indices, a bond-level QuBiLs-MAS molecular descriptor family, and Linear Discriminant Analysis (LDA) to obtain a quantitative linear model that discriminates between anthelmintic and non-anthelmintic drug-like organic compounds. The model correctly classified 87.46% and 81.82% of the training and external data sets, respectively. The model was then used in a virtual screening to predict the biological activity of all 19 chemicals previously obtained and chemically characterized by some authors of this report from Agave brittoniana Trel. spp. Brachypus. The model identified 12 metabolites as possible anthelmintics, and a group of 5 novel natural products was tested in an in vitro assay against Fasciola hepatica (100% effectiveness at 500 µg/mL). Finally, the two best hits were evaluated in vivo in BALB/c mice against the same helminth parasite using a 25 mg/kg dose. Compound 8 (Karatavinoside A) showed an efficacy of 92.2% in vivo. Notably, this natural compound exhibits activity similar or superior to that of triclabendazole, the best human fasciolicide available on the market against Fasciola hepatica, making it a novel lead scaffold with anthelmintic activity. (15 pages)

    Atom, atom-type, and total linear indices of the "molecular pseudograph's atom adjacency matrix": Application to QSPR/QSAR studies of organic compounds

    In this paper we describe the application in QSPR/QSAR studies of a new group of molecular descriptors: atom, atom-type and total linear indices of the molecular pseudograph's atom adjacency matrix. These novel molecular descriptors were used to predict the boiling point of 28 alkyl-alcohols and the partition coefficient (log P), specific rate constant (log k), and antibacterial activity of 34 2-furylethylene derivatives. Two quantitative models were obtained to describe the alkyl-alcohols' boiling points. The first includes only two total linear indices and performed well statistically (R² = 0.984, s = 3.78, F = 748.57, q² = 0.981, and scv = 3.91). The second includes four variables [3 global and 1 local (heteroatom) linear indices] and improved the description of this physical property (R² = 0.9934, s = 2.48, F = 871.96, q² = 0.990, and scv = 2.79). Multiple linear regression analysis was then used to describe log P and log k of the 2-furylethylene derivatives. These models were statistically significant [(R² = 0.984, s = 0.143, and F = 113.38) and (R² = 0.973, s = 0.26, and F = 161.22), respectively] and showed very good stability to data variation in leave-one-out (LOO) cross-validation experiments [(q² = 0.938 and scv = 0.178) and (q² = 0.948 and scv = 0.33), respectively]. Finally, a linear discriminant model for classifying the antibacterial activity of these compounds was obtained using the atom and atom-type linear indices. The overall percentage of correct classification in the training and external test sets was 94.12% and 100.0%, respectively. Comparison with other approaches (connectivity indices, total and local spectral moments, quantum chemical descriptors, topographic indices and E-state/biomolecular encounter parameters) shows that the method performs well.
The approach described in this paper appears to be a very promising structural invariant, useful for QSPR/QSAR studies and computer-aided "rational" drug design. Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA)
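
The abstract above reports both a fitted R2 and a leave-one-out (LOO) cross-validated q2 for each regression model. As a minimal sketch of how the two statistics differ, the following pure-Python example fits a one-descriptor least-squares line and computes both values. The function names and the toy descriptor/property values are hypothetical and are not taken from the paper, whose models use several linear indices.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys):
    """Fitted R2: 1 - SS_res / SS_tot, with the model fitted on all points."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """LOO q2: 1 - PRESS / SS_tot, each point predicted by a model fitted without it."""
    my = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss_tot


# Hypothetical toy data: one descriptor value and one property value per compound.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
prop = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r2 = r_squared(descriptor, prop)
q2 = q_squared_loo(descriptor, prop)
```

For a least-squares model q2 never exceeds R2, because each left-out point is predicted by a model that has not seen it; a small gap between the two is what the abstract describes as stability to data variation.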

    Protein quadratic indices of the "macromolecular pseudograph's α-carbon atom adjacency matrix". 1. Prediction of Arc repressor alanine-mutant's stability

    This report describes a new set of macromolecular descriptors of relevance to protein QSAR/QSPR studies: protein quadratic indices. These descriptors are calculated from the macromolecular pseudograph's α-carbon atom adjacency matrix. A study of the protein stability effects for a complete set of alanine substitutions in Arc repressor illustrates this approach. Quantitative Structure-Stability Relationship (QSSR) models discriminate between near wild-type stability and reduced-stability A-mutants. A linear discriminant function correctly classifies 85.4% (35/41) and 91.67% (11/12) of the near wild-type stability/reduced-stability mutants in the training and test series, respectively. The model's overall predictability ranges from 80.49% to 82.93% as n varies from 2 to 10 in leave-n-out cross-validation procedures, and stabilizes around 80.49% when n > 6. Additionally, canonical regression analysis corroborates the statistical quality of the classification model (Rcanc = 0.72, p-level < 0.0001). This analysis was also used to compute biological stability canonical scores for each Arc A-mutant. In addition, a nonlinear piecewise regression model compares favorably with the linear regression model in predicting the melting temperature (tm) of the Arc A-mutants. The linear model explains almost 72% of the variance of the experimental tm (R = 0.85 and s = 5.64), and LOO PRESS statistics support its predictive ability (q² = 0.55 and scv = 6.24). However, this linear regression model fails to resolve tm predictions of Arc A-mutants in the external prediction series. Therefore, the use of nonlinear piecewise models was required. The tm values of A-mutants in the training (R = 0.94) and test (R = 0.91) sets are calculated by the piecewise model with a high degree of precision. A break-point value of 51.32°C characterizes two clusters of mutants and coincides perfectly with the experimental scale.
For this reason, linear discriminant analysis and piecewise models can be used in combination to classify and predict the stability of the mutant Arc homodimers. These models also permit interpretation of the driving forces of the folding process. The models include protein quadratic indices accounting for hydrophobic (z1), bulk-steric (z2), and electronic (z3) features of the studied molecules. The preponderance of z1 and z3 over z2 indicates the higher importance of the hydrophobic and electronic side-chain terms in the folding of the Arc dimer. The developed equations involve short-reaching (k ≤ 3), middle-reaching (3 < k ≤ 7) and far-reaching (k = 8 or greater) z1, z2, and z3 protein quadratic indices. This points to control of the stability profile of wild-type Arc and its A-mutants by topologic/topographic protein backbone interactions. Consequently, the present approach represents a novel and very promising route for mathematical research in the biological sciences. Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA)

    NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN

    Computer-aided drug design (CADD) has become an indispensable component of modern drug discovery projects. Predicting the physicochemical and pharmacological properties of candidate compounds effectively increases the probability that drug candidates pass the later phases of clinical trials. Ligand-based virtual screening has advantages over structure-based drug design in terms of its wide applicability and high computational efficiency. The established chemical repositories and reported bioassays form a gigantic knowledge base from which to derive quantitative structure-activity relationships (QSAR) and structure-property relationships (QSPR). In addition, the rapid advance of machine learning techniques suggests new solutions for data-mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), is reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity of cannabinoid receptor subtypes, and blood-brain-barrier (BBB) passage. LiCABEDS was implemented and integrated with a graphical user interface, data import/export, automated model training/prediction, and project management. In addition, a non-linear ligand classifier was proposed, using a novel Topomer kernel function in a support vector machine. With the emphasis on green high-performance computing, graphics processing units are alternative platforms for computationally expensive tasks. A novel GPU algorithm was designed and implemented to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm was reported for constructing structurally diverse screening libraries in order to enhance hit rates in high-throughput screening.
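
The abstract names the idea behind LiCABEDS (adaptive boosting of decision stumps) without giving details. The sketch below is a generic, textbook-style AdaBoost over one-feature threshold stumps in pure Python; it is not the LiCABEDS implementation, and the function names, the exhaustive stump search, and the toy one-feature data are assumptions made for illustration.

```python
import math


def stump_predict(row, j, thr, pol):
    """A decision stump: predict pol if feature j <= thr, otherwise -pol."""
    return pol if row[j] <= thr else -pol


def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost_fit(X, y, rounds=10):
    """Train an ensemble of (alpha, feature, threshold, polarity) stumps; labels are +1/-1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, j, thr, pol)
                for alpha, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy set: one descriptor; the inactive class (-1) sits in the middle,
# so no single threshold separates the classes.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
one_stump = adaboost_fit(X, y, rounds=1)
three_stumps = adaboost_fit(X, y, rounds=3)
```

On this toy set a single stump misclassifies two points, while three boosted stumps reproduce all six labels, which illustrates why an ensemble of very weak rules can act as a usable classifier.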

    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and the future trend toward joint or multi-task feature selection methods. Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P
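
The variable-selection step described above can be sketched with a small genetic algorithm over boolean descriptor masks. This is a generic illustration rather than any method from the review: the function name `ga_select`, the operators chosen (tournament selection, one-point crossover, bit-flip mutation, elitism), and the toy fitness function are all assumptions. In a real QSAR workflow the fitness would typically be a cross-validated quality measure of a model built on the selected descriptors.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Evolve boolean masks over n_features (>= 2); returns (best_mask, best_fitness_history)."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the current best mask survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of three random masks becomes a parent.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation toggles each descriptor with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical fitness: reward three "informative" descriptors, penalize mask size.
INFORMATIVE = (0, 3, 5)


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask, history = ga_select(8, toy_fitness)
```

Because the best mask of each generation is copied into the next, the recorded best fitness can never decrease; whether the search reaches the global optimum depends on the population size, the number of generations and the random seed.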

    Integrating Safety Issues in Optimizing Solvent Selection and Process Design

    Incorporating safety considerations into the design of solvent processes has become crucial in light of the chemical process incidents involving solvents that have taken place in recent years. The implementation of inherently safer design concepts is considered beneficial for avoiding hazards during the early stages of design. This work shows the application of existing process design and modeling techniques that support the concepts of ‘substitution’, ‘intensification’ and ‘attenuation’. For ‘substitution’, a computer-aided molecular design (CAMD) technique has been applied to select inherently safer solvents for a solvent operation. For ‘intensification’ and ‘attenuation’, consequence models and regulatory guidance from EPA RMP have been integrated into process simulation. Combining existing techniques provides a design team with a higher level of information to make decisions based on process safety. CAMD is a methodology used for designing compounds with desired target properties. An important aspect of this methodology concerns the prediction of properties given the structure of the molecule. This work also investigates the applicability of Quantitative Structure Property Relationship (QSPR) and topological indices to CAMD. The evaluation was based on models developed to predict the flash point of different classes of solvents. Multiple linear regression and neural network analysis were used to develop QSPR models, but using QSPR in CAMD has certain limitations, which are discussed and need further work. Practical application of molecular design and process design techniques has been demonstrated in a case study on liquid-liquid extraction of an acetic acid-water mixture. Suitable inherently safer solvents were identified using ICAS-ProCAMD, and consequence models were integrated into the Aspen Plus simulator using a calculator sheet.
Upon integrating flammable and toxic hazard modeling, solvents such as 5-nonanone, 2-nonanone and 5-methyl-2-hexanone provide inherently safer options, while the conventionally used solvent, ethyl acetate, provides a higher degree of separation capability. A conclusive decision regarding feasible solvents and operating conditions would depend on the design requirements, regulatory guidance, and safety criteria specified for the process. Inherent safety has always been an important consideration during early design steps, and this research presents a methodology to incorporate its principles and obtain inherently safer alternatives.

    Machine learning methods for quantitative structure-property relationship modeling

    Doctoral thesis, Informatics (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2014. Due to the high rate at which new compounds are discovered each day and the slowness and cost of experimental measurements, there will always be a significant gap between the number of known chemical compounds and the number of compounds for which experimental properties are available. This research is motivated by the fact that new methods for predicting properties and for organizing huge collections of molecules, so as to reveal chemical categories/patterns and select diverse/representative samples for exploratory experiments, are becoming essential. This work aims to increase the capability to predict physical, chemical and biological properties, using data mining methods applied to complex non-homogeneous data (chemical structures), for large information repositories. In the first phase of this work, current methodologies in quantitative structure-property modelling were studied. These methodologies attempt to relate a set of selected structure-derived features of a compound to its property using model-based learning. This work focused on solving major issues identified when predicting properties of chemical compounds and on the solutions explored using different molecular representations, feature selection techniques and data mining approaches. In this context, an innovative hybrid approach was proposed to improve the predictive power and comprehensibility of QSPR/QSAR problems using Random Forests for feature selection. It is acknowledged that, in general, similar molecules tend to have similar properties; therefore, in the second phase of this work, an instance-based machine learning methodology for predicting properties of compounds using the similarity-based molecular space was developed.
However, this type of methodology requires the quantification of structural similarity between molecules, which is often subjective, ambiguous and reliant upon comparative judgements; consequently, there is currently no absolute standard of molecular similarity. In this context, a new similarity method was developed, Non-Contiguous Atom Matching Structural Similarity (NAMS), based on optimal atom alignment using pairwise matching algorithms that take into account both the topological profiles of the bonds and the characteristics of the atoms and bonds. The molecular metric space built with NAMS can then be used for property inference using ordinary kriging, a spatial interpolation technique that takes into account the spatial relationship between instances, in order to obtain robust and interpretable predictions and a better understanding of the underlying structure-property relationship. Fundação para a Ciência e a Tecnologia (FCT)