Role of Vertex Index in Substructure Identification and Activity Prediction: A Study on Antitubercular Activity of a Series of Acid Alkyl Ester Derivatives
Tuberculosis (TB) is a life-threatening disease caused by infection with Mycobacterium tuberculosis (Mtb). As most TB strains have become resistant to various existing drugs, the development of effective novel drug candidates to combat this disease is a pressing need. Despite intensive research worldwide, the success rate of discovering new anti-TB drugs is very poor. Therefore, novel drug discovery methods have to be tried. We have used a rule-based computational method that utilizes a vertex index named 'distance exponent index (Dx)' (with x = –4 here) to predict the anti-TB activity of a series of acid alkyl ester derivatives. The method is meant to identify activity-related substructures from a series of compounds and to predict the activity of a compound on that basis. The high rate of successful prediction in the present study suggests that the method may be useful in discovering effective anti-TB compounds. It is also apparent that substructural approaches may be leveraged for a wide range of purposes in computer-aided drug design. (doi: 10.5562/cca2306)
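The abstract does not spell out how Dx is computed. A common way to build a distance-based vertex index is to sum powers of topological distances, so the sketch below assumes Dx(i) = Σ over j ≠ i of d_ij^x on the hydrogen-suppressed molecular graph, where d_ij is the shortest-path bond count and x = –4 as in the text. This assumed formula, the adjacency-dict encoding and the function names are hypothetical, and the rule-based substructure-identification step is not shown.

```python
from collections import deque


def topological_distances(adjacency, source):
    """Shortest-path bond counts from `source`, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_exponent_index(adjacency, x=-4):
    """Assumed form: D_x(i) = sum over reachable j != i of d_ij ** x, one value per vertex."""
    values = {}
    for i in adjacency:
        d = topological_distances(adjacency, i)
        values[i] = sum(dij ** x for j, dij in d.items() if j != i)
    return values


# Hypothetical example: hydrogen-suppressed n-butane skeleton C0-C1-C2-C3
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dx = distance_exponent_index(butane)
print(dx)
```

With x = –4, distant atoms contribute very little, so under this assumed form the index mostly reflects each atom's immediate neighbourhood: terminal carbons receive a smaller value than internal ones.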
The Rücker–Markov invariants of complex bio-systems: applications in parasitology and neuroinformatics
Rücker's walk count (WC) indices are well-known topological indices (TIs) used in Chemoinformatics to quantify the molecular structure of drugs represented by a graph in quantitative structure–activity/property relationship (QSAR/QSPR) studies. In this work, we introduce for the first time the higher-order (kth order) analogues (WCk) of these indices using Markov chains. In addition, we report new QSPR models for large complex networks of different Bio-Systems useful in Parasitology and Neuroinformatics. The new type of QSPR models can be used for model checking, that is, to calculate numerical scores S(Lij) for links Lij (checking or re-evaluating network connectivity) in large networks from all these fields. The method may be summarized as follows: (i) first, the WCk(j) values are calculated for every node j in an already constructed complex network; (ii) a linear discriminant analysis (LDA) is used to find a linear equation that discriminates experimentally confirmed connected or linked (Lij = 1) pairs of nodes from non-linked ones (Lij = 0); (iii) the new model is validated with an external series of node pairs; (iv) the resulting equation is used to re-evaluate the connectivity quality of the network, connecting or disconnecting nodes based on the quality scores calculated with the new connectivity function. The linear QSPR models obtained yielded the following overall test accuracies for the reconstruction of complex networks of different Bio-Systems: parasite–host networks (93.14%), NW Spain fasciolosis spreading networks (71.42/70.18%) and the CoCoMac Brain Cortex co-activation network (86.40%). Thus, this work can contribute to the computational re-evaluation or model checking of connectivity (collation) in complex systems from any scientific field.

Funding: Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo; Ibero-NBIC, 209RT-0366. Ministerio de Ciencia e Innovación; TIN2009-0770
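Rücker's classical walk counts come from powers of the adjacency matrix A: the number of walks of length k that start at node j is the j-th row sum of A^k. The sketch below computes these classical counts (not the Markov-chain WCk analogue introduced in the paper, whose exact form the abstract does not give) and then applies a linear pair score of the kind step (ii) produces. The pair feature (sum of the two nodes' counts), the weights and the bias are hypothetical placeholders, not a fitted LDA model.

```python
def mat_mult(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)] for i in range(n)]


def walk_counts(adj, k_max):
    """Classical walk counts: wc[k][j] = number of walks of length k starting at node j,
    i.e. the j-th row sum of A**k, for k = 1..k_max."""
    power = adj
    counts = {}
    for k in range(1, k_max + 1):
        if k > 1:
            power = mat_mult(power, adj)
        counts[k] = [sum(row) for row in power]
    return counts


def link_score(wc, i, j, weights, bias):
    """Linear score S(Lij) in the form of an LDA equation; the pair feature
    wc[k][i] + wc[k][j], the weights and the bias are hypothetical."""
    return bias + sum(w * (wc[k][i] + wc[k][j]) for k, w in weights.items())


# Hypothetical 4-node path network 0-1-2-3
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
wc = walk_counts(A, 3)
weights = {1: 0.5, 2: 0.2, 3: -0.1}  # hypothetical coefficients, not from the paper
s = link_score(wc, 1, 2, weights, bias=-1.0)
predicted_link = 1 if s > 0 else 0   # step (iv): connect the pair if the score is positive
```

In practice the weights would come from fitting the discriminant on confirmed linked and non-linked pairs (steps ii and iii); here they only illustrate how a fitted equation turns node-level counts into a link decision.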
The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS
BACKGROUND: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high-quality MP data as well as accurate chemical structure representations. Currently available datasets for MP prediction have been limited to around 50k molecules, while much more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if available in an appropriate form, could potentially be used to develop predictive models. RESULTS: We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task with 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared with the highly curated data used in previous publications. Data for chemicals that decomposed rather than melted were separated from those for compounds that underwent a normal melting transition, and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C.
Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. CONCLUSIONS: We have shown that automated tools for the analysis of chemical information have reached a mature stage, allowing the extraction and collection of high-quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826.
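A consensus model combines the predictions of several individual models for the same compound. The sketch below assumes the simplest combination, a plain arithmetic mean (the abstract does not say how the OCHEM consensus was formed), and compares the resulting root mean square error with the 32 °C experimental accuracy quoted above. All melting-point values are hypothetical.

```python
import math


def consensus(predictions_per_model):
    """Average the predictions of several models, compound by compound (assumed plain mean)."""
    n_models = len(predictions_per_model)
    return [sum(values) / n_models for values in zip(*predictions_per_model)]


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical melting points (°C) from three models for four compounds
model_a = [120.0, 85.0, 210.0, 150.0]
model_b = [130.0, 95.0, 190.0, 160.0]
model_c = [125.0, 80.0, 200.0, 140.0]
observed = [128.0, 90.0, 205.0, 148.0]

mp_consensus = consensus([model_a, model_b, model_c])
error = rmse(mp_consensus, observed)
# 32 °C is the estimated experimental accuracy quoted in the abstract
within_experimental_accuracy = error <= 32.0
```

Averaging tends to cancel errors that individual models make in different directions, which is one reason consensus predictions are often more accurate than any single member.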
QSAR models for the (eco-)toxicological characterization and prioritization of emerging pollutants: case studies and potential applications within REACH.
Under the European REACH regulation (Registration, Evaluation, Authorisation and Restriction of Chemical substances - (EC) No 1907/2006), there is an urgent need to acquire a large amount of information necessary to assess and manage the potential risk of thousands of industrial chemicals.
Meanwhile, REACH aims at reducing animal testing by promoting the intelligent and integrated use of alternative methods, such as in vitro testing and in silico techniques. Among these methods, models based on quantitative structure-activity relationships (QSAR) are useful tools to fill data gaps and to support the hazard and risk assessment of chemicals.
The present thesis was carried out within the CADASTER Project (CAse studies on the Development and Application of in-Silico Techniques for Environmental hazard and Risk assessment), which aims to integrate in-silico models (e.g. QSARs) into risk assessment procedures by showing how to increase the use of non-testing information for regulatory decision-making under REACH. The aim of this thesis was the development of QSAR/QSPR models for the characterization of the (eco-)toxicological profile and environmental behaviour of chemical substances of emerging concern. The focus was on four classes of compounds studied within the CADASTER project, i.e. brominated flame retardants (BFRs), fragrances, perfluorinated compounds (PFCs) and (benzo)triazoles (B-TAZs), for which only limited experimental data are currently available, especially for the basic endpoints required in regulation for hazard and risk assessment.
Through several case studies, the present thesis showed how QSAR models can be applied to optimize experimental testing as well as to provide useful information for the safety assessment of chemicals and to support decision-making.
In the first case study, simple multiple linear regression (MLR) and classification models were developed ad hoc for BFRs and PFCs to predict specific endpoints related to endocrine-disrupting (ED) potential (e.g. dioxin-like activity, estrogen and androgen receptor binding, interference with thyroxine transport and estradiol metabolism). Analysis of the modelling molecular descriptors made it possible to highlight some structural features and important structural alerts responsible for increasing specific ED activities. The developed models were applied to screen over 200 BFRs and 33 PFCs lacking experimental data and to prioritize the most hazardous chemicals (on the basis of their ED potency profile), which were then suggested to other CADASTER partners in order to focus experimental testing.
In the second case study, MLR models were developed specifically for B-TAZs to predict three key endpoints required in regulation to assess aquatic toxicity, i.e. acute toxicity in algae (EC50 72h Pseudokirchneriella subcapitata), daphnids (EC50 48h Daphnia magna) and fish (LC50 96h Oncorhynchus mykiss). In this case too, the developed QSARs were applied for screening purposes. Among over 350 B-TAZs lacking experimental data, 20 compounds predicted as toxic (EC(LC)50 ≤ 10 mg/L) or very toxic (EC(LC)50 ≤ 1 mg/L) to the three aquatic species were prioritized for further experimental testing.
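The prioritization step can be expressed as a threshold rule on the predicted EC50/LC50 values. The sketch below uses the cut-offs quoted above (≤ 10 mg/L toxic, ≤ 1 mg/L very toxic) and assumes, since the text does not say exactly how the three species were combined, that a compound is prioritized only if it is flagged for all three. Compound names and predicted values are hypothetical.

```python
def toxicity_class(ec50_mg_per_l):
    """Classify one predicted EC50/LC50 (mg/L) with the cut-offs quoted in the text."""
    if ec50_mg_per_l <= 1.0:
        return "very toxic"
    if ec50_mg_per_l <= 10.0:
        return "toxic"
    return "not classified"


def prioritize(predictions):
    """Keep compounds predicted toxic or very toxic to all three species (assumed rule).
    `predictions` maps a compound name to predicted (algae, daphnia, fish) values in mg/L."""
    selected = []
    for name, values in predictions.items():
        classes = [toxicity_class(v) for v in values]
        if all(c != "not classified" for c in classes):
            selected.append((name, classes))
    return selected


# Hypothetical QSAR predictions (mg/L) for three compounds without experimental data
predicted = {
    "compound_1": (0.8, 5.0, 3.2),
    "compound_2": (25.0, 4.0, 9.0),
    "compound_3": (0.5, 0.9, 0.7),
}
priority_list = prioritize(predicted)
```

Here compound_2 is dropped because its predicted algal EC50 exceeds 10 mg/L; relaxing the rule to "any one species" would keep it, so the combination rule matters for the size of the priority list.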
Finally, in the third case study, classification QSPR models were developed to predict the ready biodegradability of fragrance materials. Ready biodegradability is among the basic endpoints required for assessing the environmental persistence of chemicals. Compared with some existing models commonly used for predicting biodegradation, the QSPRs proposed here showed higher classification accuracy for fragrance materials. This comparison highlighted the importance of using local models when dealing with specific classes of chemicals.
All the proposed QSARs were developed on the basis of the OECD principles for QSAR acceptability for regulatory purposes, paying particular attention to the external validation procedure and to the statistical definition of the models' applicability domain. QSAR models based on molecular descriptors generated by both commercial (DRAGON) and freely available (PaDEL-Descriptor, QSPR-Thesaurus) software have been proposed. The use of free tools allows the QSAR models proposed here to be applied more widely.
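The abstract does not say which statistical definition of the applicability domain was used. One widely used option is the leverage approach: h = xᵀ(XᵀX)⁻¹x for a compound's descriptor vector x (with an intercept term), compared with the warning leverage h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The sketch below assumes that approach; the training and query data are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(train_x, query_x):
    """h = x^T (X^T X)^-1 x, with an intercept column prepended to every descriptor row."""
    X = [[1.0] + list(row) for row in train_x]
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    inv = invert(xtx)
    result = []
    for row in query_x:
        x = [1.0] + list(row)
        result.append(sum(x[a] * inv[a][b] * x[b] for a in range(k) for b in range(k)))
    return result


def warning_leverage(n_train, n_descriptors):
    """Commonly used cut-off h* = 3(p + 1)/n."""
    return 3.0 * (n_descriptors + 1) / n_train


# Hypothetical training set: one descriptor, six compounds
train = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
queries = [[2.5], [12.0]]
h = leverages(train, queries)
h_star = warning_leverage(len(train), 1)
inside_domain = [hi <= h_star for hi in h]
```

The query at the centre of the training data has the minimum possible leverage (1/n), while the query far outside the descriptor range exceeds h* and would be flagged as an extrapolation.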
In conclusion, the QSAR models developed within this thesis are useful tools to support the hazard and risk assessment of specific classes of emerging pollutants, and they show how non-testing information can be used for regulatory decisions, thus minimizing costs and time and saving animal lives.
Beyond their use for regulatory purposes, the QSARs proposed here can find application in the rational design of new, safer compounds that are potentially less hazardous to human health and the environment.
Comparative QSAR analyses of competitive CYP2C9 inhibitors using three-dimensional molecular descriptors
One of the biggest challenges in QSAR studies using three-dimensional descriptors is generating the bioactive conformation of the molecules. Comparative QSAR analyses have been performed on a dataset of 34 structurally diverse, competitive CYP2C9 inhibitors by generating their lowest-energy conformers as well as additional multiple conformers for the calculation of molecular descriptors. Three-dimensional descriptors accounting for the spatial characteristics of the molecules, calculated using E-Dragon, were used as the independent variables. The robustness and predictive performance of the developed models were verified using both internal [leave-one-out (LOO)] and external statistical validation (test set of 12 inhibitors). The best models (MLR using GETAWAY descriptors and partial least squares using 3D-MoRSE descriptors) were obtained by using the multiple conformers for the calculation of descriptors and were selected on the basis of their higher external predictivity (R²test values of 0.65 and 0.63, respectively) and lower root mean square error of prediction (0.48 and 0.48, respectively). The predictive ability of the best model, i.e., MLR using GETAWAY descriptors, was additionally verified on an external test set of quinoline-4-carboxamide analogs and resulted in an R²test value of 0.6. These simple, alignment-independent QSAR models offer the possibility of predicting the CYP2C9 inhibitory activity of chemically diverse ligands in the absence of X-ray crystallographic information on the target protein structure and can provide useful insights into the ADMET properties of candidate molecules in the early phases of drug discovery.
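The validation statistics mentioned above can be computed in a few lines. The sketch below uses a single-descriptor least-squares model to stay short (the published best model was an MLR on several GETAWAY descriptors), computes the leave-one-out Q² by refitting without each compound in turn, and reports the external R²test and RMSEP. R²test is taken around the test-set mean, which is one of several definitions in use; the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (one descriptor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def q2_loo(x, y):
    """Leave-one-out Q2 = 1 - PRESS / SS_tot, refitting without each compound in turn."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    return 1.0 - press / sum((yi - my) ** 2 for yi in y)


def r2_test(y_obs, y_pred):
    """External R2 computed around the test-set mean (one common definition)."""
    m = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return 1.0 - ss_res / sum((o - m) ** 2 for o in y_obs)


def rmsep(y_obs, y_pred):
    """Root mean square error of prediction on the external test set."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


# Hypothetical training (descriptor, activity) and external test data
train_x, train_y = [0.1, 0.4, 0.5, 0.9, 1.2], [4.0, 4.6, 4.9, 5.7, 6.3]
a, b = fit_line(train_x, train_y)
test_x, test_y = [0.3, 1.0], [4.5, 6.0]
pred = [a + b * xi for xi in test_x]
print(q2_loo(train_x, train_y), r2_test(test_y, pred), rmsep(test_y, pred))
```

Q² measures internal robustness, while R²test and RMSEP measure predictivity on compounds the model never saw, which is why the abstract ranks models by the external figures.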
Bioinformatics models and the study of protein receptors using complex networks for the development and design of effective drugs for central nervous system disorders
The search for and development of effective drugs for the treatment of neurodegenerative diseases has generated great expectations, owing to the impact of these diseases on the finances of healthcare systems and the tremendous burden and strain borne by families and caregivers. For this reason, the pharmaceutical industry has turned its attention to these disorders over the last three decades, but the difficulty of conducting trials on the nervous system (NS) makes research costs and timelines soar, considerably limiting the profitability of traditional processes for developing new medicines. This is where drug design makes its contribution, devoting part of its effort to developing mathematical models that can predict properties of interest for a wide variety of chemical systems, including low-molecular-weight molecules, polymers, biopolymers, heterogeneous systems, pharmaceutical formulations, clusters of molecules and ions, materials, nanostructures and others.
In this regard, QSAR (Quantitative Structure-Activity Relationships) studies are increasingly used as tools for molecular discovery. These QSAR models can be designed to predict the probability that a drug will be effective against a given degenerative disease, whether Parkinson's disease, Alzheimer's disease or any other, by acting on a specific molecular target.
In this dissertation we jointly present a review of previous models and novel specific studies in which new numerical indices have been introduced to describe both the molecular structure of drugs and the macromolecular structure of their targets or receptors (proteins and/or DNA/RNA). With these topological indices (TIs), we have been able to develop new multiQSAR models of great interest because of their dual role in predicting drugs and their molecular targets. This work will allow the introduction of new theoretical concepts and progress toward models with potential applications in the search for new neuroprotective drugs useful in the treatment of Parkinson's and Alzheimer's diseases and/or new molecular targets for these drugs. This type of research covers a general, basic area in which Bioinformatics and Cheminformatics interact.
QSAR construction of complex networks of compounds of interest in Pharmaceutical Chemistry, Microbiology and Parasitology
Designing the search for, and development of, effective drugs for the treatment of these diseases, drugs that suppress cell elimination or cell degeneration respectively, is one of the most important lines of research in pharmaceutical chemistry. This is where drug design comes in: drug design is devoted to developing mathematical models to predict properties of interest for a wide variety of chemical systems, including low-molecular-weight molecules, polymers, biopolymers, heterogeneous systems, pharmaceutical formulations, clusters of molecules and ions, materials, nanostructures and others. Such predictions are not intended to replace experimental techniques but to complement them, helping to obtain new active molecules with a higher probability of success, with the resulting advantages in terms of savings in time and material resources and, very importantly, the refinement and reduction of the use of laboratory animals.
This methodology is based on computer calculations and new information technologies, which can be used as follows:
For small molecules:
a) Quantitative structure–pharmacological activity relationship (QSAR) studies and structure–toxicological and eco-toxicological property relationship studies, including mutagenicity and carcinogenesis (QSTR).
b) Prediction of chemical and physicochemical properties of molecules. Studies of the relationship between molecular structure and absorption, distribution, metabolism and elimination (ADME) properties.
c) Prediction of the biological mechanisms of action of molecules and high-throughput in silico evaluation of large databases (virtual HTS).
For macromolecules:
a) Drug-receptor interaction studies (neurons).
b) Bioinformatics applied to studies of sequence-function relationships and structural properties of nucleic acids and proteins.
c) Search for new therapeutic targets and "active sites" from genomics and proteomics data.
d) Search for biomarkers for disease diagnosis or as indicators of contamination.
e) Prediction of the physicochemical properties of synthetic polymers, biopolymers, materials and nanostructures.
f) Prediction, design and optimization of mutant enzymes for biotechnological processes.
Development and use of databases for ligand-protein interaction studies
This project applies structure-activity relationship (SAR), structure-based and database mining approaches to study ligand-protein interactions. To support these studies, we have developed a relational database system called EDinburgh University Ligand Selection System (EDULISS 2.0), which stores the structure-data files of more than 5.5 million commercially available small molecules (more than 4.0 million of which are recognised as unique) and over 1,500 calculated molecular properties (descriptors) for each compound. A user-friendly web-based interface for EDULISS 2.0 has been established and is available at http://eduliss.bch.ed.ac.uk/.
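One way to hold many descriptors per compound in a relational database is a long descriptor table with one row per (compound, descriptor) pair, so that new descriptors do not require schema changes. The sketch below uses Python's built-in sqlite3 module to show that layout and a two-descriptor filter query; the schema, table and column names and the descriptor values are hypothetical illustrations and are not taken from EDULISS 2.0.

```python
import sqlite3

# Hypothetical schema: one row per compound, one row per (compound, descriptor) pair
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (
    id     INTEGER PRIMARY KEY,
    smiles TEXT NOT NULL
);
CREATE TABLE descriptor (
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    name        TEXT NOT NULL,
    value       REAL,
    PRIMARY KEY (compound_id, name)
);
""")

compounds = [(1, "CCO"), (2, "CC(=O)Oc1ccccc1C(=O)O"), (3, "CCCCCCCCCCCCCCCCCC(=O)O")]
# Illustrative descriptor values (approximate MW in g/mol and logP)
descriptors = [
    (1, "MW", 46.07), (1, "logP", -0.31),
    (2, "MW", 180.16), (2, "logP", 1.19),
    (3, "MW", 284.48), (3, "logP", 8.2),
]
conn.executemany("INSERT INTO compound VALUES (?, ?)", compounds)
conn.executemany("INSERT INTO descriptor VALUES (?, ?, ?)", descriptors)

# Filter on two descriptors at once by joining the descriptor table twice
query = """
SELECT c.id, c.smiles
FROM compound AS c
JOIN descriptor AS mw ON mw.compound_id = c.id AND mw.name = 'MW'
JOIN descriptor AS lp ON lp.compound_id = c.id AND lp.name = 'logP'
WHERE mw.value <= ? AND lp.value <= ?
ORDER BY c.id
"""
hits = conn.execute(query, (500.0, 5.0)).fetchall()
```

The trade-off is that every extra descriptor in a filter adds a join; a wide table with one column per descriptor is faster to query but harder to extend.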
We have used PubChem bioassay data from an NMR-based screening assay for the human FKBP12 protein (PubChem AID: 608). A prediction model was constructed using logistic regression to relate the assay result to a series of molecular descriptors. The model identifies 38 descriptors that are good predictors. These are mainly 3D descriptors; however, the presence of some predictive functional groups is also found to contribute positively to the binding interaction. A neural network technique called Self-Organising Maps (SOMs) succeeded in visualising the similarity of the PubChem compounds on the basis of the 38 descriptors, grouping 36% of the active compounds (16 out of 44) in one cluster and discriminating them from 95% of the inactive compounds.
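Logistic regression models the probability that a compound is active as the sigmoid of a linear combination of its descriptors. The sketch below fits such a model by plain batch gradient ascent on the log-likelihood; it uses two hypothetical descriptors and invented labels rather than the 38 descriptors and AID 608 data described above, and the learning rate and epoch count are arbitrary choices. The SOM step is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient ascent on the log-likelihood; rows of X are descriptor vectors, y is 0/1."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of activity for descriptor vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical two-descriptor data: 1 = active in the assay, 0 = inactive
X = [[0.2, 1.0], [0.4, 0.8], [0.5, 0.3], [1.5, 0.2], [1.8, 0.5], [2.0, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
p_active = predict_proba(w, b, [1.9, 0.5])
p_inactive = predict_proba(w, b, [0.3, 0.5])
```

The sign and size of each fitted weight show whether a descriptor raises or lowers the predicted probability of activity, which is how a descriptor can be read as a "good predictor".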
We have developed a molecular descriptor called the Atomic Characteristic Distance (ACD) to profile the distribution of specified atom types in a compound. ACD has been implemented as a pharmacophore-searching tool within EDULISS 2.0. A structure-based screen succeeded in finding inhibitors of pyruvate kinase, and the ligand-protein complexes have been successfully crystallised.
This study also discusses the interaction of metal-binding sites in metalloproteins. We developed a database system and web-based interface to store and apply geometrical information on these metal sites. The programme is called MEtal Sites in Proteins at Edinburgh UniverSity (MESPEUS; http://eduliss.bch.ed.ac.uk/MESPEUS/). MESPEUS is an exceptionally versatile tool for the collation and abstraction of data on a wide range of structural questions. As an example, we carried out a survey using this database, which indicated that the most common protein types containing an Mg-OATP-phosphate site are transferases and that the most common pattern is linkage through the β- and γ-phosphate groups.
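The kind of geometric question surveyed above (which phosphate groups of ATP coordinate a metal) can be answered from atomic coordinates by collecting ligand atoms within a distance cut-off of the metal. The sketch below assumes a 2.6 Å Mg–O cut-off and PDB-style ATP oxygen names (O1A to O3G, where the last letter marks the α, β or γ phosphate); the cut-off, coordinates and helper names are hypothetical and are not MESPEUS parameters.

```python
import math


def coordinating_atoms(metal_xyz, atoms, cutoff=2.6):
    """Return (name, distance) for atoms within `cutoff` Å of the metal, nearest first.
    The 2.6 Å default is an assumed Mg-O contact cut-off."""
    hits = []
    for name, xyz in atoms:
        d = math.dist(metal_xyz, xyz)
        if d <= cutoff:
            hits.append((name, round(d, 3)))
    return sorted(hits, key=lambda item: item[1])


def phosphate_linkage(hits):
    """Map PDB-style ATP phosphate-oxygen names (e.g. O1B, O2G) to their phosphate groups."""
    greek = {"A": "alpha", "B": "beta", "G": "gamma"}
    groups = {greek[name[-1]] for name, _ in hits
              if len(name) == 3 and name[0] == "O" and name[-1] in greek}
    return sorted(groups, key=["alpha", "beta", "gamma"].index)


# Hypothetical Mg site: metal at the origin, nearby ATP oxygens and a water oxygen
mg = (0.0, 0.0, 0.0)
atoms = [
    ("O1B", (2.1, 0.0, 0.0)),
    ("O2G", (0.0, 2.05, 0.0)),
    ("O1A", (3.5, 0.0, 0.0)),
    ("HOH_O", (0.0, 0.0, 2.1)),
]
hits = coordinating_atoms(mg, atoms)
linkage = phosphate_linkage(hits)
```

Running such a check over every stored Mg–ATP site and tallying the `linkage` results is one way a database survey could arrive at a "most common pattern".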
A non-conformational QSAR study for plant-derived larvicides against Zika Aedes aegypti L. vector
A set of 263 plant-derived compounds with larvicidal activity against the Aedes aegypti L. (Diptera: Culicidae) vector is collected from the literature and studied by means of a non-conformational quantitative structure-activity relationship (QSAR) approach. The balanced subsets method (BSM) is employed to split the complete dataset into training, validation and test sets. From a pool of 26,775 freely available molecular descriptors, the most relevant structural features affecting bioactivity are selected. The molecular descriptors are calculated with four freely available programs: PaDEL, Mold², EPI Suite and QuBiLs-MAS. The replacement method (RM) variable subset selection technique leads to the best linear regression models. A successful QSAR equation involves seven conformation-independent molecular descriptors and fulfils the internal (loo, l-30%o, VIF and Y-randomization) and external (test set with Ntest = 65 compounds) validation criteria evaluated. The practical application of this QSAR model reveals promising predicted values for some natural compounds with unknown experimental larvicidal activity. The present model is therefore the first one based on a large molecular set, and it is a useful computational tool for identifying and guiding the synthesis of new active molecules inspired by natural products.

Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas; Centro de Investigación y Desarrollo en Ciencias Aplicadas; Facultad de Ciencias Agrarias y Forestale
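The replacement method searches for the descriptor subset of fixed size d that minimizes the standard deviation S of the linear model, swapping one descriptor at a time for a better one from the pool until no swap helps. The sketch below is a simplified, single-start version written for illustration (the published RM also restarts from different positions and initial subsets); the data and helper names are hypothetical, and collinear descriptor sets, which would make the normal equations singular, are not handled.

```python
import math


def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def model_sd(X, y, subset):
    """S = sqrt(SS_res / (N - d - 1)) of the least-squares model on the chosen columns."""
    rows = [[1.0] + [r[j] for j in subset] for r in X]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)
    ss_res = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    return math.sqrt(ss_res / (len(y) - len(subset) - 1))


def replacement_method(X, y, initial):
    """Simplified replacement loop: swap one descriptor at a time for the pool
    descriptor that lowers S, until no single swap improves the model."""
    subset = list(initial)
    best = model_sd(X, y, subset)
    improved = True
    while improved:
        improved = False
        for pos in range(len(subset)):
            for cand in range(len(X[0])):
                if cand in subset:
                    continue
                trial = subset[:pos] + [cand] + subset[pos + 1:]
                s = model_sd(X, y, trial)
                if s < best - 1e-12:
                    subset, best, improved = trial, s, True
    return sorted(subset), best


# Hypothetical pool of 4 descriptors (columns) for 8 compounds; y depends on columns 0 and 2
X = [[1, 5, 3, 2], [2, 3, 1, 7], [3, 5, 4, 1], [4, 8, 1, 8],
     [5, 9, 5, 2], [6, 7, 9, 8], [7, 9, 2, 1], [8, 3, 6, 8]]
y = [1 + 2 * r[0] - r[2] for r in X]
subset, s_best = replacement_method(X, y, initial=[1, 3])
```

Because each evaluation only refits a small model, this search scales to large descriptor pools far better than an exhaustive search over all d-descriptor combinations.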