80 research outputs found

    Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information

    The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. The user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. Unlike other similar systems, OCHEM is not intended to re-implement existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and become members of the growing research community. Our intention is to make OCHEM a widely used platform for performing QSPR/QSAR studies online and sharing them with other users on the Web. The ultimate goal of OCHEM is to collect all possible chemoinformatics tools within one simple, reliable and user-friendly resource. OCHEM is free for web users and is available online at http://www.ochem.eu

    The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

    BACKGROUND: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently available datasets for MP predictions have been limited to around 50k molecules, while far more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if they were available in the appropriate form, could potentially be used to develop predictive models. RESULTS: We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. Data for chemicals that decomposed rather than melted were separated from data for compounds that underwent a normal melting transition, and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. CONCLUSIONS: We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826
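The matrix size quoted in this abstract can be checked with simple arithmetic. The sketch below assumes the matrix is laid out as compounds by descriptors and uses the rounded figures from the abstract (about 300,000 data points, about 700,000 descriptors); the 8-byte cell size and the `to_sparse_row` helper are illustrative assumptions, not part of the published pipeline.

```python
# Rounded figures from the abstract; the compounds-by-descriptors layout is an assumption.
N_COMPOUNDS = 300_000
N_DESCRIPTORS = 700_000

# Total number of cells in a dense matrix of that shape.
cells = N_COMPOUNDS * N_DESCRIPTORS

# Memory for a dense float64 matrix (8 bytes per cell), in decimal terabytes.
dense_tb = cells * 8 / 1e12


def to_sparse_row(values):
    """Keep only the non-zero descriptor values of one compound as {column: value}."""
    return {j: v for j, v in enumerate(values) if v != 0}


# Hypothetical descriptor vector for a single compound, mostly zeros.
row = to_sparse_row([0.0, 1.5, 0.0, 0.0, 2.0])

print(cells, dense_tb, row)
```

Under these assumptions the dense matrix would have about 2.1 × 10^11 cells, consistent with the ">200,000,000,000 entries" stated above, and would need roughly 1.68 TB as float64. Storing only non-zero values, as in the toy row, is the usual reason such data are kept in sparse form.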

    QSAR modeling studies of a library of Human Tyrosinase inhibitors

    Melanogenesis is the chemical process responsible for synthesizing melanin, which occurs in melanocytes, in subcellular lysosome-like organelles called melanosomes. Melanin plays a vital role in protecting the skin from damage caused by ultraviolet rays. However, excess melanin production or abnormal distribution can cause various pigmentation disorders, such as over-tanning, age spots, and melasma. Skin disorders like these have prompted the development of skin-whitening compounds to reduce melanin content. Furthermore, inhibition of melanin synthesis is considered a valid therapeutic strategy for treating advanced melanotic melanomas. Human tyrosinase (hsTYR) is the most important enzyme involved in the melanogenesis process, as it catalyzes at least its first two steps. Tyrosinase from the white button mushroom Agaricus bisporus (abTYR) has been widely available at low cost from commercial sources for several decades, whereas hsTYR is still expensive and difficult to produce. The importance of discovering more and better hsTYR inhibitors has been widely discussed, because several abTYR inhibitors give disappointing results when tested against hsTYR, including some of the most extensively used depigmenting compounds in dermocosmetics. An in silico methodology that can be used to predict compound bioactivities is QSAR (quantitative structure-activity relationship) modelling. A QSAR model tries to find correlations between a biological activity of interest and molecular descriptors calculated from the compound structure. In this work, a QSAR model was developed to predict hsTYR inhibition activity using the Python programming language and its PyQSAR package. To develop a QSAR model, a library of 196 known hsTYR inhibitors was gathered, and compounds were divided into 6 groups according to their scaffold structure. A total of 33 QSAR models were prepared using different combinations of the defined groups and different pools of molecular descriptors. QSAR model 32 was selected for further use as it presented good statistical robustness and had the highest number of compounds, 41 in total. Of the 28,933 molecular descriptors calculated by the OCHEM platform for the 41 compounds used, PyQSAR selected 4 to be used in the model: C-026; DISSM2C; MaxdssC; WHALES90_Rem. The statistics obtained from cross-validation of the QSAR model were excellent, namely the determination coefficient (R2CV=0.9147), the root mean square error (RMSE CV=0.1878) and the mean value of the score of the multiple linear regression method (Q2CV=0.8922). This QSAR model yields a mathematical equation that allows the prediction of hsTYR inhibition activity for new compounds with similar structures. A library of natural compounds, with structures similar to those used to develop QSAR model 32, was created using the COCONUT database of natural compounds. A total of 1,628 natural compounds were gathered, their molecular descriptors were calculated, and the QSAR model 32 equation was applied. The results are displayed on a website and can be viewed by accessing the URL http://esa.ipb.pt/qsar/. After the hsTYR inhibitory activity of each compound in the natural compound library had been predicted, the ZINC15 database was used to determine which of those compounds were available for purchase. A total of 18 different compounds were bought from different companies. To evaluate these compounds' experimental ability to inhibit hsTYR and thus validate QSAR model 32, the compounds will be tested against this enzyme. If the activity of those compounds is confirmed, they may be used in cosmeceutical applications.
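For readers unfamiliar with the cross-validation statistics quoted in this abstract, the following is a minimal sketch of how a cross-validated Q2 and RMSE are commonly computed from observed activities and out-of-fold predictions. The definitions used here (Q2 = 1 - PRESS / SS_tot, RMSE = sqrt(PRESS / n)) are the conventional ones and are an assumption; the abstract does not state the exact formulas used by PyQSAR. The function name `cv_metrics` and the four data points are hypothetical.

```python
import math


def cv_metrics(observed, predicted):
    """Return (q2, rmse) from observed values and cross-validated predictions.

    Assumes the conventional definitions:
      PRESS = sum of squared prediction errors
      Q2    = 1 - PRESS / total sum of squares about the observed mean
      RMSE  = sqrt(PRESS / n)
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - press / ss_tot, math.sqrt(press / n)


# Hypothetical activity values and out-of-fold predictions, not data from the study.
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 5.0, 6.0, 6.5]
q2, rmse = cv_metrics(observed, predicted)
print(f"Q2 = {q2:.4f}, RMSE = {rmse:.4f}")
```

A Q2 close to 1 together with a small RMSE indicates that predictions for held-out compounds track the measured activities closely, which is the sense in which the reported values are described as robust.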

    Ensuring confidence in predictions: A scheme to assess the scientific validity of in silico models

    The use of in silico tools within the drug development process to predict a wide range of properties including absorption, distribution, metabolism, elimination and toxicity has become increasingly important due to changes in legislation and both ethical and economic drivers to reduce animal testing. Whilst in silico tools have been used for decades, there remains reluctance to accept predictions based on these methods, particularly in regulatory settings. This apprehension arises in part from a lack of confidence in the reliability, robustness and applicability of the models. To address this issue, we propose a scheme for the verification of in silico models that enables end users and modellers to assess the scientific validity of models in accordance with the principles of good computer modelling practice. We report here the implementation of the scheme within the Innovative Medicines Initiative project “eTOX” (electronic toxicity) and its application to the in silico models developed within the framework of this project

    In Silico Resources to Assist in the Development and Evaluation of Physiologically-Based Kinetic Models

    Since their inception in pharmaceutical applications, physiologically-based kinetic (PBK) models are increasingly being used across a range of sectors, such as safety assessment of cosmetics, food additives, consumer goods, pesticides and other chemicals. Such models can be used to construct organ-level concentration-time profiles of xenobiotics. These models are essential in determining the overall internal exposure to a chemical and hence its ability to elicit a biological response. There are a multitude of in silico resources available to assist in the construction and evaluation of PBK models. An overview of these resources is presented herein, encompassing all attributes required for PBK modelling. These include predictive tools and databases for physico-chemical properties and absorption, distribution, metabolism and elimination (ADME) related properties. Data sources for existing PBK models, bespoke PBK software and generic software that can assist in model development are also identified. On-going efforts to harmonise approaches to PBK model construction, evaluation and reporting that would help increase the uptake and acceptance of these models are also discussed

    Chembench: A Publicly Accessible, Integrated Cheminformatics Portal

    The enormous increase in the amount of publicly available chemical genomics data and the growing emphasis on data sharing and open science mandate that cheminformaticians also make their models publicly available for broad use by the scientific community. Chembench is one of the first publicly accessible, integrated cheminformatics Web portals. It has been extensively used by researchers from different fields for curation, visualization, analysis, and modeling of chemogenomics data. Since its launch in 2008, Chembench has been accessed more than 1 million times by more than 5000 users from a total of 98 countries. We report on the recent updates and improvements that increase the simplicity of use, computational efficiency, accuracy, and accessibility of a broad range of tools and services for computer-assisted drug design and computational toxicology available on Chembench. Chembench remains freely accessible at https://chembench.mml.unc.ed