8 research outputs found

    QSAR Classification Models for Predicting the Activity of Inhibitors of Beta-Secretase (BACE1) Associated with Alzheimer’s Disease

    Alzheimer’s disease is one of the most common neurodegenerative disorders in the elderly population. The β-site amyloid cleavage enzyme 1 (BACE1) is the major constituent of amyloid plaques and plays a central role in this brain pathogenesis; it is therefore a promising pharmacological target for treatment. In this paper, a QSAR model for identifying potential inhibitors of the BACE1 protein is designed using classification methods. To build this model, a database of 215 molecules collected from different sources was assembled. This dataset contains diverse compounds with different scaffolds and physicochemical properties, covering a wide chemical space in the drug-like range. The most distinctive aspect of the applied QSAR strategy is the combination of hybridization with backward elimination of models, which helps improve the quality of the final QSAR model. Another relevant step is the visual analysis of the molecular descriptors, which helps guarantee the absence of information redundancy in the model. The performance of the QSAR models has been assessed by traditional metrics; the final proposed model has low cardinality and correctly classifies a high percentage of chemical compounds.

    Affiliations:
    Ponzoni, Ignacio: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina.
    Sebastián Pérez, Víctor: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain.
    Martínez, María J.: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina.
    Roca, Carlos: Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain.
    De la Cruz Pérez, Carlos: Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain.
    Cravero, Fiorella: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina.
    Vazquez, Gustavo Esteban: Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Católica del Uruguay; Uruguay.
    Páez, Juan A.: Consejo Superior de Investigaciones Científicas. Instituto de Química Médica; Spain.
    Diaz, Monica Fatima: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Química; Argentina.
    Campillo Martín, Nuria Eugenia: Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain.

    QSAR modelling of a large imbalanced aryl hydrocarbon activation dataset by rational and random sampling and screening of 80,086 REACH pre-registered and/or registered substances

    The aryl hydrocarbon receptor (AhR) plays important roles in many normal and pathological physiological processes, including endocrine homeostasis, foetal development, cell cycle regulation, cellular oxidation/antioxidation, immune regulation, metabolism of endogenous and exogenous substances, and carcinogenesis. An experimental data set for human in vitro AhR activation comprising 324,858 substances, of which 1,982 were confirmed actives, was used to test an in-house-developed approach to rationally select Quantitative Structure-Activity Relationship (QSAR) training set substances from an unbalanced data set. In the first iteration, active and inactive substances were selected at random to build QSAR models. Then, more inactive substances were added to the training set in two further iterations, based on incorrect or out-of-domain predictions, to produce larger models. The resulting 'rational' model, comprising 832 actives and four times as many inactives, i.e. 3,328, was compared to a model with a training set of the same size and proportion of inactives chosen entirely at random. Both models underwent robust cross-validation and external validation showing good statistical performance, with the rational model having external validation sensitivity of 85.1% and specificity of 97.1%, compared to the random model with sensitivity 89.1% and specificity 91.3%. Furthermore, we integrated the training sets for both models with the 93 external validation test set actives and 372 randomly selected inactives to make two final models. These also underwent external validation for specificity and cross-validation, which confirmed that good predictivity was maintained. All developed models were applied to predict 80,086 EU REACH substances. The rational and random final models had 63.1% and 56.9% coverage of the REACH set, respectively, and predicted 1,256 and 3,214 substances as actives. The final models, as well as predictions of AhR activation for 650,000 substances, will be published in the Danish (Q)SAR Database and can, for example, be used for priority setting, in read-across predictions, and in weight-of-evidence assessments of chemicals.
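
    As a rough illustration of how the external-validation figures quoted above compare, the sketch below computes balanced accuracy (the mean of sensitivity and specificity) for the rational and random models, along with the training-set size implied by the 1:4 ratio of actives to inactives. Balanced accuracy is not reported in the abstract; it is assumed here only as a convenient single-number summary, and the function name is hypothetical.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity, both given as fractions."""
    return (sensitivity + specificity) / 2.0


# External-validation sensitivity and specificity quoted in the abstract.
rational = balanced_accuracy(0.851, 0.971)
random_model = balanced_accuracy(0.891, 0.913)

# Training set: 832 actives plus four times as many inactives.
n_actives = 832
n_inactives = 4 * n_actives
n_training = n_actives + n_inactives

print(f"rational: {rational:.3f}, random: {random_model:.3f}")
print(f"training set: {n_training} substances")
```

    On this assumed measure the two models are close; the rational model trades some sensitivity for noticeably higher specificity.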

    Computer-aided prediction of biological activity spectra of chemical compounds: opportunities and limitations

    An essential characteristic of chemical compounds is their biological activity, since its presence can become the basis for using the substance for therapeutic purposes or, on the contrary, limit its practical application because of side effects and toxicity. Computer assessment of biological activity spectra makes it possible to determine the most promising directions for studying the pharmacological action of particular substances, and to filter out potentially dangerous molecules at the early stages of research. For more than 25 years, we have been developing and improving the computer program PASS (Prediction of Activity Spectra for Substances), designed to predict the biological activity spectrum of a substance from the structural formula of its molecules. The prediction is based on the analysis of structure-activity relationships for the training set, which currently contains information on structures and known biological activities for more than one million molecules. The structure of an organic compound is represented in PASS using Multilevel Neighborhoods of Atoms descriptors; the activity prediction for new compounds is performed by a naive Bayes classifier using the structure-activity relationships determined from the analysis of the training set. We have created and improved both local versions of the PASS program and freely available web resources based on PASS (http://www.way2drug.com). They predict several thousand biological activities (pharmacological effects, molecular mechanisms of action, specific toxicity and adverse effects, interaction with unwanted targets, metabolism, and action on molecular transport), cytotoxicity for tumor and non-tumor cell lines, carcinogenicity, induced changes in gene expression profiles, sites of metabolism for the major enzymes of the first and second phases of xenobiotic biotransformation, and whether a compound is a substrate and/or metabolite of metabolic enzymes. The web resource Way2Drug is used by over 19,000 researchers from more than 100 countries, who have obtained over 600,000 predictions and published about 500 papers describing the results. Analysis of the published works shows that in some cases the interpretation of the prediction results presented by the authors of these publications requires adjustment. In this work, we provide the theoretical basis and consider, using particular examples, the opportunities and limitations of computer-aided prediction of biological activity spectra.
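
    To make the general idea of naive Bayes activity prediction from structural features concrete, here is a minimal sketch of a generic Bernoulli naive Bayes classifier over binary substructure features. It is not the PASS algorithm and does not use Multilevel Neighborhoods of Atoms descriptors; the feature names, the toy training set, and the function names are all hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def train(samples):
    """samples: list of (set of feature names, is_active) pairs."""
    class_counts = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    vocabulary = set()
    for features, label in samples:
        class_counts[label] += 1
        for name in features:
            feature_counts[label][name] += 1
            vocabulary.add(name)
    return class_counts, feature_counts, vocabulary


def prob_active(model, features):
    """Posterior probability of the active class under naive Bayes."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (True, False):
        n = class_counts[label]
        # Laplace-smoothed class prior.
        score = math.log((n + 1) / (total + 2))
        for name in vocabulary:
            # Laplace-smoothed probability that the feature is present.
            p = (feature_counts[label][name] + 1) / (n + 2)
            score += math.log(p if name in features else 1.0 - p)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


# Hypothetical toy training set: substructure flags and activity labels.
training = [
    ({"aromatic_ring", "amide"}, True),
    ({"aromatic_ring", "amine"}, True),
    ({"alkyl_chain"}, False),
    ({"alkyl_chain", "amine"}, False),
]
model = train(training)
p_hit = prob_active(model, {"aromatic_ring", "amide"})
p_miss = prob_active(model, {"alkyl_chain"})
print(f"P(active | aromatic_ring, amide) = {p_hit:.2f}")
print(f"P(active | alkyl_chain) = {p_miss:.2f}")
```

    The independence assumption between features is what makes the classifier "naive"; with only four toy molecules the resulting probabilities are illustrative, not meaningful.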

    Cheminformatics Analysis and Computational Modeling of Detergent-Sensitive Aggregation

    Small-molecule aggregates cause detergent-reversible protein sequestration and are the most prevalent source of nonspecific activity in biochemical screening assays. Large volumes of publicly available dose-response screens performed in the presence or absence of detergent have enabled cheminformatics analysis of chemical aggregation, which reinforces prior notions that aggregation is prevalent and context-dependent. We report the development of random forest classifiers trained on screens of β-lactamase or cruzain targets under well-defined assay conditions, which distinguish putative aggregators and non-aggregators with balanced accuracies as high as 78%. These models overcome limitations of existing computational predictors related to programmatic errors, blurred modeling endpoints, and poor external predictivity. Model interpretation indicated that polarity, aliphaticity, and weight are significantly correlated with aggregation propensity, although these features alone estimate behavior with under 70% accuracy. Our curated datasets and validated models will help identify potential aggregators and reduce resource waste during drug discovery and optimization.

    Development and application of QSAR models for mechanisms related to endocrine disruption


    Machine Learning Methodologies for Interpretable Compound Activity Predictions

    Machine learning (ML) models have gained attention for mining the pharmaceutical data that are currently generated at unprecedented rates, potentially accelerating the discovery of new drugs. The advent of deep learning (DL) has also raised expectations in pharmaceutical research. A central task in drug discovery is the initial search for compounds with a desired biological activity. ML algorithms are able to find patterns in compound structures that are related to bioactivity, the so-called structure-activity relationships (SARs). ML-based predictions can complement biological testing to prioritize further experiments. Moreover, insights into model decisions are highly desired for further validation and identification of activity-relevant substructures. However, the interpretation of complex ML models remains essentially prohibitive. This thesis focuses on ML-based predictions of compound activity against multiple biological targets. Single-target and multi-target models are generated for relevant tasks, including the prediction of profiling matrices from screening data and the discrimination between weak and strong inhibitors for more than a hundred kinases. Moreover, the relative performance of distinct modeling strategies is systematically analyzed under varying training conditions, and practical guidelines are reported. Since explainable model decisions are a clear requirement for the utility of ML bioactivity models in pharmaceutical research, methods for the interpretation and intuitive visualization of activity predictions from any ML or DL model are introduced. Taken together, this dissertation presents contributions that advance the application and rationalization of ML models for biological activity and SAR predictions.

    PREDICTIVE CHEMINFORMATICS ANALYSIS OF DIVERSE CHEMOGENOMICS DATA SOURCES: APPLICATIONS TO DRUG DISCOVERY, ASSAY INTERFERENCE, AND TEXT MINING

    In this dissertation, we describe the cheminformatics analysis of diverse chemogenomics data sources as well as the application of these data to several drug discovery efforts. In Chapter 1, we describe the discovery and characterization of novel Ebola virus inhibitors through QSAR-based virtual screening. In Chapter 2, we report the discovery and analysis of a series of potent and selective doublecortin-like kinase 1 (DCLK1) inhibitors using QSAR modeling, virtual screening, Matched Molecular Pair Analysis (MMPA), and molecular docking. In Chapter 3, we perform a large-scale analysis of publicly available data in PubChem to probe the reliability and applicability of Pan-Assay INterference compoundS (PAINS) alerts, a popular computational drug screening tool. In Chapter 4, we explore the PubMed database as a novel source of biomedical data and describe the development of Chemotext, a publicly available web server capable of text-mining the published literature.
    Doctor of Philosophy