929 research outputs found

    An update on the strategies in multicomponent activity monitoring within the phytopharmaceutical field

    Abstract
    Background: To date, modern drug research has focused on the discovery and synthesis of single active substances. However, multicomponent preparations are gaining importance in the phytopharmaceutical field by demonstrating beneficial properties with respect to efficacy and toxicity.
    Discussion: In contrast to single-drug combinations, a botanical multicomponent therapeutic possesses a complex repertoire of chemicals belonging to a variety of substance classes. This may explain the frequently observed pleiotropic bioactivity spectra of these preparations, and may also suggest novel therapeutic opportunities. Interestingly, considerable bioactivity is exhibited not only by remedies that contain high doses of phytochemicals with prominent pharmaceutical efficacy, but also by preparations that lack a sole active principle component. Although each individual substance within such a multicomponent is present at a low molar fraction, therapeutic activity is established through potentiation of effects via combined, simultaneous action on multiple molecular targets. While beneficial properties may emerge from such a broad range of perturbations of the cellular machinery, validating and/or predicting the resulting activity profiles is fraught with difficulties in generic risk-benefit assessments. It is therefore recommended that a comprehensive strategy be implemented to cover the entirety of multicomponent-multitarget effects and to address the limitations of conventional approaches.
    Summary: Integrating standard toxicological methods with selected pathway-focused bioassays and unbiased data acquisition strategies (such as gene expression analysis) would be advantageous in building an interaction network model that accounts for all effects, whether intended or adverse.

    Machine Learning Toxicity Prediction: Latest Advances by Toxicity End Point

    Machine learning (ML) models to predict the toxicity of small molecules have garnered great attention and become widely used in recent years. Computational toxicity prediction is particularly advantageous in the early stages of drug discovery, where it can filter out molecules with a high probability of failing in clinical trials. This has been helped by the growing number of large toxicology databases. However, as an area of recent application, a greater understanding of the scope and applicability of ML methods is still necessary. Various kinds of toxic end points have been predicted in silico; acute oral toxicity, hepatotoxicity, cardiotoxicity, mutagenicity, and the 12 Tox21 end points are among the most commonly investigated. Machine learning methods exhibit different performances on different data sets due to dissimilar complexity, class distributions, or chemical space covered, which makes it hard to compare the performance of algorithms across toxic end points. The general pipeline to predict toxicity using ML has already been analyzed in various reviews. In this contribution, we focus on recent progress in the area and the outstanding challenges, giving a detailed description of the state-of-the-art models implemented for each toxic end point. The molecular representation, the algorithm, and the evaluation metric used in each research work are explained and analyzed. The end points that are usually predicted, their clinical relevance, the available databases, and the challenges they bring to the field are also highlighted.

    AI in drug discovery and its clinical relevance

    The COVID-19 pandemic has emphasized the need for a novel drug discovery process. However, the journey from conceptualizing a drug to its eventual implementation in clinical settings is a long, complex, and expensive process with many potential points of failure. Over the past decade, a vast growth in medical information has coincided with advances in computational hardware (cloud computing, GPUs, and TPUs) and the rise of deep learning. Medical data generated from large molecular screening profiles, personal health or pathology records, and public health organizations could benefit from analysis by Artificial Intelligence (AI) approaches to speed up and prevent failures in the drug discovery pipeline. We present applications of AI at various stages of drug discovery pipelines, including the inherently computational approaches of de novo design and prediction of a drug's likely properties. Open-source databases and AI-based software tools that facilitate drug design are discussed, along with their associated problems of molecule representation, data collection, complexity, labeling, and disparities among labels. How contemporary AI methods, such as graph neural networks, reinforcement learning, and generative models, along with structure-based methods (i.e., molecular dynamics simulations and molecular docking), can contribute to drug discovery applications and the analysis of drug responses is also explored. Finally, recent developments and investments in AI-based start-up companies for biotechnology and drug design, and their current progress, hopes, and promises, are discussed.
    Published in: Heliyon. License: https://creativecommons.org/licenses/by/4.0/. See the article on the publisher's website: https://doi.org/10.1016/j.heliyon.2023.e17575

    Automated QuantMap for rapid quantitative molecular network topology analysis

    Summary: The previously disclosed QuantMap method for grouping chemicals by biological activity used online services for much of the data gathering and some of the numerical analysis. The present work streamlines this process by using local copies of the databases and in-house analysis. Using computational methods similar or identical to those of the previous work, a qualitatively equivalent result was obtained in just a few seconds on the same dataset (a collection of 18 drugs). We use the user-friendly Galaxy framework to enable users to analyze their own datasets. Hopefully, this will make the QuantMap method more practical and accessible, and help achieve its goal of providing substantial assistance to drug repositioning, pharmacology evaluation, and toxicology risk assessment. Availability: http:
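QuantMap's core idea, grouping chemicals by the overlap of their biological-activity annotations, can be sketched in a few lines. This is a minimal illustration only, not the published pipeline: the chemical-to-protein-target sets below are invented for demonstration, and the distance threshold is arbitrary.

```python
from itertools import combinations

# Hypothetical chemical -> protein-target annotations; QuantMap gathers
# these from curated databases, here they are invented for illustration.
targets = {
    "aspirin":     {"PTGS1", "PTGS2", "AKR1C1"},
    "ibuprofen":   {"PTGS1", "PTGS2"},
    "paracetamol": {"PTGS1", "PTGS2", "TRPV1"},
    "metformin":   {"PRKAB1", "ETFDH"},
}

def jaccard_distance(a, b):
    """Jaccard distance: 1 - |intersection|/|union|; 0 for identical
    target sets, 1 for disjoint ones."""
    return 1.0 - len(a & b) / len(a | b)

# Pairwise distance matrix over all annotated chemicals.
dist = {
    (x, y): jaccard_distance(targets[x], targets[y])
    for x, y in combinations(sorted(targets), 2)
}

# Chemicals closer than a threshold are grouped as biologically similar.
similar = [pair for pair, d in dist.items() if d < 0.5]
```

The real method operates on much larger curated annotation sets and applies a full clustering step rather than a single cutoff.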

    Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

    In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) modeling are used to correlate the structure of a molecule with its biological properties in drug design and toxicological studies. In this body of work, I started with two in-depth reviews of the application of machine learning approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms combined with bagging as an ensemble strategy were evaluated. The Friedman aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods, although SMOTEENN with bagging became less effective when the imbalance ratio (IR) exceeded a certain threshold (e.g., >40). Second, the ability to separate the few active compounds from the vast numbers of inactive ones is of great importance in computational toxicology. Deep neural networks (DNN) and random forests (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22-27% on four metrics (precision, recall, F-measure, and AUPRC) and by 11% on another (AUROC). Lastly, current features used for QSAR-based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors/features using the latent-space embedding from a multi-head self-attention model. The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features correlate with biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based QSAR efforts for enhanced drug discovery and toxicology assessments.
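The SMOTE half of the SMOTEENN strategy described above creates synthetic minority samples by interpolating between a real minority sample and one of its nearest neighbours. A bare-bones sketch in pure Python, on invented toy data (not the library implementation such a study would typically use):

```python
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Minimal SMOTE sketch: each synthetic sample lies on the line
    segment between a random minority sample and one of its k nearest
    minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

# Toy 2D minority class; real QSAR descriptors are high-dimensional.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_synthetic=5)
```

The ENN step (removing samples misclassified by their nearest neighbours) and the bagging ensemble are omitted here for brevity.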

    11th German Conference on Chemoinformatics (GCC 2015): Fulda, Germany, 8-10 November 2015.


    QSAR Modeling: Where Have You Been? Where Are You Going To?

    Quantitative Structure-Activity Relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss: (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will facilitate communication between computational and experimental chemists toward the collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high-quality, validated QSARs for regulatory decision making.

    Spatial Statistics for the Analysis of Chemical Datasets for the Validation of Virtual Screening Techniques

    A common finding of many reports evaluating virtual screening methods is that validation results vary considerably with changing benchmark datasets. It is widely assumed that these effects are caused by the redundancy and cluster structure inherent to those datasets. These phenomena manifest themselves in descriptor space, which is termed the dataset topology. A methodology for the characterization of dataset topology based on spatial statistics is introduced. With this methodology it is possible to associate differences in virtual screening performance on different datasets with differences in dataset topology. Moreover, the better virtual screening performance of certain descriptors can be explained by their ability to represent the benchmark datasets with a more favorable topology. It is shown that the composition of some benchmark datasets causes topologies that lead to over-optimistic validation results even in very "simple" descriptor spaces. Spatial statistics analysis as proposed here facilitates the detection of such biased datasets and provides a tool for the design of unbiased benchmark datasets. General principles for the design of benchmark datasets that are not affected by topological bias were developed. Refined Nearest Neighbor Analysis was used to design benchmark datasets based on PubChem bioactivity data. A workflow is devised that purges unselective hits from datasets of compounds active against pharmaceutically relevant targets. Topological optimization using experimental design strategies was applied to generate corresponding datasets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These datasets provide a tool for a Maximum Unbiased Validation (MUV) of virtual screening methods.
The datasets and a MATLAB toolbox for spatial statistics are freely available on the enclosed CD-ROM or via the internet at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
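The spatial-statistics idea of characterizing dataset topology through nearest-neighbour distances can be illustrated with the empirical nearest-neighbour distance function G. This is a simplified sketch with invented 2D points standing in for descriptor vectors; the Refined Nearest Neighbor Analysis used in the thesis is considerably richer.

```python
import math

def nn_distances(points):
    """Distance from each point to its nearest neighbour in the set."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def G(points, t):
    """Empirical nearest-neighbour distance function: the fraction of
    points whose nearest neighbour lies within distance t.  Strongly
    clustered datasets rise to 1 at much smaller t than spread-out ones,
    which is one signature of topological bias in a benchmark set."""
    d = nn_distances(points)
    return sum(1 for x in d if x <= t) / len(d)

# Two invented toy "datasets" in a 2D descriptor space.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
spread    = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
```

Comparing G curves for actives and decoys at several radii is one way such an analysis can flag benchmark sets whose cluster structure would yield over-optimistic enrichment.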

    Machine Learning for Kinase Drug Discovery

    Cancer is one of the major public health issues, causing several million deaths every year. Although anti-cancer drugs have been developed and are globally administered, mild to severe side effects are known to occur during treatment. Computer-aided drug discovery has become a cornerstone for unveiling treatments for existing as well as emerging diseases. Computational methods aim not only to speed up the drug design process, but also to reduce time-consuming, costly experiments and in vivo animal testing. In this context, especially over the last decade, deep learning has come to play a prominent role in the prediction of molecular activity, properties, and toxicity. However, major challenges remain when applying deep learning models in drug discovery, including data scarcity for physicochemical tasks, the difficulty of interpreting the predictions made by deep neural networks, and the necessity of open-source, robust workflows to ensure reproducibility and reusability. In this thesis, after reviewing the state of the art in deep learning applied to virtual screening, we address these challenges as follows. Regarding data scarcity in the context of deep learning applied to small molecules, we developed data augmentation techniques based on the SMILES encoding. This linear string notation enumerates the atoms of a compound by following a path along the molecular graph; multiple SMILES for a single compound can be obtained by traversing the graph along different paths. We applied the developed augmentation techniques to three deep learning models, including convolutional and recurrent neural networks, and to four property and activity data sets. The results show that augmentation improves model accuracy independently of the deep learning model and of the data set size. Moreover, we computed the uncertainty of a model by using augmentation at inference time.
In this regard, we have shown that the more confident the model is in its prediction, the smaller the error, implying that a given prediction can be trusted and is close to the target value. The software and associated documentation allow predictions for novel compounds and have been made freely available. Blindly trusting predictions from algorithms may have serious consequences in healthcare. In this context, a better understanding of how a neural network classifies a compound based on its input features is highly beneficial, helping to de-risk and optimize compounds. In this research project, we decomposed the inner layers of a deep neural network to identify the toxic substructures, the toxicophores, of a compound that led to its toxicity classification. Using molecular fingerprints (vectors that indicate the presence or absence of a particular atomic environment), we were able to map a toxicity score to each of these substructures. Moreover, we developed a method to visualize the toxicophores within a compound in 2D, the so-called cytotoxicity maps, which could be of great use to medicinal chemists in identifying ways to modify molecules to eliminate toxicity. Not only does the deep learning model reach state-of-the-art results, but the identified toxicophores confirm known toxic substructures as well as suggest new potential candidates. To speed up the drug discovery process, access to robust and modular workflows is extremely advantageous. In this context, the fully open-source TeachOpenCADD project was developed. Significant tasks in both cheminformatics and bioinformatics are implemented in a pedagogical fashion, allowing the material to be used for teaching as well as a starting point for novel research. Within this framework, a special pipeline is dedicated to kinases, a family of proteins known to be involved in diseases such as cancer. The aim is to gain insight into off-targets, i.e.
proteins that are unintentionally affected by a compound and can cause adverse effects during treatment. Four measures of kinase similarity are implemented, taking into account sequence and structural information as well as protein-ligand interactions and ligand profiling data. The workflow provides a clustering of a set of kinases, which can be further analyzed to understand the off-target effects of inhibitors. Results show that analyzing kinases from several perspectives is crucial for insight into off-target prediction and for gaining a global view of the kinome. These novel methods can be exploited in the discovery of new drugs, and more specifically for diseases involving the dysregulation of kinases, such as cancer.
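The augmentation-at-inference uncertainty estimate mentioned above reduces to running a model on several augmented variants of one input and reading the spread of the predictions as a confidence signal. A toy sketch, where the lambda "model" and the SMILES-like strings are stand-ins invented for illustration, not the thesis's trained networks or chemically validated encodings:

```python
import statistics

def augmented_predict(model, variants):
    """Test-time augmentation: predict on every augmented variant of one
    input; the mean is the prediction, and the standard deviation flags
    how much (or how little) that prediction can be trusted."""
    preds = [model(v) for v in variants]
    return statistics.mean(preds), statistics.stdev(preds)

# Stand-in "model": the string length, so variants of equal length agree
# perfectly and variants of unequal length disagree.
model = lambda s: float(len(s))

# Hypothetical randomized-SMILES variants of one compound (illustrative
# strings only).
consistent = ["CCO", "OCC", "COC"]
divergent  = ["CCO", "C(C)O", "OC(C)C"]

mean_c, sd_c = augmented_predict(model, consistent)
mean_d, sd_d = augmented_predict(model, divergent)
```

In the thesis's setting, a small standard deviation across augmented SMILES corresponded to predictions closer to the target value, which is what makes the spread usable as an uncertainty estimate.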