
    In-silico Predictive Mutagenicity Model Generation Using Supervised Learning Approaches

    With the advent of High Throughput Screening techniques, it is feasible to filter possible leads from a mammoth chemical space that can act against a particular target and inhibit its action. Virtual screening complements the in-vitro assays which are costly and time consuming. This process is used to sort biologically active molecules by utilizing the structural and chemical information of the compounds and the target proteins in order to screen potential hits. Various data mining and machine learning tools utilize Molecular Descriptors through the knowledge discovery process using classifier algorithms that classify the potentially active hits for the drug development process.

    Advances in De Novo Drug Design: From Conventional to Machine Learning Methods

    De novo drug design is a computational approach that generates novel molecular structures from atomic building blocks with no a priori relationships. Conventional methods include structure-based and ligand-based design, which depend on the properties of the active site of a biological target or its known active binders, respectively. Artificial intelligence, including machine learning, is an emerging field that has positively impacted the drug discovery process. Deep reinforcement learning is a subdivision of machine learning that combines artificial neural networks with reinforcement-learning architectures. This method has successfully been employed to develop novel de novo drug design approaches using a variety of artificial networks including recurrent neural networks, convolutional neural networks, generative adversarial networks, and autoencoders. This review article summarizes advances in de novo drug design, from conventional growth algorithms to advanced machine-learning methodologies, and highlights hot topics for further development. Peer reviewed.

    Multi-Fidelity Active Learning with GFlowNets

    In the last decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanwhile, the progress in machine learning has turned it into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive. Progress in machine learning methods that can efficiently tackle such problems would help accelerate currently crucial areas such as drug and materials discovery. In this paper, we propose the use of GFlowNets for multi-fidelity active learning, where multiple approximations of the black-box function are available at lower fidelity and cost. GFlowNets are recently proposed methods for amortised probabilistic inference that have proven efficient for exploring large, high-dimensional spaces and can hence be practical in the multi-fidelity setting too. Here, we describe our algorithm for multi-fidelity active learning with GFlowNets and evaluate its performance in both well-studied synthetic tasks and practically relevant applications of molecular discovery. Our results show that multi-fidelity active learning with GFlowNets can efficiently leverage the availability of multiple oracles with different costs and fidelities to accelerate scientific discovery and engineering design. Code: https://github.com/nikita-0209/mf-al-gf

    Protein-Ligand Binding Affinity Directed Multi-Objective Drug Design Based on Fragment Representation Methods

    Drug discovery is a challenging process with a vast molecular space to be explored and numerous pharmacological properties to be appropriately considered. Among various drug design protocols, fragment-based drug design is an effective way of constraining the search space and better utilizing biologically active compounds. Motivated by fragment-based drug search for a given protein target and the emergence of artificial intelligence (AI) approaches in this field, this work advances the field of in silico drug design by (1) integrating a graph fragmentation-based deep generative model with a deep evolutionary learning process for large-scale multi-objective molecular optimization, and (2) applying protein-ligand binding affinity scores together with other desired physicochemical properties as objectives. Our experiments show that the proposed method can generate novel molecules with improved property values and binding affinities

    MMsPred: a bioactivity and toxicology predictive system

    In the last decade, the development and use of new methods in combinatorial chemistry and high-throughput screening have dramatically increased the number of known biologically active compounds. Paradoxically, the number of drugs reaching the market has not followed the same trend, often because many of the candidate drugs have poor absorption, distribution, metabolism, excretion, and toxicological (ADME-Tox) properties. The ability to recognize and discard bad candidates early in the drug discovery process would save investments of time and money that are otherwise lost. Machine learning techniques could provide solutions to this problem.
    The goal of my research is to develop classifiers that accurately discriminate between active and inactive molecules for a specific target. To this end, I am comparing the effectiveness of different machine learning techniques on this problem. As a source of data we have selected a set of PubChem's public BioAssays. In addition, with the objective of offering a real-time query service with our predictors, we aim to keep the features describing the chemical compounds relatively simple.
    At the end of this process, we should better understand how to build statistical models that are able to recognize molecules active in a specific bioassay, including how to select the most appropriate classification technique, and how to describe compounds in a way that is not excessively resource-consuming to generate, yet contains sufficient information for the classification. We see immediate applications of such technology to recognize compounds with a high risk of toxicity, and also to suggest likely metabolic pathways that would process them.

    Intelligent data acquisition for drug design through combinatorial library design

    A problem that occurs in machine learning methods for drug discovery is a need for standardized data. Methods and interest exist for producing new data, but due to material and budget constraints it is desirable that each iteration of producing data is as efficient as possible. In this thesis, we present two papers whose methods address different problems in selecting data to produce. We investigate Active Learning for models that use the margin in model decisiveness to measure the model uncertainty to guide data acquisition. We demonstrate that the models perform better with Active Learning than with random acquisition of data, independent of machine learning model and starting knowledge. We also study the multi-objective optimization problem of combinatorial library design. Here we present a framework that could process the output of generative models for molecular design and give an optimized library design. The results show that the framework successfully optimizes a library based on molecule availability, which the framework also attempts to identify using retrosynthesis prediction. We conclude that the next step in intelligent data acquisition is to combine the two methods and create a library design model that uses the information of previous libraries to guide subsequent designs.
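
    The margin-based acquisition mentioned in this abstract can be illustrated with a minimal sketch. It assumes a classifier that outputs class probabilities for each unlabeled compound; the function names and the example probabilities are hypothetical and are not taken from the thesis. A small gap between the two most probable classes is treated as high uncertainty, and those compounds are queried first.

```python
from typing import List, Sequence


def margin(probabilities: Sequence[float]) -> float:
    """Gap between the two most probable classes; a small gap means an uncertain model."""
    top_two = sorted(probabilities, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_most_uncertain(
    pool_probabilities: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Return indices of the batch_size pool items with the smallest margin."""
    ranked = sorted(
        range(len(pool_probabilities)),
        key=lambda i: margin(pool_probabilities[i]),
    )
    return ranked[:batch_size]


# Hypothetical predicted probabilities (inactive, active) for four unlabeled compounds.
pool = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.48, 0.52],
]
chosen = select_most_uncertain(pool, batch_size=2)
```

    In an active learning loop, the chosen compounds would be sent for measurement, added to the training set, and the model retrained before the next selection round.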

    Scientific discovery as a combinatorial optimisation problem: How best to navigate the landscape of possible experiments?

    A considerable number of areas of bioscience, including gene and drug discovery, metabolic engineering for the biotechnological improvement of organisms, and the processes of natural and directed evolution, are best viewed in terms of a ‘landscape’ representing a large search space of possible solutions or experiments populated by a considerably smaller number of actual solutions that then emerge. This is what makes these problems ‘hard’, but as such these are to be seen as combinatorial optimisation problems that are best attacked by heuristic methods known from that field. Such landscapes, which may also represent or include multiple objectives, are effectively modelled in silico, with modern active learning algorithms such as those based on Darwinian evolution providing guidance, using existing knowledge, as to what is the ‘best’ experiment to do next. An awareness, and the application, of these methods can thereby enhance the scientific discovery process considerably. This analysis fits comfortably with an emerging epistemology that sees scientific reasoning, the search for solutions, and scientific discovery as Bayesian processes.