64 research outputs found

    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

    Full text link
    Chemistry and materials science are complex. Recently, there have been great successes in addressing this complexity using data-driven or computational techniques. Yet, the necessity of input structured in very specific forms and the fact that there is an ever-growing number of tools creates usability and accessibility challenges. Coupled with the reality that much data in these disciplines is unstructured, the effectiveness of these tools is limited. Motivated by recent works that indicated that large language models (LLMs) might help address some of these issues, we organized a hackathon event on the applications of LLMs in chemistry, materials science, and beyond. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines

    Theory and application of medium to high throughput prediction method techniques for asymmetric catalyst design

    No full text
    With the use of computational methods in the field of drug design becoming ever more prevalent, there is pressure to port these technologies to other fields. One of the fields ripe for application of computational drug design techniques, specifically virtual screening and computer-aided molecular design, is the design and synthesis of asymmetric catalysts. Such methods could either guide the selection of the optimal catalyst(s) for a given reaction and a given substrate or provide an enriched selection of highly efficient asymmetric catalysts which enable the synthetic chemists to focus on the most promising candidates. This would in turn provide savings in time and reduce the costs associated with the synthesis and evaluation of large libraries of molecules. However, to be applicable to the evaluation of a large number of potential catalysts, speed is of utmost importance. This impetus has led to the development of medium to high throughput virtual screening (HTVS) methods for asymmetric catalyst development or assessment, although very few applications have been reported. These methods typically fall into four classes: methods combining quantum mechanics and molecular mechanics (QM/MM), pure molecular mechanics-based methods – a class which can be subdivided into static and dynamic transition state modeling – and lastly quantitative structure selectivity relationship methods (QSSR). This review will cover specific methods within these classes and their application to selected reactions. Peer reviewed: Yes. NRC publication: Ye

    Single-Point Mutation with a Rotamer Library Toolkit: Toward Protein Engineering

    No full text
    Protein engineers have long been hard at work to harness biocatalysts as a natural source of regio-, stereo-, and chemoselectivity in order to carry out chemistry (reactions and/or substrates) not previously achieved with these enzymes. The extreme labor demands and exponential number of mutation combinations have induced computational advances in this domain. The first step in our virtual approach is to predict the correct conformations upon mutation of residues (i.e., rebuilding side chains). For this purpose, we opted for a combination of molecular mechanics and statistical data. In this work, we have developed automated computational tools to extract protein structural information and created conformational libraries for each amino acid dependent on a variable number of parameters (e.g., resolution, flexibility, secondary structure). We have also developed the necessary tool to apply the mutation and optimize the conformation accordingly. For side-chain conformation prediction, we obtained overall average root-mean-square deviations (RMSDs) of 0.91 and 1.01 Å for the 18 flexible natural amino acids within two distinct sets of over 3000 and 1500 side-chain residues, respectively. The commonly used dihedral angle differences were also evaluated and performed worse than the state of the art. These two metrics are also compared. Furthermore, we generated a family-specific library for kinases that produced an average 2% lower RMSD upon side-chain reconstruction and a residue-specific library that yielded a 17% improvement. Ultimately, since our protein engineering outlook involves using our docking software, Fitted/Impacts, we applied our mutation protocol to a benchmarked data set for self- and cross-docking. Our side-chain reconstruction does not hinder our docking software, demonstrating differences in pose prediction accuracy of approximately 2% (RMSD cutoff metric) for a set of over 200 protein/ligand structures. Similarly, when docking to a set of over 100 kinases, side-chain reconstruction (using both general and biased conformation libraries) had minimal detriment to the docking accuracy
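
The side-chain RMSD quoted in the abstract above can be illustrated with a minimal sketch. It assumes the predicted and reference side-chain atoms are already matched one to one and expressed in the same coordinate frame (no superposition is performed); the function name `side_chain_rmsd` and the toy coordinates are hypothetical and are not taken from the authors' toolkit.

```python
import math


def side_chain_rmsd(predicted, reference):
    """Root-mean-square deviation between two matched lists of (x, y, z) atoms.

    Assumption: both lists are in the same frame and in the same atom order.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("atom lists must be non-empty and of equal length")
    # Sum of squared distances over all matched atom pairs.
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))


# Toy example: every atom is displaced by exactly 1 Å along z.
predicted_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
rmsd = side_chain_rmsd(predicted_atoms, reference_atoms)
```

Averaging this value over every rebuilt residue in a test set would give a figure comparable in kind to the averages reported above, although the exact atom selection used by the authors is not stated in the abstract.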

    Customizable Generation of Synthetically Accessible, Local Chemical Subspaces

    No full text
    Screening large libraries of chemicals has been an efficient strategy to discover bioactive compounds; however, a portion of the potential for success is limited to the available libraries. Synergizing combinatorial and computational chemistries has emerged as a time-efficient strategy to explore the chemical space more widely. Ideally, streamlining the evaluation process for larger, feasible chemical libraries would become commonplace. Thus, combinatorial tools and, for example, docking methods would be integrated to identify novel bioactive entities. The idea is simple in nature, but much more complex in practice; combinatorial chemistry is more than the coupling of chemicals into products: synthetic feasibility includes chemoselectivity, stereoselectivity, protecting group chemistry, and chemical availability, which must all be considered for combinatorial library design. In addition, intuitive interfaces and simple user manipulation are key for optimal use of such tools by organic chemists, and crucial for the integration of such software in medicinal chemistry laboratories. We present herein Finders and React2D, integrated into the Virtual Chemist platform, a modular software suite. This approach enhances virtual combinatorial chemistry by identifying available chemicals compatible with a user-defined chemical transformation and by carrying out the reaction leading to libraries of realistic, synthetically accessible chemicals, all with a completely automated, black-box, and efficient design. We demonstrate its utility by generating ∼40 million synthetically accessible, stereochemically accurate compounds from a single library of 100 000 purchasable molecules and 56 well-characterized chemical reactions

    The Third CACHE Challenge – Finding Ligands Targeting the Macrodomain of SARS-CoV-2 NSP3 Using AI-inspired and Knowledge-Based Approaches.

    No full text
    The SARS-CoV-2 virus contains a host of nonstructural proteins (NSPs) that contribute to its structure and viral function. Among them is the nonstructural protein 3 (NSP3), which contains a macrodomain (Mac1) that interferes with antiviral adenosine diphosphate (ADP)-ribosylation signaling. Catalytic mutations in Mac1 render viruses nonpathogenic, making this enzyme a promising target for antiviral development. For this reason, the third CACHE challenge focused on identifying binders of the Mac1 domain of NSP3 for the development of novel antivirals against SARS-CoV-2. To this end, we used available structural data of the NSP3 Mac1 domain in complex with known fragment binders as starting points for ligand discovery; our efforts were primarily focused on sub-sites of the ADP binding site in the NSP3 macrodomain. Then, using artificial intelligence (AI)-guided and knowledge-based fragment merging and expansion approaches, we generated novel molecules that would serve as templates to identify highly similar compounds in the Enamine REAL database that would be commercially available. Our design yielded a library of 12,800 molecules, which was docked with our program FITTED to a representative crystal structure of NSP3. We ranked the predicted binding poses based on docking score, followed by visual pose analysis of the best 200 compounds. We finally selected and proposed 150 compounds for testing, followed by further shortlisting to yield a final list of 107 molecules. 91 compounds were purchased from Enamine and are being tested at the Structural Genomics Consortium (SGC). Our approach and findings will further contribute to our open science efforts, and we aim to continue to engage the scientific community
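
The ranking step described in the abstract above (sort docked compounds by score, then keep the best ones for visual inspection) might look roughly like the sketch below. It assumes that a lower docking score means a better predicted pose, which is a common convention but is not stated in the abstract; the function name `shortlist_by_score`, the compound identifiers, and the scores are all hypothetical and do not reflect FITTED's actual output format.

```python
def shortlist_by_score(scores, top_n):
    """Return the names of the top_n compounds, best (lowest) score first.

    Assumption: `scores` maps a compound name to one docking score,
    and lower values indicate better predicted binding.
    """
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [name for name, _ in ranked[:top_n]]


# Hypothetical scores for four docked compounds.
docking_scores = {
    "cmpd_A": -7.2,
    "cmpd_B": -9.1,
    "cmpd_C": -5.4,
    "cmpd_D": -8.3,
}
best_two = shortlist_by_score(docking_scores, top_n=2)
```

In the workflow described above, the cutoff would be 200 out of 12,800 docked molecules, and the later selections (150, then 107 compounds) relied on visual pose analysis that a score sort alone cannot reproduce.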

    The First CACHE Challenge – Testing Diverse Virtual Screening Scoring Methods to Identify Potential LRRK2 Binders

    No full text
    In December 2021, Molecular Forecaster (MFI) applied to participate in the inaugural CACHE Challenge. Organized by the Structural Genomics Consortium (SGC), CACHE (Critical Assessment of Computational Hit-finding Experiments) is a public–private partnership benchmarking initiative to enable the development of computational methods “to compare and improve small-molecule hit-finding algorithms through cycles of prediction and experimental testing.” The MFI team has decided to take multiple research-focused approaches to our predictions in this first CACHE challenge, aiming to learn from our successes and failures. We are putting MFI’s team, expertise, and algorithms to the test, using them as a foundation to push the boundaries beyond our scientific and application successes to date. We’ve also decided to double down and share the details of our work with the community. The experimental results are now in and we have conducted a retrospective analysis in the second half of this manuscript