64 research outputs found
14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon
Chemistry and materials science are complex. Recently, there have been great
successes in addressing this complexity using data-driven or computational
techniques. Yet, the necessity of input structured in very specific forms and
the fact that there is an ever-growing number of tools creates usability and
accessibility challenges. Coupled with the reality that much data in these
disciplines is unstructured, the effectiveness of these tools is limited.
Motivated by recent works that indicated that large language models (LLMs)
might help address some of these issues, we organized a hackathon event on the
applications of LLMs in chemistry, materials science, and beyond. This article
chronicles the projects built as part of this hackathon. Participants employed
LLMs for various applications, including predicting properties of molecules and
materials, designing novel interfaces for tools, extracting knowledge from
unstructured data, and developing new educational applications.
The diverse topics and the fact that working prototypes could be generated in
less than two days highlight that LLMs will profoundly impact the future of our
fields. The rich collection of ideas and projects also indicates that the
applications of LLMs are not limited to materials science and chemistry but
offer potential benefits to a wide range of scientific disciplines.
Theory and application of medium to high throughput prediction method techniques for asymmetric catalyst design
With the use of computational methods in the field of drug design becoming ever more prevalent, there is pressure to port these technologies to other fields. One of the fields ripe for application of computational drug design techniques, specifically virtual screening and computer-aided molecular design, is the design and synthesis of asymmetric catalysts. Such methods could either guide the selection of the optimal catalyst(s) for a given reaction and a given substrate or provide an enriched selection of highly efficient asymmetric catalysts, enabling synthetic chemists to focus on the most promising candidates. This would in turn provide savings in time and reduce the costs associated with the synthesis and evaluation of large libraries of molecules. However, to be applicable to the evaluation of a large number of potential catalysts, speed is of utmost importance. This impetus has led to the development of medium to high throughput virtual screening (HTVS) methods for asymmetric catalyst development or assessment, although very few applications have been reported. These methods typically fall into four classes: methods combining quantum mechanics and molecular mechanics (QM/MM), pure molecular mechanics-based methods (a class which can be subdivided into static and dynamic transition state modeling), and lastly quantitative structure selectivity relationship methods (QSSR). This review will cover specific methods within these classes and their application to selected reactions.
Peer reviewed: Yes. NRC publication: Yes
Single-Point Mutation with a Rotamer Library Toolkit: Toward Protein Engineering
Protein engineers
have long been hard at work to harness biocatalysts
as a natural source of regio-, stereo-, and chemoselectivity in order
to carry out chemistry (reactions and/or substrates) not previously
achieved with these enzymes. The extreme labor demands and exponential
number of mutation combinations have induced computational advances
in this domain. The first step in our virtual approach is to predict
the correct conformations upon mutation of residues (i.e., rebuilding
side chains). For this purpose, we opted for a combination of molecular
mechanics and statistical data. In this work, we have developed automated
computational tools to extract protein structural information and
created conformational libraries for each amino acid dependent on
a variable number of parameters (e.g., resolution, flexibility, secondary
structure). We have also developed the necessary tool to apply the
mutation and optimize the conformation accordingly. For side-chain
conformation prediction, we obtained overall average root-mean-square
deviations (RMSDs) of 0.91 and 1.01 Å for the 18 flexible natural
amino acids within two distinct sets of over 3000 and 1500 side-chain
residues, respectively. The commonly used dihedral angle differences
were also evaluated and performed worse than the state of the art.
These two metrics are also compared. Furthermore, we generated a family-specific
library for kinases that produced an average 2% lower RMSD upon side-chain
reconstruction and a residue-specific library that yielded a 17% improvement.
Ultimately, since our protein engineering outlook involves using our
docking software, Fitted/Impacts, we applied our
mutation protocol to a benchmarked data set for self- and cross-docking.
Our side-chain reconstruction does not hinder our docking software,
demonstrating differences in pose prediction accuracy of approximately
2% (RMSD cutoff metric) for a set of over 200 protein/ligand structures.
Similarly, when docking to a set of over 100 kinases, side-chain reconstruction
(using both general and biased conformation libraries) had minimal
detriment to the docking accuracy.
Design and Synthesis of Matrix Metalloproteinase Inhibitors Guided by Molecular Modeling. Picking the S 1
Customizable Generation of Synthetically Accessible, Local Chemical Subspaces
Screening
large libraries of chemicals has been an efficient strategy
to discover bioactive compounds; however, a portion of the potential
for success is limited to the available libraries. Synergizing combinatorial
and computational chemistries has emerged as a time-efficient strategy
to explore the chemical space more widely. Ideally, streamlining the
evaluation process for larger, feasible chemical libraries would become
commonplace. Thus, combinatorial tools and, for example, docking methods
would be integrated to identify novel bioactive entities. The idea
is simple in nature, but much more complex in practice; combinatorial
chemistry is more than the coupling of chemicals into products: synthetic
feasibility includes chemoselectivity, stereoselectivity, protecting
group chemistry, and chemical availability, all of which must be considered
for combinatorial library design. In addition, intuitive interfaces
and simple user manipulation are key for optimal use of such tools
by organic chemists, which is crucial for the integration of such software
in medicinal chemistry laboratories. We present herein Finders and React2D, integrated into the Virtual Chemist platform, a modular software suite. This approach
enhances virtual combinatorial chemistry by identifying available
chemicals compatible with a user-defined chemical transformation and
by carrying out the reaction leading to libraries of realistic, synthetically
accessible chemicals, all with a completely automated, black-box,
and efficient design. We demonstrate its utility by generating ∼40
million synthetically accessible, stereochemically accurate compounds
from a single library of 100 000 purchasable molecules and
56 well-characterized chemical reactions.
The Third CACHE Challenge – Finding Ligands Targeting the Macrodomain of SARS-CoV-2 NSP3 Using AI-inspired and Knowledge-Based Approaches.
The SARS-CoV-2 virus contains a host of nonstructural proteins (NSPs) that contribute to its structure and viral function. Among them is the nonstructural protein 3 (NSP3), which contains a macrodomain (Mac1) that interferes with antiviral adenosine diphosphate (ADP)-ribosylation signaling. Catalytic mutations in Mac1 render viruses nonpathogenic, making this enzyme a promising target for antiviral development. For this reason, the third CACHE challenge focused on identifying binders of the Mac1 domain of NSP3 for the development of novel antivirals against SARS-CoV-2. To this end, we used available structural data of the NSP3 Mac1 domain in complex with known fragment binders as starting points for ligand discovery; our efforts were primarily focused on sub-sites of the ADP binding site in the NSP3 macrodomain. Then, using artificial intelligence (AI)-guided and knowledge-based fragment merging and expansion approaches, we generated novel molecules that would serve as templates to identify highly similar, commercially available compounds in the Enamine REAL database. Our design yielded a library of 12,800 molecules, which was docked with our program FITTED to a representative crystal structure of NSP3. We ranked the predicted binding poses based on docking score, followed by visual pose analysis of the best 200 compounds. We finally selected and proposed 150 compounds for testing, followed by further shortlisting to yield a final list of 107 molecules. Of these, 91 compounds were purchased from Enamine and are being tested at the Structural Genomics Consortium (SGC). Our approach and findings will further contribute to our open science efforts, and we aim to continue to engage the scientific community.
The First CACHE Challenge – Testing Diverse Virtual Screening Scoring Methods to Identify Potential LRRK2 Binders
In December 2021, Molecular Forecaster (MFI) applied to participate in the inaugural CACHE Challenge. Organized by the Structural Genomics Consortium (SGC), CACHE (Critical Assessment of Computational Hit-finding Experiments) is a public–private partnership benchmarking initiative to enable the development of computational methods “to compare and improve small-molecule hit-finding algorithms through cycles of prediction and experimental testing.”
The MFI team has decided to take multiple research-focused approaches to our predictions in this first CACHE challenge, aiming to learn from our successes and failures. We are putting MFI’s team, expertise, and algorithms to the test, using them as a foundation to push the boundaries beyond our scientific and application successes to date. We’ve also decided to double down and share the details of our work with the community. The experimental results are now in and we have conducted a retrospective analysis in the second half of this manuscript.
Toward a Computational Tool Predicting the Stereochemical Outcome of Asymmetric Reactions. 1. Application to Sharpless Asymmetric Dihydroxylation