A generative model for protein contact networks
In this paper we present a generative model for protein contact networks. The
soundness of the proposed model is investigated by focusing primarily on
mesoscopic properties elaborated from the spectra of the graph Laplacian. To
complement the analysis, we also study classical topological descriptors, such
as statistics of the shortest paths and the important feature of modularity.
Our experiments show that the proposed model improves considerably on two
suitably chosen generative mechanisms, approximating real protein contact
networks more closely in terms of
diffusion properties elaborated from the Laplacian spectra. However, like the
other models considered, it does not reproduce the shortest-path structure with
sufficient accuracy. To compensate for this drawback, we designed a second
step involving a targeted edge reconfiguration process. The ensemble of
reconfigured networks shows statistically significant improvements.
As a byproduct of our study, we demonstrate that modularity, a well-known
property of proteins, does not entirely explain the actual network architecture
characterizing protein contact networks. In fact, we conclude that modularity,
understood as a quantification of an underlying community structure, should be
considered as an emergent property of the structural organization of proteins.
Interestingly, such a property is suitably optimized in protein contact
networks together with the feature of path efficiency.
Comment: 18 pages, 67 references
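The topological descriptors named in this abstract (shortest-path statistics and modularity) can be illustrated with a minimal sketch. This is not the authors' code: the distance cutoff of 8.0, the function names, and the use of one 3D coordinate per residue are assumptions made only for illustration, and the modularity shown is the standard Newman definition for a given partition.

```python
from collections import deque
from itertools import combinations
from math import dist


def contact_network(coords, cutoff=8.0):
    """Link residues i and j when their coordinates lie within `cutoff`.

    The cutoff value is an assumption, not taken from the paper.
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return seen


def average_shortest_path(adj):
    """Mean hop distance over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")


def modularity(adj, community):
    """Newman modularity Q for a partition given as community[node] -> label."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m
```

For example, four points spaced 5 units apart on a line form a path graph under this cutoff, and `average_shortest_path` and `modularity` can then be compared between a real network and a generated one.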
ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning
Designing de novo proteins beyond those found in nature holds significant
promise for advancements in both scientific and engineering applications.
Current methodologies for protein design often rely on AI-based models, such as
surrogate models that address end-to-end problems by linking protein structure
to material properties or vice versa. However, these models frequently focus on
specific material objectives or structural properties, which limits their
flexibility when out-of-domain knowledge must be incorporated into the design
process or comprehensive data analysis is required. In this study, we introduce
ProtAgents, a platform for de novo protein design based on Large Language
Models (LLMs), where multiple AI agents with distinct capabilities
collaboratively address complex tasks within a dynamic environment. The
versatility in agent development allows for expertise in diverse domains,
including knowledge retrieval, protein structure analysis, physics-based
simulations, and results analysis. The dynamic collaboration between agents,
empowered by LLMs, provides a versatile approach to tackling protein design and
analysis problems, as demonstrated through diverse examples in this study. The
problems of interest encompass designing new proteins, analyzing protein
structures and obtaining new first-principles data -- natural vibrational
frequencies -- via physics simulations. The concerted effort of the system
allows for powerful automated and synergistic design of de novo proteins with
targeted mechanical properties. The flexibility in designing the agents, on the
one hand, and their capacity for autonomous collaboration through the dynamic
LLM-based multi-agent environment, on the other, unlock the great potential
of LLMs in addressing multi-objective materials problems and open up new
avenues for autonomous materials discovery and design.
DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces
Coarse-grained molecular models of proteins permit access to length and time
scales unattainable by all-atom models and the simulation of processes that
occur on long time scales, such as aggregation and folding. The reduced
resolution yields computational speedups, but an atomistic representation
can be vital for a complete understanding of mechanistic details. Backmapping
is the process of restoring all-atom resolution to coarse-grained molecular
models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive
Model for Non-Deterministic Backmapping) as an autoregressive denoising
diffusion probabilistic model to restore all-atom details to coarse-grained
protein representations retaining only Cα coordinates. The
autoregressive generation process proceeds from the protein N-terminus to
C-terminus in a residue-by-residue fashion conditioned on the Cα trace
and previously backmapped backbone and side chain atoms within the local
neighborhood. The local and autoregressive nature of our model makes it
transferable between proteins. The stochastic nature of the denoising diffusion
process means that the model generates a realistic ensemble of backbone and
side chain all-atom configurations consistent with the coarse-grained Cα
trace. We train DiAMoNDBack on more than 65k structures from the Protein Data
Bank (PDB) and validate it in applications to a hold-out PDB test set,
intrinsically disordered protein structures from the Protein Ensemble Database
(PED), molecular dynamics simulations of fast-folding mini-proteins from D. E.
Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art
reconstruction performance in terms of correct bond formation, avoidance of
side chain clashes, and diversity of the generated side chain configurational
states. We make the DiAMoNDBack model publicly available as a free and open
source Python package.
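The residue-by-residue generation order described in this abstract can be sketched as a simple loop. This is a toy illustration, not the DiAMoNDBack package or its API: `toy_sampler` is a hypothetical stand-in for the trained denoising model, and the neighborhood `radius`, the atom count per residue, and all function names are assumptions.

```python
import random
from math import dist


def toy_sampler(ca, context, rng, n_atoms=3, spread=1.5):
    """Hypothetical stand-in for a trained denoising model.

    It scatters `n_atoms` points around the C-alpha position with Gaussian
    noise. A real model would condition on `context`; this one ignores it.
    """
    return [tuple(c + rng.gauss(0.0, spread) for c in ca) for _ in range(n_atoms)]


def backmap(ca_trace, sampler=toy_sampler, radius=10.0, seed=None):
    """Generate atoms residue by residue, from N-terminus to C-terminus."""
    rng = random.Random(seed)
    atoms = []  # atoms[i] holds the generated atoms of residue i
    for ca in ca_trace:
        # Local conditioning: nearby C-alpha positions plus atoms that were
        # already generated for earlier residues within the same radius.
        context = {
            "ca_neighbors": [c for c in ca_trace if dist(c, ca) <= radius],
            "prev_atoms": [a for res in atoms for a in res if dist(a, ca) <= radius],
        }
        atoms.append(sampler(ca, context, rng))
    return atoms
```

Calling `backmap` repeatedly with different seeds yields different atom placements for the same trace, which mirrors the non-deterministic, ensemble-generating behavior the abstract describes.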
Coupling streaming AI and HPC ensembles to achieve 100-1000x faster biomolecular simulations
Machine learning (ML)-based steering can improve the performance of
ensemble-based simulations by allowing for online selection of more
scientifically meaningful computations. We present DeepDriveMD, a framework for
ML-driven steering of scientific simulations that we have used to achieve
orders-of-magnitude improvements in molecular dynamics (MD) performance via
effective coupling of ML and HPC on large parallel computers. We discuss the
design of DeepDriveMD and characterize its performance. We demonstrate that
DeepDriveMD can achieve 100-1000x acceleration for protein folding
simulations relative to other methods, as measured by the amount of simulated
time performed, while covering the same conformational landscape as quantified
by the states sampled during a simulation. Experiments are performed on
leadership-class platforms on up to 1020 nodes. The results establish
DeepDriveMD as a high-performance framework for ML-driven HPC simulation
scenarios that supports diverse MD simulation and ML back-ends and
enables new scientific insights by improving the length and time scales
accessible with current computing capacity.
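The idea of steering an ensemble by selecting restart points online can be sketched with a toy loop. This is not DeepDriveMD's API: the 1D random walk stands in for an MD segment, a simple visit-count novelty score stands in for the ML model, and every name and parameter below is an assumption.

```python
import random
from collections import Counter


def simulate(start, n_steps, rng):
    """Toy 'MD segment': an unbiased 1D random walk on the integers."""
    x, traj = start, []
    for _ in range(n_steps):
        x += rng.choice((-1, 1))
        traj.append(x)
    return traj


def steered_ensemble(n_walkers=8, n_rounds=20, n_steps=25, seed=0):
    """Run short segments in rounds, restarting from the rarest states."""
    rng = random.Random(seed)
    visits = Counter()  # how often each state has been sampled so far
    starts = [0] * n_walkers
    for _ in range(n_rounds):
        for s in starts:
            visits.update(simulate(s, n_steps, rng))
        # Steering step: rank states by visit count (rarest first) and
        # restart the walkers there. A learned model would replace this.
        ranked = sorted(visits, key=lambda state: (visits[state], state))
        starts = [ranked[k % len(ranked)] for k in range(n_walkers)]
    return visits
```

The returned counter records which states were sampled, so coverage from a steered run can be compared with a run that always restarts from the same state.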