Towards Automated Benchmarking of Atomistic Forcefields: Neat Liquid Densities and Static Dielectric Constants from the ThermoML Data Archive
Atomistic molecular simulations are a powerful way to make quantitative
predictions, but the accuracy of these predictions depends entirely on the
quality of the forcefield employed. While experimental measurements of
fundamental physical properties offer a straightforward approach for evaluating
forcefield quality, the bulk of this information has been tied up in formats
that are not machine-readable. Compiling benchmark datasets of physical
properties from non-machine-readable sources requires substantial human effort
and is prone to the accumulation of human errors, hindering the development of
reproducible benchmarks of forcefield accuracy. Here, we examine the
feasibility of benchmarking atomistic forcefields against the NIST ThermoML
data archive of physicochemical measurements, which aggregates thousands of
experimental measurements in a portable, machine-readable, self-annotating
format. As a proof of concept, we present a detailed benchmark of the
generalized Amber small molecule forcefield (GAFF) using the AM1-BCC charge
model against measurements (specifically bulk liquid densities and static
dielectric constants at ambient pressure) automatically extracted from the
archive, and discuss the extent of available data. The results of this
benchmark highlight a general problem with fixed-charge forcefields in the
representation of low-dielectric environments, such as those seen in binding
cavities or biological membranes.
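Both benchmarked quantities can be estimated from an equilibrium NPT trajectory. The sketch below is a minimal illustration, not the paper's benchmark code. It assumes box volumes in nm³, total box dipole moments in C·m, and Ewald electrostatics with conducting ("tin-foil") boundary conditions, under which the static dielectric constant follows from the fluctuation formula ε = 1 + (⟨M·M⟩ − ⟨M⟩·⟨M⟩) / (3 ε₀ ⟨V⟩ k_B T). The function names and unit conventions are hypothetical.

```python
import numpy as np

# Physical constants in SI units (CODATA 2018)
AVOGADRO = 6.02214076e23        # 1/mol
BOLTZMANN = 1.380649e-23        # J/K
EPSILON_0 = 8.8541878128e-12    # F/m
NM3_TO_CM3 = 1e-21              # 1 nm^3 = 1e-21 cm^3 (= 1e-21 mL)
NM3_TO_M3 = 1e-27               # 1 nm^3 = 1e-27 m^3


def mass_density_g_per_ml(n_molecules, molar_mass_g_per_mol, volumes_nm3):
    """Average mass density of a neat liquid over NPT frames, in g/mL."""
    volumes_nm3 = np.asarray(volumes_nm3, dtype=float)
    mass_g = n_molecules * molar_mass_g_per_mol / AVOGADRO
    # Average the per-frame density (mass is constant, volume fluctuates).
    return float(np.mean(mass_g / (volumes_nm3 * NM3_TO_CM3)))


def static_dielectric_constant(dipoles_cm, volumes_nm3, temperature_k):
    """Static dielectric constant from fluctuations of the total box dipole.

    dipoles_cm: array of shape (n_frames, 3), total box dipole in C*m.
    Assumes Ewald summation with conducting (tin-foil) boundary conditions.
    """
    M = np.asarray(dipoles_cm, dtype=float)
    mean_volume_m3 = np.mean(volumes_nm3) * NM3_TO_M3
    mean_M = M.mean(axis=0)
    # <M.M> - <M>.<M>: total variance of the dipole vector
    fluctuation = np.mean(np.sum(M * M, axis=1)) - np.dot(mean_M, mean_M)
    denom = 3.0 * EPSILON_0 * mean_volume_m3 * BOLTZMANN * temperature_k
    return float(1.0 + fluctuation / denom)
```

The simulated values can then be compared with the experimental ThermoML entries at matching temperature and pressure, for example via the relative error (ε_sim − ε_exp) / ε_exp. Dipole fluctuations converge slowly, so in practice the dielectric estimate needs much longer trajectories than the density.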
Development and validation of an inventory to assess conflict in sport teams: the Group Conflict Questionnaire
The dynamic conformational landscape of the protein methyltransferase SETD8
Elucidating the conformational heterogeneity of proteins is essential for understanding
protein function and developing exogenous ligands. With the rapid development of experimental
and computational methods, it is of great interest to integrate these approaches to illuminate the
conformational landscapes of target proteins. SETD8 is a protein lysine methyltransferase (PKMT),
which functions in vivo via the methylation of histone and nonhistone targets. Utilizing covalent
inhibitors and depleting native ligands to trap hidden conformational states, we obtained diverse
X-ray structures of SETD8. These structures were used to seed distributed atomistic molecular
dynamics simulations that generated a total of six milliseconds of trajectory data. Markov state
models, built via an automated machine learning approach and corroborated experimentally, reveal
how slow conformational motions and conformational states are relevant to catalysis. These
findings provide molecular insight into enzymatic catalysis and allosteric mechanisms of a PKMT via
its detailed conformational landscape.
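The core of a Markov state model can be illustrated with a bare-bones estimator. Below is a minimal sketch, assuming the trajectories have already been clustered into a discrete state sequence. It is not the automated machine learning pipeline used in the study: it counts transitions at a fixed lag time, row-normalizes them (a simple non-reversible estimate), and derives the stationary distribution and implied timescales t_i = −τ / ln|λ_i| from the eigenvalues. All function names are hypothetical.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j observed at a lag time of `lag` frames (lag >= 1)."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1
    return C


def transition_matrix(C):
    """Row-normalize counts (simple non-reversible maximum-likelihood estimate)."""
    row_sums = C.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return C / row_sums


def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()  # also fixes the arbitrary sign of the eigenvector


def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for every eigenvalue except the stationary one."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(evals[1:])
```

Slow conformational motions show up as eigenvalues close to 1, i.e. long implied timescales. Production MSM software typically also enforces detailed balance and checks that the timescales plateau as the lag time grows.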
Pulsed direct and constant direct currents in the pilocarpine iontophoresis sweat chloride test
Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale
The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software infrastructure to enable simulations at this scale has not kept pace. It should now be possible to study protein dynamics across entire (super)families, exploiting both available structural biology data and conformational similarities across homologous proteins. Here, we present a new tool for enabling high-throughput simulation in the genomics era. Ensembler takes any set of sequences, from a single sequence to an entire superfamily, and shepherds them through various stages of modeling and refinement to produce simulation-ready structures. This includes comparative modeling to all relevant PDB structures (which may span multiple conformational states of interest), reconstruction of missing loops, addition of missing atoms, culling of nearly identical structures, assignment of appropriate protonation states, solvation in explicit solvent, and refinement and filtering with molecular simulation to ensure stable simulation. The output of this pipeline is an ensemble of structures ready for subsequent molecular simulations using computer clusters, supercomputers, or distributed computing projects like Folding@home. Ensembler thus automates much of the time-consuming process of preparing protein models suitable for simulation, while allowing scalability up to entire superfamilies. A particular advantage of this approach can be found in the construction of kinetic models of conformational dynamics, such as Markov state models (MSMs), which benefit from a diverse array of initial configurations that span the accessible conformational states to aid sampling.
We demonstrate the power of this approach by constructing models for all catalytic domains in the human tyrosine kinase family, using all available kinase catalytic domain structures from any organism as structural templates. Ensembler is free and open source software licensed under the GNU General Public License (GPL) v2. It is compatible with Linux and OS X. The latest release can be installed via the conda package manager, and the latest source can be downloaded from https://github.com/choderalab/ensembler
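One pipeline stage, the culling of nearly identical structures, can be illustrated with a greedy filter on pairwise RMSD after optimal superposition (the Kabsch algorithm). This is a minimal sketch, not Ensembler's implementation. It assumes every model shares the same atom ordering, and the function names and the cutoff value (in the same length units as the coordinates) are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)                    # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                               # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # optimal rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff * diff, axis=1))))


def cull_near_duplicates(models, cutoff):
    """Keep each model unless it lies within `cutoff` RMSD of a model already kept."""
    kept = []
    for idx, coords in enumerate(models):
        if all(kabsch_rmsd(coords, models[k]) > cutoff for k in kept):
            kept.append(idx)
    return kept
```

For example, given a model, a rigidly rotated and translated copy of it, and a model with one displaced atom, the filter keeps the first and third. The greedy pass is order-dependent, and it scales quadratically in the number of models in the worst case.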
Data from: Ensembler: enabling high-throughput molecular simulations at the superfamily scale
- …