Coarse-Graining with Equivariant Neural Networks: A Path Towards Accurate and Data-Efficient Models
Machine learning has recently entered the mainstream of coarse-grained
(CG) molecular modeling and simulation. While a variety of methods for
incorporating deep learning into these models exist, many of them involve
training neural networks to act directly as the CG force field. This has
several benefits, the most significant of which is accuracy. Neural networks
can inherently incorporate multi-body effects during the calculation of CG
forces, and a well-trained neural network force field outperforms pairwise
basis sets generated from essentially any methodology. However, this comes at a
significant cost. First, these models are typically slower than pairwise force
fields, even when accounting for specialized hardware that accelerates the
training and integration of such networks. The second cost, and the focus of this
paper, is the considerable amount of data needed to train such
force fields. It is common to use tens of microseconds of molecular dynamics
data to train a single CG model, which approaches the point of eliminating the
CG model's usefulness in the first place. As we investigate in this work, this
data hunger of neural networks that predict
molecular energies and forces is caused in large part by the difficulty of
learning force equivariance, i.e., the fact that force vectors should rotate
while maintaining their magnitude in response to an equivalent rotation of the
system. We demonstrate that for CG water, networks that inherently incorporate
this equivariance into their embedding can produce functional models using
datasets as small as a single frame of reference data, which networks without
inherent symmetry equivariance cannot.
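The equivariance condition described above can be written as F(Rx) = R F(x) for any rotation R. The following is a minimal sketch of a numerical check. It assumes a hypothetical harmonic pair force and a hypothetical anisotropic "naive model", and neither of these is the network studied in the paper. The check rotates a configuration about the z-axis and compares two things: the forces computed on the rotated system, and the original forces rotated by the same angle.

```python
import math

def harmonic_pair_forces(positions, k=1.0, r0=1.0):
    """Forces from a toy harmonic pair potential U = 0.5*k*(r - r0)**2 per pair.

    Hypothetical example force field; it depends only on interparticle
    vectors, so it is rotation-equivariant by construction.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rij = [positions[i][d] - positions[j][d] for d in range(3)]
            dist = math.sqrt(sum(c * c for c in rij))
            scale = -k * (dist - r0) / dist  # -dU/dr times the unit vector r_ij / |r_ij|
            for d in range(3):
                forces[i][d] += scale * rij[d]
    return forces

def anisotropic_model(positions):
    """Hypothetical 'naive model' mapping absolute coordinates to forces with
    direction-dependent weights, so it ignores rotational symmetry."""
    weights = (0.1, 0.2, 0.3)
    return [[w * x for w, x in zip(weights, p)] for p in positions]

def rotate_z(vec, theta):
    """Rotate a 3-vector by angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = vec
    return [c * x - s * y, s * x + c * y, z]

def is_equivariant(force_fn, positions, theta, tol=1e-9):
    """Check F(R x) == R F(x) for a rotation R about z."""
    rotated_positions = [rotate_z(p, theta) for p in positions]
    forces_of_rotated = force_fn(rotated_positions)
    rotated_forces = [rotate_z(f, theta) for f in force_fn(positions)]
    return all(
        abs(a - b) <= tol
        for fa, fb in zip(forces_of_rotated, rotated_forces)
        for a, b in zip(fa, fb)
    )

points = [[0.0, 0.0, 0.0], [1.2, 0.1, 0.0], [0.3, 1.1, 0.4]]
print(is_equivariant(harmonic_pair_forces, points, theta=0.7))  # True
print(is_equivariant(anisotropic_model, points, theta=0.7))     # False
```

A force built only from interparticle vectors passes automatically. A model without built-in symmetry would have to learn this property from data, and the abstract attributes the data hunger of such networks largely to that difficulty.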
Utilizing Machine Learning to Greatly Expand the Range and Accuracy of Bottom-Up Coarse-Grained Models Through Virtual Particles
Coarse-grained (CG) models parameterized using atomistic reference data,
i.e., 'bottom-up' CG models, have proven useful in the study of biomolecules
and other soft matter. However, the construction of highly accurate, low
resolution CG models of biomolecules remains challenging. We demonstrate in
this work how virtual particles, CG sites with no atomistic correspondence, can
be incorporated into CG models within the context of relative entropy
minimization (REM) as latent variables. The methodology presented, variational
derivative relative entropy minimization (VD-REM), enables optimization of
virtual particle interactions through a gradient descent algorithm aided by
machine learning. We apply this methodology to the challenging case of a
solvent-free CG model of a 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC)
lipid bilayer and demonstrate that the introduction of virtual particles captures
solvent-mediated behavior and higher-order correlations that REM alone cannot
capture in a more standard CG model based only on the mapping of collections of
atoms to the CG sites. Comment: 35 pages, 9 figures
OpenMSCG: A Software Tool for Bottom-Up Coarse-Graining
The “bottom-up” approach to coarse-graining, for building accurate and efficient computational models to simulate large-scale and complex phenomena and processes, is an important approach in computational chemistry, biophysics, and materials science. As one example, the Multiscale Coarse-Graining (MS-CG) approach to developing CG models can be rigorously derived using statistical mechanics applied to fine-grained, i.e., all-atom, simulation data for a given system. Under a number of circumstances, a systematic procedure such as MS-CG modeling is particularly valuable. Here, we present the development of OpenMSCG, a modular open-source software package that provides a collection of successful and widely applied bottom-up CG methods, including Boltzmann Inversion (BI), Force-Matching (FM), Ultra-Coarse-Graining (UCG), Relative Entropy Minimization (REM), Essential Dynamics Coarse-Graining (EDCG), and Heterogeneous Elastic Network Modeling (HeteroENM). OpenMSCG is a high-performance and comprehensive toolset that can be used to derive CG models from large-scale fine-grained simulation data in file formats from common molecular dynamics (MD) software packages, such as GROMACS, LAMMPS, and NAMD. OpenMSCG is modularized in the Python programming framework, which allows users to create and customize modeling “recipes” for reproducible results, thus greatly improving the reliability, reproducibility, and sharing of bottom-up CG models and their applications.
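Of the methods listed, Boltzmann Inversion is the simplest to illustrate. In its direct, single-step form, it converts a target radial distribution function g(r) into a pair potential U(r) = -k_B T ln g(r). Because this neglects multi-body effects, the result is commonly used as a starting guess and then refined iteratively. The sketch below illustrates only that formula. It is not OpenMSCG's interface, and both the function name and the choice of kJ/mol units are assumptions.

```python
import math

# Molar Boltzmann constant in kJ/(mol K) (k_B times Avogadro's number).
K_B = 0.008314462618

def boltzmann_invert(g_values, temperature):
    """Direct Boltzmann inversion: U(r) = -k_B T ln g(r), in kJ/mol.

    Returns None where g(r) == 0, since the potential is undefined there
    (in practice such bins are extrapolated or capped).
    """
    kT = K_B * temperature
    return [(-kT * math.log(g) if g > 0 else None) for g in g_values]

# Hypothetical g(r) values on four bins at 300 K.
u = boltzmann_invert([0.0, math.exp(-1.0), 1.0, 1.5], temperature=300.0)
```

Bins where g(r) > 1 (a preferred distance) map to attractive, negative U. Bins where g(r) < 1 map to repulsive, positive U.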
Detectors for the James Webb Space Telescope Near-Infrared Spectrograph I: Readout Mode, Noise Model, and Calibration Considerations
We describe how the James Webb Space Telescope (JWST) Near-Infrared
Spectrograph's (NIRSpec's) detectors will be read out, and present a model of
how noise scales with the number of non-destructive reads when
sampling up the ramp. We believe that this noise model, which is validated
using real and simulated test data, is applicable to most astronomical
near-infrared instruments. We describe some non-ideal behaviors that have been
observed in engineering grade NIRSpec detectors, and demonstrate that they are
unlikely to affect NIRSpec sensitivity, operations, or calibration. These
include a HAWAII-2RG reset anomaly and random telegraph noise (RTN). Using real
test data, we show that the reset anomaly (1) is very nearly noiseless and (2)
can easily be calibrated out. Likewise, we show that large-amplitude RTN
affects only a small and fixed population of pixels. It can therefore be
tracked using standard pixel operability maps. Comment: 55 pages, 10 figures
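Sampling up the ramp means reading a pixel non-destructively n times while charge accumulates, then fitting a slope to the reads. This is a minimal sketch with several assumptions: the reads are equally spaced by dt, each read carries only white read noise of standard deviation sigma, and the slope is fit by ordinary least squares. It ignores shot noise and any other terms the paper's model may include. Under these assumptions the slope variance is 12 sigma² / (n (n² − 1) dt²). The function names are hypothetical.

```python
def ols_slope(reads, dt):
    """Least-squares slope of equally spaced non-destructive reads (counts per unit time)."""
    n = len(reads)
    times = [k * dt for k in range(n)]
    t_mean = sum(times) / n
    y_mean = sum(reads) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, reads))
    return sxy / sxx

def read_noise_slope_variance(n, sigma_read, dt):
    """Slope variance from white read noise only, for n >= 2 equally spaced reads.

    Equals sigma_read**2 / sum((t - t_mean)**2), which for t = k*dt is
    12 * sigma_read**2 / (n * (n**2 - 1) * dt**2).
    """
    if n < 2:
        raise ValueError("need at least two reads to fit a slope")
    return 12.0 * sigma_read ** 2 / (n * (n ** 2 - 1) * dt ** 2)

# Hypothetical noiseless ramp: offset 5 counts, 2 counts/s, read every 1.5 s.
ramp = [5.0 + 2.0 * (k * 1.5) for k in range(10)]
slope = ols_slope(ramp, dt=1.5)
```

Hold the total ramp duration T = (n − 1) dt fixed and the same expression becomes 12 sigma² (n − 1) / (n (n + 1) T²). In this read-noise-only limit, the variance therefore falls roughly as 1/n as more reads are added.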
Nanopore native RNA sequencing of a human poly(A) transcriptome
High-throughput complementary DNA sequencing technologies have advanced our understanding of transcriptome complexity and regulation. However, these methods lose information contained in biological RNA because the copied reads are often short and modifications are not retained. We address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies. Our study generated 9.9 million aligned sequence reads for the human cell line GM12878, using thirty MinION flow cells at six institutions. These native RNA reads had a median length of 771 bases, and a maximum aligned length of over 21,000 bases. Mitochondrial poly(A) reads provided an internal measure of read-length quality. We combined these long nanopore reads with higher accuracy short-reads and annotated GM12878 promoter regions to identify 33,984 plausible RNA isoforms. We describe strategies for assessing 3′ poly(A) tail length, base modifications and transcript haplotypes.
Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity
Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis of positive selection of spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large dataset, well represented by both spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.
Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation
Overview of the MOSAiC expedition: Physical oceanography
Arctic Ocean properties and processes are highly relevant to the regional and global coupled climate system,
yet remain scarcely observed, especially in winter. Team OCEAN conducted a full year of physical oceanography
observations as part of the Multidisciplinary drifting Observatory for the Study of the Arctic Climate
(MOSAiC), a drift with the Arctic sea ice from October 2019 to September 2020. An international team
designed and implemented the program to characterize the Arctic Ocean system in unprecedented detail, from
the seafloor to the air-sea ice-ocean interface, from sub-mesoscales to pan-Arctic. The oceanographic
measurements were coordinated with the other teams to explore the ocean physics and linkages to the
climate and ecosystem. This paper introduces the major components of the physical oceanography program
and complements the other team overviews of the MOSAiC observational program. Team OCEAN’s sampling
strategy was designed around hydrographic ship-, ice- and autonomous platform-based measurements to
improve the understanding of regional circulation and mixing processes. Measurements were carried out
both routinely, with a regular schedule, and in response to storms or opening leads. Here we present along-drift time series of hydrographic properties, allowing insights into the seasonal and regional evolution of the
water column from winter in the Laptev Sea to early summer in Fram Strait: freshening of the surface,
deepening of the mixed layer, increase in temperature and salinity of the Atlantic Water. We also highlight
the presence of Canada Basin deep water intrusions and a surface meltwater layer in leads. MOSAiC was most
likely the most comprehensive program ever conducted over the ice-covered Arctic Ocean. While data
analysis and interpretation are ongoing, the acquired datasets will support a wide range of physical
oceanography and multi-disciplinary research. They will provide a significant foundation for assessing and
advancing modeling capabilities in the Arctic Ocean.
Changes in symptomatology, reinfection, and transmissibility associated with the SARS-CoV-2 variant B.1.1.7: an ecological study
Background
The SARS-CoV-2 variant B.1.1.7 was first identified in December, 2020, in England. We aimed to investigate whether increases in the proportion of infections with this variant are associated with differences in symptoms or disease course, reinfection rates, or transmissibility.
Methods
We did an ecological study to examine the association between the regional proportion of infections with the SARS-CoV-2 B.1.1.7 variant and reported symptoms, disease course, rates of reinfection, and transmissibility. Data on types and duration of symptoms were obtained from longitudinal reports from users of the COVID Symptom Study app who reported a positive test for COVID-19 between Sept 28 and Dec 27, 2020 (during which the prevalence of B.1.1.7 increased most notably in parts of the UK). From this dataset, we also estimated the frequency of possible reinfection, defined as the presence of two reported positive tests separated by more than 90 days with a period of reporting no symptoms for more than 7 days before the second positive test. The proportion of SARS-CoV-2 infections with the B.1.1.7 variant across the UK was estimated with use of genomic data from the COVID-19 Genomics UK Consortium and data from Public Health England on spike-gene target failure (a non-specific indicator of the B.1.1.7 variant) in community cases in England. We used linear regression to examine the association between reported symptoms and proportion of B.1.1.7. We assessed the Spearman correlation between the proportion of B.1.1.7 cases and number of reinfections over time, and between the number of positive tests and reinfections. We estimated incidence for B.1.1.7 and previous variants, and compared the effective reproduction number, Rt, for the two incidence estimates.
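The reinfection definition in the Methods is a concrete rule that can be written down directly. The sketch below is one possible reading of it, not the study's code. It makes three assumptions. First, each user has a hypothetical log mapping each reporting day to whether symptoms were reported. Second, days with no report break a symptom-free run. Third, the symptom-free run of more than 7 days must fall between the two positive tests.

```python
from datetime import date, timedelta

def longest_symptom_free_run(symptom_log, start, end):
    """Longest run of consecutive days strictly between start and end that were
    each reported symptom-free. Unreported days break the run (assumption)."""
    best = run = 0
    day = start + timedelta(days=1)
    while day < end:
        if day in symptom_log and not symptom_log[day]:
            run += 1
            best = max(best, run)
        else:
            run = 0
        day += timedelta(days=1)
    return best

def is_possible_reinfection(positive_tests, symptom_log, gap_days=90, min_free_days=7):
    """Two positive tests more than gap_days apart, with a symptom-free run of
    more than min_free_days before the second test (hypothetical helper)."""
    tests = sorted(positive_tests)
    for i, first in enumerate(tests):
        for second in tests[i + 1:]:
            if ((second - first).days > gap_days
                    and longest_symptom_free_run(symptom_log, first, second) > min_free_days):
                return True
    return False

# Hypothetical user: positive on 1 June and 15 October, symptom-free reports 1-14 October.
tests = [date(2020, 6, 1), date(2020, 10, 15)]
log = {date(2020, 10, 1) + timedelta(days=k): False for k in range(14)}
print(is_possible_reinfection(tests, log))  # True
```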
Findings
From Sept 28 to Dec 27, 2020, positive COVID-19 tests were reported by 36 920 COVID Symptom Study app users whose region was known and who reported as healthy on app sign-up. We found no changes in reported symptoms or disease duration associated with B.1.1.7. For the same period, possible reinfections were identified in 249 (0·7% [95% CI 0·6–0·8]) of 36 509 app users who reported a positive swab test before Oct 1, 2020, but there was no evidence that the frequency of reinfections was higher for the B.1.1.7 variant than for pre-existing variants. Reinfection occurrences were more positively correlated with the overall regional rise in cases (Spearman correlation 0·56–0·69 for South East, London, and East of England) than with the regional increase in the proportion of infections with the B.1.1.7 variant (Spearman correlation 0·38–0·56 in the same regions), suggesting B.1.1.7 does not substantially alter the risk of reinfection. We found a multiplicative increase in the Rt of B.1.1.7 by a factor of 1·35 (95% CI 1·02–1·69) relative to pre-existing variants. However, Rt fell below 1 during regional and national lockdowns, even in regions with high proportions of infections with the B.1.1.7 variant.
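As an arithmetic check on the reinfection figure, 249 of 36 509 users is about 0·68%. The sketch below uses a normal-approximation (Wald) interval, which is an assumption and may not be the method the authors used. With that assumption it rounds to the same 0·6 to 0·8% range reported in the Findings. The code prints decimal points rather than middle dots.

```python
import math

def wald_interval(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

p, lo, hi = wald_interval(249, 36509)
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f})")  # 0.7% (95% CI 0.6-0.8)
```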
Interpretation
The lack of change in symptoms identified in this study indicates that existing testing and surveillance infrastructure does not need to change specifically for the B.1.1.7 variant. In addition, given that there was no apparent increase in the reinfection rate, vaccines are likely to remain effective against the B.1.1.7 variant.
Funding
Zoe Global, Department of Health (UK), Wellcome Trust, Engineering and Physical Sciences Research Council (UK), National Institute for Health Research (UK), Medical Research Council (UK), Alzheimer's Society