25 research outputs found

    Coarse-Graining with Equivariant Neural Networks: A Path Towards Accurate and Data-Efficient Models

    Machine learning has recently entered the mainstream of coarse-grained (CG) molecular modeling and simulation. While a variety of methods for incorporating deep learning into these models exist, many of them involve training neural networks to act directly as the CG force field. This has several benefits, the most significant of which is accuracy. Neural networks can inherently incorporate multi-body effects during the calculation of CG forces, and a well-trained neural network force field outperforms pairwise basis sets generated from essentially any methodology. However, this comes at a significant cost. First, these models are typically slower than pairwise force fields, even when accounting for specialized hardware that accelerates the training and integration of such networks. Second, and the focus of this paper, is the considerable amount of data needed to train such force fields. It is common to use tens of microseconds of molecular dynamics data to train a single CG model, which approaches the point of eliminating the CG model's usefulness in the first place. As we investigate in this work, it is apparent that this data-hunger trap of neural networks for predicting molecular energies and forces is caused in large part by the difficulty of learning force equivariance, i.e., the fact that force vectors should rotate while maintaining their magnitude in response to an equivalent rotation of the system. We demonstrate that for CG water, networks that inherently incorporate this equivariance into their embedding can produce functional models using datasets as small as a single frame of reference data, which networks without inherent symmetry equivariance cannot.

    Utilizing Machine Learning to Greatly Expand the Range and Accuracy of Bottom-Up Coarse-Grained Models Through Virtual Particles

    Coarse-grained (CG) models parameterized using atomistic reference data, i.e., 'bottom up' CG models, have proven useful in the study of biomolecules and other soft matter. However, the construction of highly accurate, low resolution CG models of biomolecules remains challenging. We demonstrate in this work how virtual particles, CG sites with no atomistic correspondence, can be incorporated into CG models within the context of relative entropy minimization (REM) as latent variables. The methodology presented, variational derivative relative entropy minimization (VD-REM), enables optimization of virtual particle interactions through a gradient descent algorithm aided by machine learning. We apply this methodology to the challenging case of a solvent-free CG model of a 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC) lipid bilayer and demonstrate that introduction of virtual particles captures solvent-mediated behavior and higher-order correlations which REM alone cannot capture in a more standard CG model based only on the mapping of collections of atoms to the CG sites. Comment: 35 pages, 9 figures

    OpenMSCG: A Software Tool for Bottom-Up Coarse-Graining

    The “bottom-up” approach to coarse-graining, for building accurate and efficient computational models to simulate large-scale and complex phenomena and processes, is an important approach in computational chemistry, biophysics, and materials science. As one example, the Multiscale Coarse-Graining (MS-CG) approach to developing CG models can be rigorously derived using statistical mechanics applied to fine-grained, i.e., all-atom simulation data for a given system. Under a number of circumstances, a systematic procedure, such as MS-CG modeling, is particularly valuable. Here, we present the development of the OpenMSCG software, a modularized open-source software that provides a collection of successful and widely applied bottom-up CG methods, including Boltzmann Inversion (BI), Force-Matching (FM), Ultra-Coarse-Graining (UCG), Relative Entropy Minimization (REM), Essential Dynamics Coarse-Graining (EDCG), and Heterogeneous Elastic Network Modeling (HeteroENM). OpenMSCG is a high-performance and comprehensive toolset that can be used to derive CG models from large-scale fine-grained simulation data in file formats from common molecular dynamics (MD) software packages, such as GROMACS, LAMMPS, and NAMD. OpenMSCG is modularized in the Python programming framework, which allows users to create and customize modeling “recipes” for reproducible results, thus greatly improving the reliability, reproducibility, and sharing of bottom-up CG models and their applications.

    Detectors for the James Webb Space Telescope Near-Infrared Spectrograph I: Readout Mode, Noise Model, and Calibration Considerations

    We describe how the James Webb Space Telescope (JWST) Near-Infrared Spectrograph's (NIRSpec's) detectors will be read out, and present a model of how noise scales with the number of multiple non-destructive reads sampling-up-the-ramp. We believe that this noise model, which is validated using real and simulated test data, is applicable to most astronomical near-infrared instruments. We describe some non-ideal behaviors that have been observed in engineering grade NIRSpec detectors, and demonstrate that they are unlikely to affect NIRSpec sensitivity, operations, or calibration. These include a HAWAII-2RG reset anomaly and random telegraph noise (RTN). Using real test data, we show that the reset anomaly is: (1) very nearly noiseless and (2) can be easily calibrated out. Likewise, we show that large-amplitude RTN affects only a small and fixed population of pixels. It can therefore be tracked using standard pixel operability maps. Comment: 55 pages, 10 figures

    Nanopore native RNA sequencing of a human poly(A) transcriptome

    High-throughput complementary DNA sequencing technologies have advanced our understanding of transcriptome complexity and regulation. However, these methods lose information contained in biological RNA because the copied reads are often short and modifications are not retained. We address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies. Our study generated 9.9 million aligned sequence reads for the human cell line GM12878, using thirty MinION flow cells at six institutions. These native RNA reads had a median length of 771 bases, and a maximum aligned length of over 21,000 bases. Mitochondrial poly(A) reads provided an internal measure of read-length quality. We combined these long nanopore reads with higher accuracy short-reads and annotated GM12878 promoter regions to identify 33,984 plausible RNA isoforms. We describe strategies for assessing 3′ poly(A) tail length, base modifications and transcript haplotypes.

    Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity

    Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large dataset, well represented by both spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.

    Overview of the MOSAiC expedition: Physical oceanography

    Arctic Ocean properties and processes are highly relevant to the regional and global coupled climate system, yet still scarcely observed, especially in winter. Team OCEAN conducted a full year of physical oceanography observations as part of the Multidisciplinary drifting Observatory for the Study of the Arctic Climate (MOSAiC), a drift with the Arctic sea ice from October 2019 to September 2020. An international team designed and implemented the program to characterize the Arctic Ocean system in unprecedented detail, from the seafloor to the air-sea ice-ocean interface, from sub-mesoscales to pan-Arctic. The oceanographic measurements were coordinated with the other teams to explore the ocean physics and linkages to the climate and ecosystem. This paper introduces the major components of the physical oceanography program and complements the other team overviews of the MOSAiC observational program. Team OCEAN’s sampling strategy was designed around hydrographic ship-, ice- and autonomous platform-based measurements to improve the understanding of regional circulation and mixing processes. Measurements were carried out both routinely, with a regular schedule, and in response to storms or opening leads. Here we present along-drift time series of hydrographic properties, allowing insights into the seasonal and regional evolution of the water column from winter in the Laptev Sea to early summer in Fram Strait: freshening of the surface, deepening of the mixed layer, increase in temperature and salinity of the Atlantic Water. We also highlight the presence of Canada Basin deep water intrusions and a surface meltwater layer in leads. MOSAiC most likely was the most comprehensive program ever conducted over the ice-covered Arctic Ocean. While data analysis and interpretation are ongoing, the acquired datasets will support a wide range of physical oceanography and multi-disciplinary research. They will provide a significant foundation for assessing and advancing modeling capabilities in the Arctic Ocean.

    Changes in symptomatology, reinfection, and transmissibility associated with the SARS-CoV-2 variant B.1.1.7: an ecological study

    Background: The SARS-CoV-2 variant B.1.1.7 was first identified in December, 2020, in England. We aimed to investigate whether increases in the proportion of infections with this variant are associated with differences in symptoms or disease course, reinfection rates, or transmissibility. Methods: We did an ecological study to examine the association between the regional proportion of infections with the SARS-CoV-2 B.1.1.7 variant and reported symptoms, disease course, rates of reinfection, and transmissibility. Data on types and duration of symptoms were obtained from longitudinal reports from users of the COVID Symptom Study app who reported a positive test for COVID-19 between Sept 28 and Dec 27, 2020 (during which the prevalence of B.1.1.7 increased most notably in parts of the UK). From this dataset, we also estimated the frequency of possible reinfection, defined as the presence of two reported positive tests separated by more than 90 days with a period of reporting no symptoms for more than 7 days before the second positive test. The proportion of SARS-CoV-2 infections with the B.1.1.7 variant across the UK was estimated with use of genomic data from the COVID-19 Genomics UK Consortium and data from Public Health England on spike-gene target failure (a non-specific indicator of the B.1.1.7 variant) in community cases in England. We used linear regression to examine the association between reported symptoms and proportion of B.1.1.7. We assessed the Spearman correlation between the proportion of B.1.1.7 cases and number of reinfections over time, and between the number of positive tests and reinfections. We estimated incidence for B.1.1.7 and previous variants, and compared the effective reproduction number, Rt, for the two incidence estimates. Findings: From Sept 28 to Dec 27, 2020, positive COVID-19 tests were reported by 36 920 COVID Symptom Study app users whose region was known and who reported as healthy on app sign-up. We found no changes in reported symptoms or disease duration associated with B.1.1.7. For the same period, possible reinfections were identified in 249 (0·7% [95% CI 0·6–0·8]) of 36 509 app users who reported a positive swab test before Oct 1, 2020, but there was no evidence that the frequency of reinfections was higher for the B.1.1.7 variant than for pre-existing variants. Reinfection occurrences were more positively correlated with the overall regional rise in cases (Spearman correlation 0·56–0·69 for South East, London, and East of England) than with the regional increase in the proportion of infections with the B.1.1.7 variant (Spearman correlation 0·38–0·56 in the same regions), suggesting B.1.1.7 does not substantially alter the risk of reinfection. We found a multiplicative increase in the Rt of B.1.1.7 by a factor of 1·35 (95% CI 1·02–1·69) relative to pre-existing variants. However, Rt fell below 1 during regional and national lockdowns, even in regions with high proportions of infections with the B.1.1.7 variant. Interpretation: The lack of change in symptoms identified in this study indicates that existing testing and surveillance infrastructure do not need to change specifically for the B.1.1.7 variant. In addition, given that there was no apparent increase in the reinfection rate, vaccines are likely to remain effective against the B.1.1.7 variant. Funding: Zoe Global, Department of Health (UK), Wellcome Trust, Engineering and Physical Sciences Research Council (UK), National Institute for Health Research (UK), Medical Research Council (UK), Alzheimer's Society.