122 research outputs found

    Kernel-based machine learning for molecular crystal structure prediction

    Data-efficient machine learning for molecular crystal structure prediction

    The combination of modern machine learning (ML) approaches with high-quality data from quantum mechanical (QM) calculations can yield models with an unrivalled accuracy/cost ratio. However, such methods are ultimately limited by the computational effort required to produce the reference data. In particular, reference calculations for periodic systems with many atoms can become prohibitively expensive for higher levels of theory. This trade-off is critical in the context of organic crystal structure prediction (CSP). Here, a data-efficient ML approach would be highly desirable, since screening a huge space of possible polymorphs in a narrow energy range requires the assessment of a large number of trial structures with high accuracy. In this contribution, we present tailored Δ-ML models that allow screening a wide range of crystal candidates while adequately describing the subtle interplay between intermolecular interactions such as H-bonding and many-body dispersion effects. This is achieved by enhancing a physics-based description of long-range interactions at the density functional tight binding (DFTB) level, for which an efficient implementation is available, with a short-range ML model trained on high-quality first-principles reference data. The presented workflow is broadly applicable to different molecular materials, without the need for a single periodic calculation at the reference level of theory. We show that this even allows the use of wavefunction methods in CSP.
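
The Δ-ML idea in this abstract (a cheap baseline plus a learned correction toward a more accurate reference) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `baseline_energy` and `reference_energy` are toy one-dimensional stand-ins for the DFTB and first-principles levels, and the Gaussian-kernel ridge regression is a generic choice, not the model used by the authors.

```python
import math

def baseline_energy(r):
    """Hypothetical cheap baseline (toy Lennard-Jones form standing in for DFTB)."""
    return 4.0 * (r ** -12 - r ** -6)

def reference_energy(r):
    """Hypothetical expensive reference: the baseline plus a smooth short-range term."""
    return baseline_energy(r) + 0.3 * math.exp(-2.0 * (r - 1.0) ** 2)

def gaussian_kernel(a, b, sigma=0.3):
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_delta_model(train_r, ridge=1e-8):
    """Fit kernel ridge regression to the difference (reference - baseline)."""
    targets = [reference_energy(r) - baseline_energy(r) for r in train_r]
    kernel_matrix = [
        [gaussian_kernel(a, b) + (ridge if i == j else 0.0)
         for j, b in enumerate(train_r)]
        for i, a in enumerate(train_r)
    ]
    weights = solve(kernel_matrix, targets)

    def predict(r):
        correction = sum(w * gaussian_kernel(r, t) for w, t in zip(weights, train_r))
        return baseline_energy(r) + correction  # baseline + learned Δ

    return predict

# Seven training geometries; the model only has to learn the small correction.
train_r = [0.8 + 0.2 * i for i in range(7)]
delta_model = fit_delta_model(train_r)
```

Because the model learns only the difference between the two levels, the fitted function is smoother and smaller in magnitude than the total energy, which is the usual argument for why Δ-learning needs fewer reference calculations.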

    Mapping Materials and Molecules

    The visualization of data is indispensable in scientific research, from the early stages when human insight forms to the final step of communicating results. In computational physics, chemistry and materials science, it can be as simple as making a scatter plot or as straightforward as looking through the snapshots of atomic positions manually. However, as a result of the “big data” revolution, these conventional approaches are often inadequate. The widespread adoption of high-throughput computation for materials discovery and the associated community-wide repositories have given rise to data sets that contain an enormous number of compounds and atomic configurations. A typical data set contains thousands to millions of atomic structures, along with a diverse range of properties such as formation energies, band gaps, or bioactivities. It would thus be desirable to have a data-driven and automated framework for visualizing and analyzing such structural data sets. The key idea is to construct a low-dimensional representation of the data, which facilitates navigation, reveals underlying patterns, and helps to identify data points with unusual attributes. Such data-intensive maps, often employing machine learning methods, are appearing more and more frequently in the literature. However, to the wider community, it is not always transparent how these maps are made and how they should be interpreted. Furthermore, while these maps undoubtedly serve a decorative purpose in academic publications, it is not always apparent what extra information can be garnered from reading or making them. This Account attempts to answer such questions. We start with a concise summary of the theory of representing chemical environments, followed by the introduction of a simple yet practical conceptual approach for generating structure maps in a generic and automated manner. Such analysis and mapping is made nearly effortless by employing the newly developed software tool ASAP. 
    To showcase the applicability to a wide variety of systems in chemistry and materials science, we provide several illustrative examples, including crystalline and amorphous materials, interfaces, and organic molecules. In these examples, the maps not only help to sift through large data sets but also reveal hidden patterns that could be easily missed using conventional analyses. The explosion in the amount of computed information in chemistry and materials science has made visualization into a science in itself. Not only have we benefited from exploiting these visualization methods in previous works, we also believe that the automated mapping of data sets will in turn stimulate further creativity and exploration, as well as ultimately feed back into future advances in the respective fields.
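
The "low-dimensional representation" at the heart of such maps can be sketched with principal component analysis (PCA). This is only an illustration under stated assumptions: the six three-component `descriptors` are invented, the helper names are hypothetical, and the code does not use or reproduce the ASAP tool named in the abstract.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    d = len(matrix)
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-symmetric start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

def pca_map(rows, n_components=2):
    """Project each row onto the leading principal axes."""
    centered = center(rows)
    matrix = covariance(centered)
    d = len(matrix)
    axes = []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(matrix)
        axes.append(v)
        # Deflate: remove the component just found before searching for the next.
        matrix = [[matrix[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
                  for i in range(d)]
    return [[sum(r[j] * axis[j] for j in range(d)) for axis in axes]
            for r in centered]

# Hypothetical descriptors: two groups of three "structures" each.
descriptors = [
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.05], [0.0, 0.1, 0.0],
    [5.0, 5.0, 0.0], [5.1, 5.0, 0.05], [5.0, 5.1, 0.0],
]
structure_map = pca_map(descriptors)  # one 2D point per structure
```

In this toy case the first map coordinate separates the two groups, which is the kind of pattern a structure map is meant to expose; real descriptors have far more components, and nonlinear methods are common alternatives to PCA.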

    Biallelic inherited SCN8A variants, a rare cause of SCN8A‐related developmental and epileptic encephalopathy

    Objective: Monoallelic de novo gain‐of‐function variants in the voltage‐gated sodium channel SCN8A are one of the recurrent causes of severe developmental and epileptic encephalopathy (DEE). In addition, a small number of de novo or inherited monoallelic loss‐of‐function variants have been found in patients with intellectual disability, autism spectrum disorder, or movement disorders. Inherited monoallelic variants causing either gain or loss‐of‐function are also associated with less severe conditions such as benign familial infantile seizures and isolated movement disorders. In all three categories, the affected individuals are heterozygous for a SCN8A variant in combination with a wild‐type allele. In the present study, we describe two unusual families with severely affected individuals who inherited biallelic variants of SCN8A. Methods: We identified two families with biallelic SCN8A variants by diagnostic gene panel sequencing. Functional analysis of the variants was performed using voltage clamp recordings from transfected ND7/23 cells. Results: We identified three probands from two unrelated families with DEE due to biallelic SCN8A variants. Each parent of an affected individual carried a single heterozygous SCN8A variant and exhibited mild cognitive impairment without seizures. In both families, functional analysis demonstrated segregation of one allele with complete loss‐of‐function, and one allele with altered biophysical properties consistent with partial loss‐of‐function. Significance: These studies demonstrate that SCN8A DEE may, in rare cases, result from inheritance of two variants, both of which exhibit reduced channel activity. In these families, heterozygosity for the dominant variants results in less severe disease than biallelic inheritance of two variant alleles.
    The clinical consequences of variants with partial and complete loss of SCN8A function are variable and likely to be influenced by genetic background. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/153117/1/epi16371_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/153117/2/epi16371.pd

    Vascular tissue specific miRNA profiles reveal novel correlations with risk factors in coronary artery disease

    Funding Information: Acknowledgments: We wish to thank all individuals donating cardiovascular relevant tissue and data. We would like to thank the surgeons of the Department of Cardiovascular Surgery and the KaBi-DHM (Cardiovascular Biobank of the German Heart Center) for collecting the surgical specimens. We further wish to thank the German Centre for Cardiovascular Research (DZHK) for financial support, the technical assistance team (Nicole Beck, Ulrike Weiß and Susanne Blachut) for wet lab and sequencing support. M.v.S. reported support by the Clinician Scientist Excellence Program of the DZHK, the German Society of Cardiology (DGK), the German Heart Foundation (Deutsche Herzstiftung e.V.), the Fondation Leducq (PlaqOmics) and the Corona Foundation (Junior Research Group Cardiovascular Diseases). Further, support was provided within the framework of DigiMed Bayern (www.digimed-bayern.de) funded by the Bavarian State Ministry of Health and Care and the Bavarian State Ministry of Science and the Arts through the DHM-MSRM Joint Research Center. Figures were prepared based on a BioRender’s Academic License using BioRender https://biorender.com/. Funding Information: Funding: Supported by the German Centre for Cardiovascular Research (DZHK), grant number 81X2100144 and by the BMBF (German Ministry of Education and Research). Publisher Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
    Cardiovascular disease (CVD) is the leading cause of morbidity and mortality worldwide. Non-coding RNAs have already been linked to CVD development and progression. While microRNAs (miRs) have been well studied in blood samples, there is little data on tissue-specific miRs in cardiovascular relevant tissues and their relation to cardiovascular risk factors. Tissue-specific miRs derived from Arteria mammaria interna (IMA) from 192 coronary artery disease (CAD) patients undergoing coronary artery bypass grafting (CABG) were analyzed.
    The aims of the study were (1) to establish a reference atlas which can be utilized for identification of novel diagnostic biomarkers and potential therapeutic targets, and (2) to relate these miRs to cardiovascular risk factors. Overall, 393 individual miRs showed sufficient expression levels and passed quality control for further analysis. We identified 17 miRs (miR-10b-3p, miR-10-5p, miR-17-3p, miR-21-5p, miR-151a-5p, miR-181a-5p, miR-185-5p, miR-194-5p, miR-199a-3p, miR-199b-3p, miR-212-3p, miR-363-3p, miR-548d-5p, miR-744-5p, miR-3117-3p, miR-5683 and miR-5701) that were significantly correlated with cardiovascular risk factors (correlation coefficient >0.2 in both directions, p < 0.006, false discovery rate (FDR) <0.05). Of particular interest, miR-5701 was positively correlated with hypertension, hypercholesterolemia, and diabetes. In addition, we found that miR-629-5p and miR-98-5p were significantly correlated with acute myocardial infarction. We provide a first atlas of miR profiles in IMA samples from CAD patients. In perspective, these miRs might play an important role in improved risk assessment, mechanistic disease understanding and local therapy of CAD. Peer reviewed

    Nonlocal Automated Comparative Static Analysis

    This paper reviews work on the development of a program Nasa for the automated comparative static analysis of parameterized nonlinear systems over parameter intervals. Nasa incorporates a fast and efficient algorithm Feed for the automatic evaluation of higher-order partial derivatives, as well as an adaptive homotopy continuation algorithm for obtaining all required initial conditions. Applications are envisioned for fields such as economics where models tend to be complex and closed-form solutions are difficult to obtain.
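
Comparative statics over a parameter interval can be illustrated by tracing the solution x(θ) of f(x, θ) = 0 while θ varies, using the implicit-function slope dx/dθ = −f_θ / f_x. The sketch below is a generic fixed-step predictor-corrector scheme written for this listing. It is not the Nasa or Feed algorithm, the toy model `f` is an assumption, and the partial derivatives are hand-coded rather than generated automatically.

```python
def f(x, theta):
    """Hypothetical equilibrium condition f(x, theta) = 0."""
    return x ** 3 + x - theta

def f_x(x, theta):
    """Partial derivative of f with respect to the state x."""
    return 3.0 * x ** 2 + 1.0

def f_theta(x, theta):
    """Partial derivative of f with respect to the parameter theta."""
    return -1.0

def trace_solution(x0, theta0, theta1, steps=20, newton_iters=5):
    """Follow x(theta) from theta0 to theta1.

    Returns a list of (theta, x, dx/dtheta) tuples. Each step takes an Euler
    predictor along the implicit-function slope, then applies Newton
    corrections to return to the solution curve.
    """
    x, theta = x0, theta0
    path = [(theta, x, -f_theta(x, theta) / f_x(x, theta))]
    h = (theta1 - theta0) / steps
    for k in range(1, steps + 1):
        slope = -f_theta(x, theta) / f_x(x, theta)
        x = x + h * slope              # predictor
        theta = theta0 + k * h
        for _ in range(newton_iters):  # corrector
            x -= f(x, theta) / f_x(x, theta)
        path.append((theta, x, -f_theta(x, theta) / f_x(x, theta)))
    return path

# Start from the known solution x = 0 at theta = 0 and sweep theta up to 2.
path = trace_solution(x0=0.0, theta0=0.0, theta1=2.0)
final_theta, final_x, final_slope = path[-1]
```

The third entry of each tuple is the local comparative-static effect of θ on x. A production tool would adapt the step size and handle points where f_x vanishes, which this sketch does not attempt.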

    Development and evaluation of machine learning in whole-body magnetic resonance imaging for detecting metastases in patients with lung or colon cancer: a diagnostic test accuracy study.

    OBJECTIVES: Whole-body magnetic resonance imaging (WB-MRI) has been demonstrated to be efficient and cost-effective for cancer staging. The study aim was to develop a machine learning (ML) algorithm to improve radiologists' sensitivity and specificity for metastasis detection and reduce reading times. MATERIALS AND METHODS: A retrospective analysis of 438 prospectively collected WB-MRI scans from multicenter Streamline studies (February 2013-September 2016) was undertaken. Disease sites were manually labeled using the Streamline reference standard. Whole-body MRI scans were randomly allocated to training and testing sets. A model for malignant lesion detection was developed based on convolutional neural networks and a 2-stage training strategy. The final algorithm generated lesion probability heat maps. Using a concurrent reader paradigm, 25 radiologists (18 experienced, 7 inexperienced in WB-MRI) were randomly allocated WB-MRI scans with or without ML support to detect malignant lesions over 2 or 3 reading rounds. Reads were undertaken in the setting of a diagnostic radiology reading room between November 2019 and March 2020. Reading times were recorded by a scribe. Prespecified analysis included sensitivity, specificity, interobserver agreement, and reading time of radiology readers to detect metastases with or without ML support. Reader performance for detection of the primary tumor was also evaluated. RESULTS: Four hundred thirty-three evaluable WB-MRI scans were allocated to algorithm training (245) or radiology testing (50 patients with metastases, from primary colon [n = 117] or lung [n = 71] cancer). Among a total of 562 reads by experienced radiologists over 2 reading rounds, per-patient specificity was 86.2% (ML) and 87.7% (non-ML) (-1.5% difference; 95% confidence interval [CI], -6.4%, 3.5%; P = 0.39). Sensitivity was 66.0% (ML) and 70.0% (non-ML) (-4.0% difference; 95% CI, -13.5%, 5.5%; P = 0.344).
    Among 161 reads by inexperienced readers, per-patient specificity in both groups was 76.3% (0% difference; 95% CI, -15.0%, 15.0%; P = 0.613), with sensitivity of 73.3% (ML) and 60.0% (non-ML) (13.3% difference; 95% CI, -7.9%, 34.5%; P = 0.313). Per-site specificity was high (>90%) for all metastatic sites and experience levels. There was high sensitivity for the detection of primary tumors (lung cancer detection rate of 98.6% with and without ML [0.0% difference; 95% CI, -2.0%, 2.0%; P = 1.00], colon cancer detection rate of 89.0% with and 90.6% without ML [-1.7% difference; 95% CI, -5.6%, 2.2%; P = 0.65]). When combining all reads from rounds 1 and 2, reading times fell by 6.2% (95% CI, -22.8%, 10.0%) when using ML. Round 2 read-times fell by 32% (95% CI, 20.8%, 42.8%) compared with round 1. Within round 2, there was a significant decrease in read-time when using ML support, estimated as 286 seconds (or 11%) quicker (P = 0.0281), using regression analysis to account for reader experience, read round, and tumor type. Interobserver variance suggests moderate agreement, Cohen κ = 0.64; 95% CI, 0.47, 0.81 (with ML), and Cohen κ = 0.66; 95% CI, 0.47, 0.81 (without ML). CONCLUSIONS: There was no evidence of a significant difference in per-patient sensitivity and specificity for detecting metastases or the primary tumor using concurrent ML compared with standard WB-MRI. Radiology read-times with or without ML support fell for round 2 reads compared with round 1, suggesting that readers familiarized themselves with the study reading method. During the second reading round, there was a significant reduction in reading time when using ML support.
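
For readers less familiar with the accuracy measures reported in this abstract, the sketch below shows how per-patient sensitivity, specificity, and Cohen's κ are computed from 2×2 counts. The counts are hypothetical and chosen only for illustration; they are not data from the study, and the function names are assumptions.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of patients with disease that the reader flagged."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of patients without disease that the reader cleared."""
    return true_neg / (true_neg + false_pos)

def cohen_kappa(both_pos, only_first_pos, only_second_pos, both_neg):
    """Chance-corrected agreement between two readers on a binary call."""
    total = both_pos + only_first_pos + only_second_pos + both_neg
    observed = (both_pos + both_neg) / total
    first_pos = (both_pos + only_first_pos) / total
    second_pos = (both_pos + only_second_pos) / total
    # Agreement expected if the two readers made their calls independently.
    expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    return (observed - expected) / (1 - expected)

# Hypothetical counts, not taken from the study.
example_sensitivity = sensitivity(true_pos=40, false_neg=10)
example_specificity = specificity(true_neg=90, false_pos=10)
example_kappa = cohen_kappa(both_pos=20, only_first_pos=5,
                            only_second_pos=10, both_neg=15)
```

With these invented counts the two readers agree on 70% of cases, while 50% agreement would be expected by chance, so κ comes out well below the raw agreement rate.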