17 research outputs found

    Generating QM1B with PySCF$_{\text{IPU}}$

    The emergence of foundation models in Computer Vision and Natural Language Processing has resulted in immense progress on downstream tasks. This progress was enabled by datasets with billions of training examples. Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples. These datasets are limited in size because the labels are computed using the accurate (but computationally demanding) predictions of Density Functional Theory (DFT). Notably, prior DFT datasets were created using CPU supercomputers without leveraging hardware acceleration. In this paper, we take a first step towards utilising hardware accelerators by introducing the data generator PySCF$_{\text{IPU}}$ using Intelligence Processing Units (IPUs). This allowed us to create the dataset QM1B with one billion training examples containing 9-11 heavy atoms. We demonstrate that a simple baseline neural network (SchNet 9M) improves its performance by simply increasing the amount of training data without additional inductive biases. To encourage future researchers to use QM1B responsibly, we highlight several limitations of QM1B and emphasise the low resolution of our DFT options, which also serves as motivation for even larger, more accurate datasets. Code and dataset are available on GitHub: http://github.com/graphcore-research/pyscf-ipu. Comment: 15 pages, 7 figures. NeurIPS 2023 Track Datasets and Benchmark

    Accessible Data Curation and Analytics for International-Scale Citizen Science Datasets

    The Covid Symptom Study, a smartphone-based surveillance study on COVID-19 symptoms in the population, is an exemplar of big data citizen science. Over 4.7 million participants and 189 million unique assessments have been logged since its introduction in March 2020. The success of the Covid Symptom Study creates technical challenges around effective data curation for two reasons. Firstly, the scale of the dataset means that it can no longer be easily processed using standard software on commodity hardware. Secondly, the size of the research group means that replicability and consistency of key analytics used across multiple publications becomes an issue. We present ExeTera, an open source data curation software designed to address scalability challenges and to enable reproducible research across an international research group for datasets such as the Covid Symptom Study dataset

    Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

    Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets shows improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks

    Modest effects of dietary supplements during the COVID-19 pandemic: insights from 445 850 users of the COVID-19 Symptom Study app

    Objectives Dietary supplements may ameliorate SARS-CoV-2 infection, although scientific evidence to support such a role is lacking. We investigated whether users of the COVID-19 Symptom Study app who regularly took dietary supplements were less likely to test positive for SARS-CoV-2 infection. Design App-based community survey. Setting 445 850 subscribers of an app that was launched to enable self-reported information related to SARS-CoV-2 infection for use in the general population in the UK (n=372 720), the USA (n=45 757) and Sweden (n=27 373). Main exposure Self-reported regular dietary supplement usage (constant use during previous 3 months) in the first waves of the pandemic up to 31 July 2020. Main outcome measures SARS-CoV-2 infection confirmed by viral RNA reverse transcriptase PCR test or serology test before 31 July 2020. Results In 372 720 UK participants (175 652 supplement users and 197 068 non-users), those taking probiotics, omega-3 fatty acids, multivitamins or vitamin D had a lower risk of SARS-CoV-2 infection by 14% (95% CI (8% to 19%)), 12% (95% CI (8% to 16%)), 13% (95% CI (10% to 16%)) and 9% (95% CI (6% to 12%)), respectively, after adjusting for potential confounders. No effect was observed for those taking vitamin C, zinc or garlic supplements. On stratification by sex, age and body mass index (BMI), the protective associations in individuals taking probiotics, omega-3 fatty acids, multivitamins and vitamin D were observed in females across all ages and BMI groups, but were not seen in men. The same overall pattern of association was observed in both the US and Swedish cohorts. Conclusion In women, we observed a modest but significant association between use of probiotics, omega-3 fatty acid, multivitamin or vitamin D supplements and lower risk of testing positive for SARS-CoV-2. We found no clear benefits for men nor any effect of vitamin C, garlic or zinc. Randomised controlled trials are required to confirm these observational findings before any therapeutic recommendations can be made

    The Role of MRI Physics in Brain Segmentation CNNs: Achieving Acquisition Invariance and Instructive Uncertainties

    Being able to adequately process and combine data arising from different sites is crucial in neuroimaging, but is difficult, owing to site, sequence and acquisition-parameter dependent biases. It is important therefore to design algorithms that are not only robust to images of differing contrasts, but are also able to generalise well to unseen ones, with a quantifiable measure of uncertainty. In this paper we demonstrate the efficacy of a physics-informed, uncertainty-aware, segmentation network that employs augmentation-time MR simulations and homogeneous batch feature stratification to achieve acquisition invariance. We show that the proposed approach also accurately extrapolates to out-of-distribution sequence samples, providing well calibrated volumetric bounds on these. We demonstrate a significant improvement in terms of coefficients of variation, backed by uncertainty-based volumetric validation. Comment: 10 pages, 3 figures, published in: Simulation and Synthesis in Medical Imaging 6th International Workshop, SASHIMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceeding

    Combining Multimodal Information for Metal Artefact Reduction: An Unsupervised Deep Learning Framework

    Metal artefact reduction (MAR) techniques aim at removing metal-induced noise from clinical images. In Computed Tomography (CT), supervised deep learning approaches have been shown to be effective but limited in generalisability, as they mostly rely on synthetic data. In Magnetic Resonance Imaging (MRI) instead, no method has yet been introduced to correct the susceptibility artefact, still present even in MAR-specific acquisitions. In this work, we hypothesise that a multimodal approach to MAR would improve both CT and MRI. Given their different artefact appearance, their complementary information can compensate for the corrupted signal in either modality. We thus propose an unsupervised deep learning method for multimodal MAR. We introduce the use of Locally Normalised Cross Correlation as a loss term to encourage the fusion of multimodal information. Experiments show that our approach favours a smoother correction in the CT, while promoting signal recovery in the MRI. Comment: Accepted at IEEE International Symposium on Biomedical Imaging (ISBI) 202

    A multi-channel uncertainty-aware multi-resolution network for MR to CT synthesis

    Synthesising computed tomography (CT) images from magnetic resonance images (MRI) plays an important role in the field of medical image analysis, both for quantification and diagnostic purposes. Convolutional neural networks (CNNs) have achieved state-of-the-art results in image-to-image translation for brain applications. However, synthesising whole-body images remains largely uncharted territory, involving many challenges, including large image size and limited field of view, complex spatial context, and anatomical differences between images acquired at different times. We propose the use of an uncertainty-aware multi-channel multi-resolution 3D cascade network specifically aiming for whole-body MR to CT synthesis. The Mean Absolute Error on the synthetic CT generated with the MultiResunc network (73.90 HU) is compared to multiple baseline CNNs like 3D U-Net (92.89 HU), HighRes3DNet (89.05 HU) and deep boosted regression (77.58 HU) and shows superior synthesis performance. We ultimately exploit the extrapolation properties of the MultiRes networks on sub-regions of the body

    COVID-19 due to the B.1.617.2 (Delta) variant compared to B.1.1.7 (Alpha) variant of SARS-CoV-2: a prospective observational cohort study

    The Delta (B.1.617.2) variant was the predominant UK circulating SARS-CoV-2 strain between May and December 2021. How Delta infection compares with previous variants is unknown. This prospective observational cohort study assessed symptomatic adults participating in the app-based COVID Symptom Study who tested positive for SARS-CoV-2 from May 26 to July 1, 2021 (Delta overwhelmingly the predominant circulating UK variant), compared (1:1, age- and sex-matched) with individuals presenting from December 28, 2020 to May 6, 2021 (Alpha (B.1.1.7) the predominant variant). We assessed illness (symptoms, duration, presentation to hospital) during Alpha- and Delta-predominant timeframes; and transmission, reinfection, and vaccine effectiveness during the Delta-predominant period. 3581 individuals (aged 18 to 100 years) from each timeframe were assessed. The seven most frequent symptoms were common to both variants. Within the first 28 days of illness, some symptoms were more common with Delta versus Alpha infection (including fever, sore throat, and headache) and some vice versa (dyspnoea). Symptom burden in the first week was higher with Delta versus Alpha infection; however, the odds of any given symptom lasting ≥ 7 days was either lower or unchanged. Illness duration ≥ 28 days was lower with Delta versus Alpha infection, though unchanged in unvaccinated individuals. Hospitalisation for COVID-19 was unchanged. The Delta variant appeared more (1.49) transmissible than Alpha. Re-infections were low in all UK regions. Vaccination markedly reduced the risk of Delta infection (by 69-84%). We conclude that COVID-19 from Delta or Alpha infections is similar. The Delta variant is more transmissible than Alpha; however, current vaccines showed good efficacy against disease. This research framework can be useful for future comparisons with new emerging variants

    Anosmia, ageusia, and other COVID-19-like symptoms in association with a positive SARS-CoV-2 test, across six national digital surveillance platforms: an observational study

    BACKGROUND: Multiple voluntary surveillance platforms were developed across the world in response to the COVID-19 pandemic, providing a real-time understanding of population-based COVID-19 epidemiology. During this time, testing criteria broadened and health-care policies matured. We aimed to test whether there were consistent associations of symptoms with SARS-CoV-2 test status across three surveillance platforms in three countries (two platforms per country), during periods of testing and policy changes. METHODS: For this observational study, we used data of observations from three volunteer COVID-19 digital surveillance platforms (Carnegie Mellon University and University of Maryland Facebook COVID-19 Symptom Survey, ZOE COVID Symptom Study app, and the Corona Israel study) targeting communities in three countries (Israel, the UK, and the USA; two platforms per country). The study population included adult respondents (age 18–100 years at baseline) who were not health-care workers. We did logistic regression of self-reported symptoms on self-reported SARS-CoV-2 test status (positive or negative), adjusted for age and sex, in each of the study cohorts. We compared odds ratios (ORs) across platforms and countries, and we did meta-analyses assuming a random effects model. We also evaluated testing policy changes, COVID-19 incidence, and time scales of duration of symptoms and symptom-to-test time. FINDINGS: Between April 1 and July 31, 2020, 514 459 tests from over 10 million respondents were recorded in the six surveillance platform datasets. Anosmia–ageusia was the strongest, most consistent symptom associated with a positive COVID-19 test (robust aggregated rank one, meta-analysed random effects OR 16·96, 95% CI 13·13–21·92). Fever (rank two, 6·45, 4·25–9·81), shortness of breath (rank three, 4·69, 3·14–7·01), and cough (rank four, 4·29, 3·13–5·88) were also highly associated with test positivity. The association of symptoms with test status varied by duration of illness, timing of the test, and broader test criteria, as well as over time, by country, and by platform. INTERPRETATION: The strong association of anosmia–ageusia with self-reported positive SARS-CoV-2 test was consistently observed, supporting its validity as a reliable COVID-19 signal, regardless of the participatory surveillance platform, country, phase of illness, or testing policy. These findings show that associations between COVID-19 symptoms and test positivity ranked similarly in a wide range of scenarios. Anosmia, fever, and respiratory symptoms consistently had the strongest effect estimates and were the most appropriate empirical signals for symptom-based public health surveillance in areas with insufficient testing or benchmarking capacity. Collaborative syndromic surveillance could enhance real-time epidemiological investigations and public health utility globally. FUNDING: National Institutes of Health, National Institute for Health Research, Alzheimer's Society, Wellcome Trust, and Massachusetts Consortium on Pathogen Readiness

    Anxiety and depression symptoms after COVID-19 infection: results from the COVID Symptom Study app

    BACKGROUND: Mental health issues have been reported after SARS-CoV-2 infection. However, comparison to prevalence in uninfected individuals and contribution from common risk factors (e.g., obesity, comorbidities) have not been examined. We identified how COVID-19 relates to mental health in the large community-based COVID Symptom Study. METHODS: We assessed anxiety and depression symptoms using two validated questionnaires in 413,148 individuals between February and April 2021; 26,998 had tested positive for SARS-CoV-2. We adjusted for physical and mental pre-pandemic comorbidities, BMI, age, and sex. FINDINGS: Overall, 26.4% of participants met screening criteria for general anxiety and depression. Anxiety and depression were slightly more prevalent in previously SARS-CoV-2 positive (30.4%) vs. negative (26.1%) individuals. This association was small compared to the effect of an unhealthy BMI and the presence of other comorbidities, and not evident in younger participants (≤40 years). Findings were robust to multiple sensitivity analyses. Association between SARS-CoV-2 infection and anxiety and depression was stronger in individuals with recent (120 days) infection, suggesting a short-term effect. INTERPRETATION: A small association was identified between SARS-CoV-2 infection and anxiety and depression symptoms. The proportion meeting criteria for self-reported anxiety and depression disorders is only slightly higher than pre-pandemic. FUNDING: Zoe Limited, National Institute for Health Research, Chronic Disease Research Foundation, National Institutes of Health, Medical Research Council U