14 research outputs found

    A statistical and machine learning approach to the study of astrochemistry

    Get PDF
    This thesis uses a variety of statistical and machine learning techniques to provide new insight into astrochemical processes. Astrochemistry is the study of chemistry in the universe. Due to the highly non-linear nature of a variety of competing factors, it is often difficult to understand the impact of any individual parameter on the abundance of molecules of interest. For this reason, we present a number of techniques that provide such insight. Chapter 2 is a chemical modelling study that considers the sensitivity of a glycine chemical network to the addition of two H2 addition reactions across a number of physical environments. This work considers the concept of a "hydrogen economy" within the context of chemical reaction networks and demonstrates that H2 decreases the abundance of glycine, one of the simplest amino acids, as well as that of its precursors. Chapter 3 considers a methodology that utilises the topology of a chemical network to accelerate the Bayesian inference problem by reducing the number of parameters to be inferred at once. Using a toy network, we demonstrate that a network can be simplified as well as split into smaller pieces for the inference problem. Chapter 4 considers how the dimensionality can be reduced by exploiting the physics of the underlying chemical reaction mechanisms. We do this by realising that the most pertinent reaction rate parameter is the binding energy of the more mobile species. This significantly reduces the dimensionality of the problem we have to solve. Chapter 5 builds on the work done in Chapters 3 and 4. The MOPED algorithm is utilised to identify which species should be prioritised for detection in order to reduce the variance of our binding energy posterior distributions.
Chapter 6 introduces the use of machine learning interpretability to provide better insights into the relationships between the physical input parameters of a chemical code and the final abundances of various species. By identifying and quantifying the relative importance of various parameters, we make qualitative comparisons to observations and demonstrate good agreement. Chapter 7 uses the same methods as in Chapters 4, 5 and 6 in light of new JWST observations. The relationship between binding energies and the abundances of species is also explored using machine learning interpretability techniques.

    Understanding Molecular Abundances in Star-Forming Regions Using Interpretable Machine Learning

    Full text link
    Astrochemical modelling of the interstellar medium typically makes use of complex computational codes with parameters whose values can be varied. It is not always clear what the exact nature of the relationship is between these input parameters and the output molecular abundances. In this work, a feature importance analysis is conducted using SHapley Additive exPlanations (SHAP), an interpretable machine learning technique, to identify the most important physical parameters as well as their relationship with each output. The outputs are the abundances of species and ratios of abundances. In order to reduce the time taken for this process, a neural network emulator is trained to model each species' output abundance, and this emulator is used to perform the interpretable machine learning. SHAP is then used to further explore the relationship between the physical features and the abundances for the various species and ratios we considered. The gas-phase abundances of H2O and CO are found to depend strongly on the metallicity. NH3 has a strong temperature dependence, with two temperature regimes separated at around 100 K. By analysing the chemical network, we relate this to the chemical reactions in our network and find that the increased temperature results in increased efficiency of destruction pathways. We investigate the HCN/HNC ratio and show that it can be used as a cosmic thermometer, in agreement with the literature. This ratio is also found to be correlated with the metallicity. The HCN/CS ratio serves as a density tracer, but also has three separate temperature-dependence regimes, which are linked to the chemistry of the two molecules. Comment: Accepted for publication in Monthly Notices of the Royal Astronomical Society. 20 pages, 20 figures and 5 tables.
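    The feature attribution described above can be illustrated with a minimal sketch that computes exact Shapley values by the direct formula for a three-feature toy model. The `emulator` function and all input values below are hypothetical stand-ins, not the trained neural network or the physical grid from the paper.

```python
from itertools import combinations
from math import factorial

# Hypothetical stand-in for a trained neural-network emulator: maps
# (density, temperature, metallicity) to a mock "abundance" value.
def emulator(density, temperature, metallicity):
    return 0.5 * metallicity + 0.1 * density * metallicity + 0.01 * temperature

def shapley_values(f, x, baseline):
    """Exact Shapley values: for each feature, average the marginal effect of
    switching it from baseline to its actual value over all feature subsets,
    with the standard combinatorial weights."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j == i or j in subset else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(*with_i) - f(*without_i))
    return phi

x = [2.0, 50.0, 1.0]    # (density, temperature, metallicity) -- arbitrary demo values
base = [0.0, 0.0, 0.0]
phi = shapley_values(emulator, x, base)
# Efficiency property: the attributions sum to f(x) - f(baseline).
print(sum(phi), emulator(*x) - emulator(*base))
```

In practice the SHAP library approximates these values efficiently; the point of the exact version is that the attributions decompose the emulator output, so a feature like metallicity with a large Shapley value is what drives the predicted abundance.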

    A statistical and machine learning approach to the study of astrochemistry

    Full text link
    In order to obtain a good understanding of astrochemistry, it is crucial to better understand the key parameters that govern grain-surface chemistry. For many chemical networks, these crucial parameters are the binding energies of the species. However, there exists much disagreement regarding these values in the literature. In this work, a Bayesian inference approach is taken to estimate these values. It is found that this is difficult to do in the absence of enough data. The Massive Optimised Parameter Estimation and Data (MOPED) compression algorithm is then used to help determine which species should be prioritised for future detections in order to better constrain the values of binding energies. Finally, an interpretable machine learning approach is taken in order to better understand the non-linear relationship between binding energies and the final abundances of specific species of interest. Comment: Accepted for publication in Faraday Discussions 2023. 14 pages, 7 figures and 1 table.
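    The core of MOPED-style compression can be sketched in a few lines: a weighting vector b proportional to C⁻¹μ′ collapses the data vector to a single number while preserving the full Fisher information on one parameter, and the relative weights indicate which data points (here, species abundances) constrain that parameter most. The derivative and variance values below are invented for illustration, not taken from the paper.

```python
from math import sqrt

# Toy setup: noisy abundances of four species, one parameter (a binding energy).
dmu = [0.8, 0.1, 0.4, 0.05]   # d<abundance_i>/d(binding energy), hypothetical
var = [0.2, 0.5, 0.1, 0.3]    # independent noise variances per species

# MOPED weighting vector b = C^{-1} mu' (normalised); y = b . x is a single
# compressed statistic.
raw = [d / v for d, v in zip(dmu, var)]
norm = sqrt(sum(r * d for r, d in zip(raw, dmu)))
b = [r / norm for r in raw]

# Fisher information in the full data vs. in the compressed statistic y:
fisher_full = sum(d * d / v for d, v in zip(dmu, var))
bCb = sum(bi * bi * v for bi, v in zip(b, var))
bdmu = sum(bi * d for bi, d in zip(b, dmu))
fisher_compressed = bdmu ** 2 / bCb
print(fisher_full, fisher_compressed)  # equal: compression is lossless here

# Larger |b_i| marks the species whose detection would tighten the
# binding-energy posterior most -- the prioritisation used above.
```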

    Investigating the impact of reactions of C and CH with molecular hydrogen on a glycine gas-grain network

    Get PDF
    The impact of including the reactions of C and CH with molecular hydrogen in a gas-grain network is assessed via a sensitivity analysis. To this end, we vary three parameters: the efficiency of the reaction C + H2 → CH2, the cosmic-ray ionization rate, and the final density of the collapsing dark cloud. A grid of 12 models is run to investigate the effect of all parameters on the final molecular abundances of the chemical network. We find that including reactions with molecular hydrogen alters the hydrogen economy of the network; since some species are hydrogenated by molecular hydrogen, atomic hydrogen is freed up. The abundances of simple molecules produced from hydrogenation, such as CH4, CH3OH, and NH3, increase, while more complex species such as glycine and its precursors see a significant decrease in their final abundances. We find that the precursors of glycine are preferentially hydrogenated, and therefore glycine itself is produced less efficiently.
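    A model grid of this kind is typically the Cartesian product of the varied parameters. The sketch below builds a 12-model grid; the specific split (3 × 2 × 2) and all parameter values are placeholders for illustration, not the values used in the paper.

```python
from itertools import product

# Illustrative grid: 3 efficiencies x 2 cosmic-ray ionisation rates
# x 2 final densities = 12 models. Values are hypothetical.
efficiencies = [0.0, 0.5, 1.0]     # branching efficiency for C + H2 -> CH2
zeta = [1.3e-17, 1.3e-16]          # cosmic-ray ionisation rate (s^-1)
final_density = [1e4, 1e6]         # final collapse density (cm^-3)

models = list(product(efficiencies, zeta, final_density))
print(len(models))  # 12
for i, (eff, z, n) in enumerate(models, 1):
    print(f"model {i:2d}: eff={eff}, zeta={z:.1e}, n_final={n:.0e}")
```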

    Exploiting Network Topology for Accelerated Bayesian Inference of Grain Surface Reaction Networks

    Get PDF
    In the study of grain-surface chemistry in the interstellar medium, there exists much uncertainty regarding the reaction mechanisms, with few constraints on the abundances of grain-surface molecules. Bayesian inference can be performed to determine the likely reaction rates. In this work, we consider methods for reducing the computational expense of performing Bayesian inference on a reaction network by looking at the geometry of the network. Two methods of exploiting the topology of the reaction network are presented. One involves reducing a reaction network to just the reaction chains with constraints on them. After this, new constraints are added to the reaction network and it is shown that one can separate this new reaction network into sub-networks. The fact that networks can be separated into sub-networks is particularly important for the reaction networks of interstellar complex organic molecules, whose surface reaction networks may have hundreds of reactions. Both methods allow the maximum-posterior reaction rate to be recovered with minimal bias.
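    The sub-network separation can be pictured as a graph problem: treating species as nodes and reactions as edges between their participants, independent sub-networks fall out as connected components, each of which can then be inferred separately. The toy reactions below are hypothetical, not the network from the paper.

```python
from collections import defaultdict

# Toy reaction list: each tuple holds the species participating in one reaction.
reactions = [
    ("H", "OH", "H2O"),   # H + OH -> H2O
    ("H", "H", "H2"),     # H + H  -> H2
    ("C", "O", "CO"),     # C + O  -> CO (shares no species with the above)
]

# Build an undirected adjacency map linking co-occurring species.
adj = defaultdict(set)
for species in reactions:
    for a in species:
        for b in species:
            if a != b:
                adj[a].add(b)

def components(nodes, adj):
    """Depth-first search returning the connected components of the graph."""
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            comp.add(cur)
            stack.extend(adj[cur])
        comps.append(comp)
    return comps

all_species = {s for r in reactions for s in r}
subnets = components(sorted(all_species), adj)
print(subnets)  # two independent sub-networks: {H, OH, H2O, H2} and {C, O, CO}
```

Each component defines a lower-dimensional inference problem, which is the source of the computational saving described above.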

    Probing Crystallinity and Grain Structure of 2D Materials and 2D-Like Van der Waals Heterostructures by Low-Voltage Electron Diffraction

    Get PDF
    4D scanning transmission electron microscopy (4D-STEM) is a powerful method for characterizing electron-transparent samples with down to sub-Ångstrom spatial resolution. 4D-STEM can reveal local crystallinity, orientation, grain size, strain, and many more sample properties by rastering a convergent electron beam over a sample area and acquiring a transmission diffraction pattern (DP) at each scan position. These patterns are rich in information about the atomic structure of the probed volume, making this technique a potent tool to characterize even inhomogeneous samples. 4D-STEM can also be used in scanning electron microscopes (SEMs) by placing an electron-sensitive camera below the sample. 4D-STEM-in-SEM is ideally suited to characterize 2D materials and 2D-like van der Waals heterostructures (vdWH) due to their inherent thickness of a few nanometers. The lower accelerating voltage of SEMs leads to strong scattering even from monolayers. The large field of view and down to sub-nm spatial resolution of SEMs are ideal for mapping properties of the different constituents of 2D-like vdWH by probing their combined sample volume. A unique 4D-STEM-in-SEM system is applied to reveal the single crystallinity of gold-mediated exfoliated MoS2 and to determine the crystal orientation and coverage of both components of a C60/MoS2 vdWH.
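    The dataset structure described above (one DP per scan position) lends itself to "virtual imaging", a standard 4D-STEM analysis step: integrating each DP over a chosen detector mask yields one image pixel per scan position. The sketch below is a generic illustration with a tiny synthetic dataset, not the authors' acquisition or analysis code.

```python
# A 4D-STEM dataset: a 2D grid of scan positions, each holding a 2D
# diffraction pattern (DP). Shapes here are tiny for illustration.
scan_h, scan_w, dp_h, dp_w = 2, 2, 4, 4

# Synthetic data: DP intensity varies smoothly with scan position and DP pixel.
data = [[[[y + x + i + j for j in range(dp_w)] for i in range(dp_h)]
         for x in range(scan_w)] for y in range(scan_h)]

def virtual_image(data, mask):
    """Sum the DP pixels selected by mask(i, j) at every scan position,
    producing one image value per scan position."""
    return [[sum(dp[i][j] for i in range(dp_h) for j in range(dp_w) if mask(i, j))
             for dp in row] for row in data]

# Annular mask excluding the central (bright-field) pixels -> a virtual
# dark-field image, sensitive to diffracted intensity and hence to
# local crystallinity and orientation.
centre = (dp_h // 2, dp_w // 2)
dark_field = virtual_image(data, lambda i, j: abs(i - centre[0]) + abs(j - centre[1]) > 1)
print(dark_field)
```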

    Data consistency in the English Hospital Episodes Statistics database

    Get PDF
    BACKGROUND: To gain maximum insight from large administrative healthcare datasets it is important to understand their data quality. Although a gold standard against which to assess criterion validity rarely exists for such datasets, internal consistency can be evaluated. We aimed to identify inconsistencies in the recording of mandatory International Statistical Classification of Diseases and Related Health Problems, tenth revision (ICD-10) codes within the Hospital Episodes Statistics dataset in England. METHODS: Three exemplar medical conditions where recording is mandatory once diagnosed were chosen: autism, type II diabetes mellitus and Parkinson's disease dementia. We identified the first occurrence of the condition's ICD-10 code for a patient during the period April 2013 to March 2021 and in subsequent hospital spells. We designed and trained random forest classifiers to identify variables strongly associated with recording inconsistencies. RESULTS: For autism, diabetes and Parkinson's disease dementia respectively, 43.7%, 8.6% and 31.2% of subsequent spells had inconsistencies. Coding inconsistencies were highly correlated with non-coding of an underlying condition, a change in hospital trust and greater time between the spell with the first coded diagnosis and the subsequent spell. For patients with diabetes or Parkinson's disease dementia, code recording for spells without an overnight stay was found to have a higher rate of inconsistencies. CONCLUSIONS: Data inconsistencies are relatively common for the three conditions considered. Where these mandatory diagnoses are not recorded in administrative datasets, and where clinical decisions are made based on such data, there is potential for this to impact patient care.
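    The consistency check itself reduces to a simple linkage rule: once a mandatory code first appears in a patient's spell, every subsequent spell lacking that code is flagged. The record layout and field names below are illustrative, not the actual HES schema.

```python
# Toy spell records: (patient_id, spell_order, diagnosis_codes).
spells = [
    ("p1", 1, {"F84.0"}),           # autism first coded here
    ("p1", 2, {"F84.0", "J18"}),    # consistent: code repeated
    ("p1", 3, {"J18"}),             # inconsistent: autism code missing
    ("p2", 1, {"E11"}),             # different patient, different condition
    ("p2", 2, set()),
]

def find_inconsistencies(spells, code):
    """Flag each spell after a patient's first coded spell as consistent
    (False) or inconsistent (True) for the given mandatory code."""
    first_seen = {}
    flags = []
    for pid, order, codes in sorted(spells, key=lambda s: (s[0], s[1])):
        if pid in first_seen:
            flags.append((pid, order, code not in codes))
        elif code in codes:
            first_seen[pid] = order
    return flags

print(find_inconsistencies(spells, "F84.0"))
# p1's spell 2 is consistent, spell 3 is inconsistent; p2 never had the code
```

In the study, flags produced this way become the target variable for the random forest classifiers that identify which patient and admission characteristics predict an inconsistency.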

    Data quality and autism : issues and potential impacts

    Get PDF
    Introduction Large healthcare datasets can provide insight that has the potential to improve outcomes for patients. However, it is important to understand the strengths and limitations of such datasets so that the insights they provide are accurate and useful. The aim of this study was to identify data inconsistencies within the Hospital Episodes Statistics (HES) dataset for autistic patients and assess potential biases introduced through these inconsistencies and their impact on patient outcomes. The study can only identify inconsistencies in recording of autism diagnosis and not whether the inclusion or exclusion of the autism diagnosis is the error. Methods Data were extracted from the HES database for the period 1st April 2013 to 31st March 2021 for patients with a diagnosis of autism. First spells in hospital during the study period were identified for each patient and these were linked to any subsequent spell in hospital for the same patient. Data inconsistencies were recorded where autism was not recorded as a diagnosis in a subsequent spell. Features associated with data inconsistencies were identified using random forest classifiers and regression modelling. Results Data were available for 172,324 unique patients who had been recorded as having an autism diagnosis on first admission. In total, 43.7 % of subsequent spells were found to have inconsistencies. The features most strongly associated with inconsistencies included greater age, greater deprivation, longer time since the first spell, change in provider, shorter length of stay, being female and a change in the main specialty description. The random forest algorithm had an area under the receiver operating characteristic curve of 0.864 (95 % CI [0.862 – 0.866]) in predicting a data inconsistency.
For patients who died in hospital, inconsistencies in their final spell were significantly associated with being 80 years and over, being female, greater deprivation and use of a palliative care code in the death spell. Conclusions Data inconsistencies in the HES database were relatively common in autistic patients and were associated with a number of patient and hospital admission characteristics. Such inconsistencies have the potential to distort our understanding of service use in key demographic groups.

    COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records

    Get PDF
    BACKGROUND: Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework. METHODS: In this cohort study, we used eight linked National Health Service (NHS) datasets for people in England alive on Jan 23, 2020. Data on COVID-19 testing, vaccination, primary and secondary care records, and death registrations were collected until Nov 30, 2021. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity and encompassing five categories: positive SARS-CoV-2 test, primary care diagnosis, hospital admission, ventilation modality (four phenotypes), and death (three phenotypes). We constructed patient trajectories illustrating transition frequency and duration between phenotypes. Analyses were stratified by pandemic waves and vaccination status. FINDINGS: Among 57 032 174 individuals included in the cohort, 13 990 423 COVID-19 events were identified in 7 244 925 individuals, equating to an infection rate of 12·7% during the study period. Of 7 244 925 individuals, 460 737 (6·4%) were admitted to hospital and 158 020 (2·2%) died. Of 460 737 individuals who were admitted to hospital, 48 847 (10·6%) were admitted to the intensive care unit (ICU), 69 090 (15·0%) received non-invasive ventilation, and 25 928 (5·6%) received invasive ventilation. Among 384 135 patients who were admitted to hospital but did not require ventilation, mortality was higher in wave 1 (23 485 [30·4%] of 77 202 patients) than wave 2 (44 220 [23·1%] of 191 528 patients), but remained unchanged for patients admitted to the ICU. Mortality was highest among patients who received ventilatory support outside of the ICU in wave 1 (2569 [50·7%] of 5063 patients). 
15 486 (9·8%) of 158 020 COVID-19-related deaths occurred within 28 days of the first COVID-19 event without a COVID-19 diagnosis on the death certificate. 10 884 (6·9%) of 158 020 deaths were identified exclusively from mortality data with no previous COVID-19 phenotype recorded. We observed longer patient trajectories in wave 2 than wave 1. INTERPRETATION: Our analyses illustrate the wide spectrum of disease trajectories as shown by differences in incidence, survival, and clinical pathways. We have provided a modular analytical framework that can be used to monitor the impact of the pandemic and generate evidence of clinical and policy relevance using multiple EHR sources. FUNDING: British Heart Foundation Data Science Centre, led by Health Data Research UK.
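    The trajectory construction described above amounts to counting transitions between ordered phenotype events per patient. A minimal sketch, with invented event sequences and phenotype names rather than the study's actual ten phenotypes:

```python
from collections import Counter

# Toy per-patient event sequences, ordered in time.
trajectories = [
    ["positive_test", "hospitalisation", "death"],
    ["positive_test", "primary_care_diagnosis"],
    ["positive_test", "hospitalisation", "ICU", "death"],
]

# Count every consecutive phenotype pair across all patients; the result is
# the transition-frequency table underlying a trajectory diagram.
transitions = Counter(
    pair for traj in trajectories for pair in zip(traj, traj[1:])
)
print(transitions[("positive_test", "hospitalisation")])  # 2
```

Stratifying the input sequences by wave or vaccination status before counting reproduces the stratified analyses; attaching timestamps to each event would additionally give the durations between phenotypes.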

    School education and politics in light of the South African situation: a study in fundamental pedagogy

    No full text
    Thesis (M.Ed.) -- Universiteit van Stellenbosch, 1988. Full text to be digitised and attached to bibliographic record.