62 research outputs found
Evaluating machine learning techniques for predicting power spectra from reionization simulations
Upcoming experiments such as the SKA will provide huge quantities of data. Fast modelling of the high-redshift 21cm signal will be crucial for efficiently comparing these data sets with theory. The most detailed theoretical predictions currently come from numerical simulations and from faster but less accurate semi-numerical simulations. Recently, machine learning techniques have been proposed to emulate the behaviour of these semi-numerical simulations with drastically reduced time and computing cost. We compare the viability of five such machine learning techniques for emulating the 21cm power spectrum of the publicly available code SimFast21. Our best emulator is a multilayer perceptron with three hidden layers, reproducing SimFast21 power spectra much faster than the simulation with 4% mean squared error averaged across all redshifts and input parameters. The other techniques (interpolation, Gaussian process regression, and support vector machine) have slower prediction times and worse prediction accuracy than the multilayer perceptron. All our emulators can make predictions at any redshift and scale, which gives more flexible predictions but results in significantly worse prediction accuracy at lower redshifts. We then present a proof-of-concept technique for mapping between two different simulations, exploiting our best emulator's fast prediction speed. We demonstrate this technique by finding a mapping between SimFast21 and another publicly available code, 21cmFAST. We observe a noticeable offset between the simulations for some regions of the input space. Such techniques could potentially be used as a bridge between fast semi-numerical simulations and accurate numerical radiative transfer simulations.
A fast estimator for the bispectrum and beyond - a practical method for measuring non-Gaussianity in 21-cm maps
In this paper, we establish the accuracy and robustness of a fast estimator for the bispectrum, the ‘FFT-bispectrum estimator’. The implementation of the estimator presented here offers speed and simplicity benefits over a direct-measurement approach. We also generalize the derivation so it may easily be applied to polyspectra of any order, such as the trispectrum, at the cost of only a handful of Fast Fourier Transforms (FFTs). All lower-order statistics can also be calculated simultaneously for little extra cost. To test the estimator, we make use of a non-linear density field, and for a more strongly non-Gaussian test case, we use a toy model of reionization in which ionized bubbles at a given redshift are all of equal size and are randomly distributed. Our tests find that the FFT-estimator remains accurate over a wide range of k, and so should be extremely useful for analysis of 21-cm observations. The speed of the FFT-bispectrum estimator makes it suitable for sampling applications, such as Bayesian inference. The algorithm we describe should prove valuable in the analysis of simulations and observations, and whilst we apply it within the field of cosmology, this estimator is useful in any field that deals with non-Gaussian data.
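The idea behind an FFT-based bispectrum estimator can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: it assumes a periodic 1D field, uses a slow textbook DFT from the standard library in place of a real FFT, and omits the normalisation by volume and triangle count. All function names are hypothetical. It checks that the real-space sum of three shell-filtered fields equals the direct sum over closed triangles of Fourier modes.

```python
import cmath
import random


def dft(field):
    """Plain O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(field)
    return [
        sum(field[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def shell(m, n):
    """Indices of the modes with |k| = m on a periodic grid of n points."""
    return sorted({m % n, (-m) % n})


def filtered_field(modes, indices):
    """Unnormalised inverse transform keeping only the listed modes."""
    n = len(modes)
    return [
        sum(modes[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in indices)
        for x in range(n)
    ]


def triangle_sum_fft(modes, m1, m2, m3):
    """Sum over closed triangles via a real-space product of filtered fields."""
    n = len(modes)
    f1, f2, f3 = (filtered_field(modes, shell(m, n)) for m in (m1, m2, m3))
    return sum(a * b * c for a, b, c in zip(f1, f2, f3)) / n


def triangle_sum_direct(modes, m1, m2, m3):
    """Same quantity by explicitly looping over mode triplets with k1+k2+k3 = 0."""
    n = len(modes)
    total = 0j
    for k1 in shell(m1, n):
        for k2 in shell(m2, n):
            for k3 in shell(m3, n):
                if (k1 + k2 + k3) % n == 0:
                    total += modes[k1] * modes[k2] * modes[k3]
    return total


rng = random.Random(0)
field = [rng.gauss(0.0, 1.0) for _ in range(16)]
modes = dft(field)
via_fft = triangle_sum_fft(modes, 2, 3, 5)
via_loop = triangle_sum_direct(modes, 2, 3, 5)
```

The two routes agree because summing the product of the filtered fields over all grid points enforces the closure condition on the wavevectors. In higher dimensions the direct loop grows quickly with the number of modes per shell, whereas the filtered-field route needs only a few transforms per triangle configuration.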
Causal Compressive Sensing for Gene Network Inference
We propose a novel framework for studying causal inference of gene interactions using a combination of compressive sensing and Granger causality techniques. The gist of the approach is to discover sparse linear dependencies between time series of gene expressions via a Granger-type elimination method. The method is tested on the Gardner dataset for the SOS network in E. coli, for which both known and unknown causal relationships are discovered.
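A Granger-type comparison can be sketched as follows. This is only an illustration of the pairwise test, not the framework from the abstract: it assumes a single lag, no intercept, and synthetic data, and it leaves out the compressive sensing (sparse recovery) step entirely. The function name `rss_gain` and all coefficients are hypothetical.

```python
import random


def rss_gain(target, candidate):
    """Fractional drop in residual sum of squares when the lagged candidate
    series is added to a lag-1 autoregression of the target series."""
    y_now, y_prev, x_prev = target[1:], target[:-1], candidate[:-1]
    syy = sum(v * v for v in y_prev)
    sxx = sum(v * v for v in x_prev)
    sxy = sum(a * b for a, b in zip(y_prev, x_prev))
    sy = sum(a * b for a, b in zip(y_prev, y_now))
    sx = sum(a * b for a, b in zip(x_prev, y_now))

    # Restricted model: y_t = a * y_{t-1}
    a_r = sy / syy
    rss_r = sum((t - a_r * p) ** 2 for t, p in zip(y_now, y_prev))

    # Full model: y_t = a * y_{t-1} + b * x_{t-1}, solved by Cramer's rule
    det = syy * sxx - sxy * sxy
    a_f = (sy * sxx - sxy * sx) / det
    b_f = (syy * sx - sxy * sy) / det
    rss_f = sum(
        (t - a_f * p - b_f * q) ** 2 for t, p, q in zip(y_now, y_prev, x_prev)
    )
    return (rss_r - rss_f) / rss_r


# Synthetic example in which x drives y but y does not drive x.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0.0, 0.1))

gain_x_to_y = rss_gain(y, x)  # large: the past of x helps predict y
gain_y_to_x = rss_gain(x, y)  # small: the past of y does not help predict x
```

A candidate whose gain stays near zero would be eliminated as a putative regulator. A full analysis would apply a proper statistical test and a sparsity-promoting solver across all genes at once.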
Woodland Recovery after Suppression of Deer: Cascade effects for Small Mammals, Wood Mice (Apodemus sylvaticus) and Bank Voles (Myodes glareolus)
Over the past century, increases in both density and distribution of deer species in the Northern Hemisphere have resulted in major changes in ground flora and undergrowth vegetation of woodland habitats, and consequently the animal communities that inhabit them. In this study, we tested whether recovery in the vegetative habitat of a woodland due to effective deer management (from a peak of 0.4–1.5 to <0.17 deer per ha) had translated to the small mammal community as an example of a higher order cascade effect. We compared deer-free exclosures with neighboring open woodland using capture-mark-recapture (CMR) methods to see if the significant difference in bank vole (Myodes glareolus) and wood mouse (Apodemus sylvaticus) numbers between these environments from 2001–2003 persisted in 2010. Using the multi-state Robust Design method in program MARK we found survival and abundance of both voles and mice to be equivalent between the open woodland and the experimental exclosures with no differences in various metrics of population structure (age structure, sex composition, reproductive activity) and individual fitness (weight), although the vole population showed variation both locally and temporally. This suggests that the vegetative habitat - having passed some threshold of complexity due to lowered deer density - has allowed recovery of the small mammal community, although patch dynamics associated with vegetation complexity still remain. We conclude that the response of small mammal communities to environmental disturbance such as intense browsing pressure can be rapidly reversed once the disturbing agent has been removed and the vegetative habitat is allowed to increase in density and complexity, although we encourage caution, as a source/sink dynamic may emerge between old-growth patches and the recently disturbed habitat under harsh conditions.
Evaluation of the effects of implementing an electronic early warning score system: protocol for a stepped wedge study
Background: An Early Warning Score is a clinical risk score based upon vital signs intended to aid recognition of patients in need of urgent medical attention. The use of an escalation of care policy based upon an Early Warning Score is mandated as the standard of practice in British hospitals. Electronic systems for recording vital sign observations and Early Warning Score calculation offer theoretical benefits over paper-based systems. However, the evidence for their clinical benefit is limited. Previous studies have shown inconsistent results. The majority have employed a “before and after” study design, which may be strongly confounded by simultaneously occurring events. This study aims to examine how the implementation of an electronic early warning score system, System for Notification and Documentation (SEND), affects the recognition of clinical deterioration occurring in hospitalised adult patients. Methods: This study is a non-randomised stepped wedge evaluation carried out across the four hospitals of the Oxford University Hospitals NHS Trust, comparing charting on paper and charting using SEND. We assume that more frequent monitoring of acutely ill patients is associated with better recognition of patient deterioration. The primary outcome measure is the time between a patient’s first observation set with an Early Warning Score above the alerting threshold and their subsequent set of observations. Secondary outcome measures are in-hospital mortality, cardiac arrest and Intensive Care admission rates, hospital length of stay and system usability measured using the System Usability Scale. We will also measure Intensive Care length of stay, Intensive Care mortality, and Acute Physiology and Chronic Health Evaluation (APACHE) II acute physiology score on admission, to examine whether the introduction of SEND has any effect on Intensive Care-related outcomes.
Discussion: The development of this protocol has been informed by guidance from the Agency for Healthcare Research and Quality (AHRQ) Health Information Technology Evaluation Toolkit and DeLone and McLean’s Model of Information System Success. Our chosen trial design, a stepped wedge study, is well suited to the study of a phased roll out. The choice of primary endpoint is challenging. We have selected the time from the first triggering observation set to the subsequent observation set. This has the benefit of being easy to measure on both paper and electronic charting and having a straightforward interpretation. We have collected qualitative measures of system quality via a user questionnaire and organisational descriptors to help readers understand the context in which SEND has been implemented.
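The primary outcome described in this protocol (time from the first triggering observation set to the next observation set) could be computed along the following lines. This is a hedged sketch rather than the study's actual analysis code: the data layout, the function name, and the threshold value of 3 are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical alerting threshold; the protocol abstract does not state a value.
ALERT_THRESHOLD = 3

Observation = Tuple[datetime, int]  # (time the set was recorded, Early Warning Score)


def time_to_next_observation(
    observations: Sequence[Observation],
    threshold: int = ALERT_THRESHOLD,
) -> Optional[timedelta]:
    """Interval between the first observation set scoring above the threshold
    and the observation set that follows it, or None if either is missing."""
    ordered = sorted(observations, key=lambda obs: obs[0])
    for index, (taken_at, score) in enumerate(ordered):
        if score > threshold:  # "above the alerting threshold"
            if index + 1 < len(ordered):
                return ordered[index + 1][0] - taken_at
            return None  # triggering set was the last one recorded
    return None  # the patient never triggered


example = [
    (datetime(2015, 1, 1, 8, 0), 1),
    (datetime(2015, 1, 1, 12, 0), 5),   # first set above the threshold
    (datetime(2015, 1, 1, 13, 30), 4),  # subsequent set
]
interval = time_to_next_observation(example)
```

Because the calculation needs only timestamps and scores, the same function would apply to observations transcribed from paper charts and to those exported from an electronic system, which matches the protocol's stated reason for choosing this endpoint.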
Magnetic resonance imaging of anterior cruciate ligament rupture
BACKGROUND: Magnetic resonance (MR) imaging is a useful diagnostic tool for the assessment of knee joint injury. Anterior cruciate ligament repair is a commonly performed orthopaedic procedure. This paper examines the concordance between MR imaging and arthroscopic findings. METHODS: Between February 1996 and February 1998, 48 patients who underwent MR imaging of the knee were reported to have complete tears of the anterior cruciate ligament (ACL). Of the 48 patients, 36 were male and 12 female. The average age was 27 years (range: 15 to 45). Operative reconstruction using a patellar bone-tendon-bone autograft was arranged for each patient, and an arthroscopic examination was performed to confirm the diagnosis immediately prior to reconstructive surgery. RESULTS: In 16 of the 48 patients, reconstructive surgery was cancelled when incomplete lesions were noted during arthroscopy, making reconstructive surgery unnecessary. The remaining 32 patients were found to have complete tears of the ACL, and therefore underwent reconstructive surgery. Using arthroscopy as an independent, reliable reference standard for ACL tear diagnosis, the reliability of MR imaging was evaluated. Of the MR reports of complete ACL tear, 67% (32 of 48) were confirmed at arthroscopy, so false-positive reports of "complete ACL tear" must be expected with MR imaging. CONCLUSIONS: Since conservative treatment is sufficient for incomplete ACL tears, the decision to undertake ACL reconstruction should not be based on MR findings alone.
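The 67% figure follows directly from the counts in the abstract: 32 of the 48 MR reports of complete tear were confirmed at arthroscopy and 16 were not. This proportion is usually called the positive predictive value. A short worked check, with a hypothetical helper name:

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Share of positive test reports confirmed by the reference standard."""
    return true_positives / (true_positives + false_positives)


# Counts from the abstract: 32 confirmed complete tears, 16 incomplete lesions.
ppv = positive_predictive_value(true_positives=32, false_positives=16)
ppv_percent = round(100 * ppv)
```

Sensitivity and specificity cannot be derived from these data, because only patients with a positive MR report were included in the series.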
Conditional meta-analysis stratifying on detailed HLA genotypes identifies a novel type 1 diabetes locus around TCF19 in the MHC
The human leukocyte antigen (HLA) class II genes HLA-DRB1, -DQA1 and -DQB1 are the strongest genetic factors for type 1 diabetes (T1D). Additional loci in the major histocompatibility complex (MHC) are difficult to identify due to the region’s high gene density and complex linkage disequilibrium (LD). To facilitate the association analysis, two novel algorithms were implemented in this study: one for phasing the multi-allelic HLA genotypes in trio families, and one for partitioning the HLA strata in conditional testing. Screening and replication were performed on two large and independent datasets: the Wellcome Trust Case–Control Consortium (WTCCC) dataset of 2,000 cases and 1,504 controls, and the T1D Genetics Consortium (T1DGC) dataset of 2,300 nuclear families. After imputation, the two datasets have 1,941 common SNPs in the MHC, of which 22 were successfully tested and replicated based on the statistical testing stratifying on the detailed DRB1 and DQB1 genotypes. Further conditional tests using the combined dataset confirmed eight novel SNP associations around 31.3 Mb on chromosome 6 (rs3094663, p = 1.66 × 10⁻¹¹ and rs2523619, p = 2.77 × 10⁻¹⁰ conditional on the DR/DQ genotypes). A subsequent LD analysis established TCF19, POU5F1, CCHCR1 and PSORS1C1 as potential causal genes for the observed association.
Temperature Control of Fimbriation Circuit Switch in Uropathogenic Escherichia coli: Quantitative Analysis via Automated Model Abstraction
Uropathogenic Escherichia coli (UPEC) represent the predominant cause of urinary tract infections (UTIs). A key UPEC molecular virulence mechanism is type 1 fimbriae, whose expression is controlled by the orientation of an invertible chromosomal DNA element, the fim switch. Temperature has been shown to act as a major regulator of fim switching behavior and is overall an important indicator as well as functional feature of many urologic diseases, including UPEC host-pathogen interaction dynamics. Given this panoptic physiological role of temperature during UTI progression and notable empirical challenges to its direct in vivo studies, in silico modeling of corresponding biochemical and biophysical mechanisms essential to UPEC pathogenicity may significantly aid our understanding of the underlying disease processes. However, rigorous computational analysis of biological systems, such as the fim switch temperature control circuit, has hitherto presented a notoriously demanding problem due to both the substantial complexity of the gene regulatory networks involved and their often characteristically discrete and stochastic dynamics. To address these issues, we have developed an approach that enables automated multiscale abstraction of biological system descriptions based on reaction kinetics. Implemented as a computational tool, this method has allowed us to efficiently analyze the modular organization and behavior of the E. coli fimbriation switch circuit at different temperature settings, thus facilitating new insights into this mode of UPEC molecular virulence regulation. In particular, our results suggest that, with respect to its role in shutting down fimbriae expression, the primary function of FimB recombinase may be to effect a controlled down-regulation (rather than increase) of the ON-to-OFF fim switching rate via temperature-dependent suppression of competing dynamics mediated by recombinase FimE. Our computational analysis further implies that this down-regulation mechanism could be particularly significant inside the host environment, thus potentially contributing further understanding toward the development of novel therapeutic approaches to UPEC-caused UTIs.
Stochastic flowering phenology in Dactylis glomerata populations described by Markov chain modelling
Understanding the relationship between flowering patterns and pollen dispersal is important in climate change modelling, pollen forecasting, forestry and agriculture. Enhanced understanding of this connection can be gained through detailed spatial and temporal flowering observations on a population level, combined with modelling that simulates the dynamics. Species with large distribution ranges, long flowering seasons, high pollen production and naturally large populations can be used to illustrate these dynamics. Revealing and simulating species-specific demographic and stochastic elements in the flowering process will likely be important in determining when pollen release is likely to happen in flowering plants. Data on the spatial and temporal dynamics of eight populations of Dactylis glomerata were collected over the course of two years to determine high-resolution demographic elements. Stochastic elements were accounted for using Markov chain approaches in order to evaluate tiller-specific contribution to overall population dynamics. Tiller-specific developmental dynamics were evaluated using three different RV matrix correlation coefficients. We found that the demographic patterns in population development were the same for all populations, with key phenological events differing only by a few days over the course of the seasons. Many tillers transitioned very quickly from non-flowering to full flowering, a process that can be replicated with Markov chain modelling. Our novel approach demonstrates the identification and quantification of stochastic elements in the flowering process of D. glomerata, an element likely to be found in many flowering plants. The stochastic modelling approach can be used to develop detailed pollen release models for Dactylis, other grass species and probably other flowering plants.
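A Markov chain description of tiller development can be sketched in a few lines. The state names and every transition probability below are hypothetical values chosen for illustration, not estimates from the study. Each row gives the probabilities of a tiller moving from one state to each state by the next observation date.

```python
STATES = ["non-flowering", "partial flowering", "full flowering"]

# Hypothetical per-step transition probabilities; each row sums to 1.
TRANSITIONS = [
    [0.6, 0.1, 0.3],  # non-flowering: may jump straight to full flowering
    [0.0, 0.5, 0.5],  # partial flowering
    [0.0, 0.0, 1.0],  # full flowering is treated as absorbing in this sketch
]


def step(distribution, transitions=TRANSITIONS):
    """Advance the share of tillers in each state by one observation interval."""
    size = len(distribution)
    return [
        sum(distribution[i] * transitions[i][j] for i in range(size))
        for j in range(size)
    ]


def simulate(distribution, steps):
    """Return the state distribution after each of the given number of steps."""
    history = [list(distribution)]
    for _ in range(steps):
        history.append(step(history[-1]))
    return history


# All tillers start in the non-flowering state.
history = simulate([1.0, 0.0, 0.0], steps=50)
```

The direct non-flowering to full-flowering entry in the first row is how this sketch represents tillers that transition very quickly, and the final rows of `history` show the population converging on full flowering.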
- …