15 research outputs found

    Gibbs optimal design of experiments

    Bayesian optimal design of experiments is a well-established approach to planning experiments. Briefly, a probability distribution, known as a statistical model, for the responses is assumed which is dependent on a vector of unknown parameters. A utility function is then specified which gives the gain in information for estimating the true value of the parameters using the Bayesian posterior distribution. A Bayesian optimal design is given by maximising the expectation of the utility with respect to the joint distribution given by the statistical model and prior distribution for the true parameter values. The approach takes account of the experimental aim via specification of the utility and of all assumed sources of uncertainty via the expected utility. However, it is predicated on the specification of the statistical model. Recently, a new type of statistical inference, known as Gibbs (or General Bayesian) inference, has been advanced. This is Bayesian-like, in that uncertainty on unknown quantities is represented by a posterior distribution, but does not necessarily rely on specification of a statistical model. Thus the resulting inference should be less sensitive to misspecification of the statistical model. The purpose of this paper is to propose Gibbs optimal design: a framework for optimal design of experiments for Gibbs inference. The concept behind the framework is introduced along with a computational approach to find Gibbs optimal designs in practice. The framework is demonstrated on exemplars including linear models, and experiments with count and time-to-event responses
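
The expected-utility criterion described in this abstract can be written compactly. The block below is a sketch in generic notation, not taken from the paper: the symbols $d$ (design), $y$ (responses), $\theta$ (parameters), $u$ (utility), $\ell$ (loss) and $w$ (a positive weight) are assumptions introduced here for illustration. The second display is the usual form of a Gibbs (general Bayesian) posterior, in which a loss function takes the place of the negative log-likelihood; the paper's own formulation may differ in detail.

```latex
% Bayesian optimal design: maximise the expected utility over designs d
U(d) = \mathbb{E}_{y,\theta \mid d}\!\left[ u(d, y, \theta) \right]
     = \int_{\Theta} \int_{\mathcal{Y}} u(d, y, \theta)\,
       \pi(y \mid \theta, d)\, \pi(\theta)\, \mathrm{d}y\, \mathrm{d}\theta,
\qquad
d^{*} = \operatorname*{arg\,max}_{d \in \mathcal{D}} U(d).

% Gibbs (general Bayesian) posterior: a loss replaces the log-likelihood
\pi_{G}(\theta \mid y, d) \propto
  \exp\!\left\{ -w\, \ell(\theta; y, d) \right\} \pi(\theta), \qquad w > 0.
```

Choosing $\ell$ as the negative log-likelihood with $w = 1$ recovers the ordinary Bayesian posterior, which is why the approach is described as Bayesian-like.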

    Using virtual reality and thermal imagery to improve statistical modelling of vulnerable and protected species

    Biodiversity loss and sparse observational data mean that critical conservation decisions may be based on little to no information. Emerging technologies, such as airborne thermal imaging and virtual reality, may facilitate species monitoring and improve predictions of species distribution. Here we combined these two technologies to predict the distribution of koalas, specialized arboreal foliovores facing population declines in many parts of eastern Australia. For a study area in southeast Australia, we complemented ground-survey records with presence and absence observations from thermal-imagery obtained using Remotely-Piloted Aircraft Systems. These field observations were further complemented with information elicited from koala experts, who were immersed in 360-degree images of the study area. The experts were asked to state the probability of habitat suitability and koala presence at the sites they viewed and to assign each probability a confidence rating. We fit logistic regression models to the ground survey data and the ground plus thermal-imagery survey data and a Beta regression model to the expert elicitation data. We then combined parameter estimates from the expert-elicitation model with those from each of the survey models to predict koala presence and absence in the study area. The model that combined the ground, thermal-imagery and expert-elicitation data substantially reduced the uncertainty around parameter estimates and increased the accuracy of classifications (koala presence vs absence), relative to the model based on ground-survey data alone. Our findings suggest that data elicited from experts using virtual reality technology can be combined with data from other emerging technologies, such as airborne thermal-imagery, using traditional statistical models, to increase the information available for species distribution modelling and the conservation of vulnerable and protected species

    Multiorgan MRI findings after hospitalisation with COVID-19 in the UK (C-MORE): a prospective, multicentre, observational cohort study

    Introduction: The multiorgan impact of moderate to severe coronavirus infections in the post-acute phase is still poorly understood. We aimed to evaluate the excess burden of multiorgan abnormalities after hospitalisation with COVID-19, evaluate their determinants, and explore associations with patient-related outcome measures. Methods: In a prospective, UK-wide, multicentre MRI follow-up study (C-MORE), adults (aged ≥18 years) discharged from hospital following COVID-19 who were included in Tier 2 of the Post-hospitalisation COVID-19 study (PHOSP-COVID) and contemporary controls with no evidence of previous COVID-19 (SARS-CoV-2 nucleocapsid antibody negative) underwent multiorgan MRI (lungs, heart, brain, liver, and kidneys) with quantitative and qualitative assessment of images and clinical adjudication when relevant. Individuals with end-stage renal failure or contraindications to MRI were excluded. Participants also underwent detailed recording of symptoms, and physiological and biochemical tests. The primary outcome was the excess burden of multiorgan abnormalities (two or more organs) relative to controls, with further adjustments for potential confounders. The C-MORE study is ongoing and is registered with ClinicalTrials.gov, NCT04510025. Findings: Of 2710 participants in Tier 2 of PHOSP-COVID, 531 were recruited across 13 UK-wide C-MORE sites. After exclusions, 259 C-MORE patients (mean age 57 years [SD 12]; 158 [61%] male and 101 [39%] female) who were discharged from hospital with PCR-confirmed or clinically diagnosed COVID-19 between March 1, 2020, and Nov 1, 2021, and 52 non-COVID-19 controls from the community (mean age 49 years [SD 14]; 30 [58%] male and 22 [42%] female) were included in the analysis. Patients were assessed at a median of 5·0 months (IQR 4·2–6·3) after hospital discharge. Compared with non-COVID-19 controls, patients were older, living with more obesity, and had more comorbidities. 
Multiorgan abnormalities on MRI were more frequent in patients than in controls (157 [61%] of 259 vs 14 [27%] of 52; p<0·0001) and independently associated with COVID-19 status (odds ratio [OR] 2·9 [95% CI 1·5–5·8]; adjusted p=0·0023) after adjusting for relevant confounders. Compared with controls, patients were more likely to have MRI evidence of lung abnormalities (p=0·0001; parenchymal abnormalities), brain abnormalities (p<0·0001; more white matter hyperintensities and regional brain volume reduction), and kidney abnormalities (p=0·014; lower medullary T1 and loss of corticomedullary differentiation), whereas cardiac and liver MRI abnormalities were similar between patients and controls. Patients with multiorgan abnormalities were older (difference in mean age 7 years [95% CI 4–10]; mean age of 59·8 years [SD 11·7] with multiorgan abnormalities vs mean age of 52·8 years [11·9] without multiorgan abnormalities; p<0·0001), more likely to have three or more comorbidities (OR 2·47 [1·32–4·82]; adjusted p=0·0059), and more likely to have a more severe acute infection (acute CRP >5mg/L, OR 3·55 [1·23–11·88]; adjusted p=0·025) than those without multiorgan abnormalities. Presence of lung MRI abnormalities was associated with a two-fold higher risk of chest tightness, and multiorgan MRI abnormalities were associated with severe and very severe persistent physical and mental health impairment (PHOSP-COVID symptom clusters) after hospitalisation. Interpretation: After hospitalisation for COVID-19, people are at risk of multiorgan abnormalities in the medium term. Our findings emphasise the need for proactive multidisciplinary care pathways, with the potential for imaging to guide surveillance frequency and therapeutic stratification.

    Extending decision tree methods for the analysis of remotely sensed images

    One UN Sustainable Development Goal focuses on monitoring the presence, growth, and loss of forests. The cost of tracking progress towards this goal is often prohibitive. Satellite images provide an opportunity to use free data for environmental monitoring. However, these images have missing data due to cloud cover, particularly in the tropics. In this thesis I introduce fast and accurate new statistical methods to fill these data gaps. I create spatial and stochastic extensions of decision tree machine learning methods for interpolating missing data. I illustrate these methods with case studies monitoring forest cover in Australia and South America.

    Statistical Machine Learning Methods and Remote Sensing for Sustainable Development Goals: A Review

    Interest in statistical analysis of remote sensing data to produce measurements of environment, agriculture, and sustainable development is well established and continues to grow, leading to increasing interaction between the earth science and statistical domains. With this in mind, we reviewed the literature on statistical machine learning methods commonly applied to remote sensing data. We focus particularly on applications related to the United Nations and World Bank Sustainable Development Goals, including agriculture (food security), forests (life on land), and water (water quality). We provide a review of useful statistical machine learning methods, how they work in a remote sensing context, and examples of their application to these types of data in the literature. Rather than prescribing particular methods for specific applications, we provide guidance, examples, and case studies from the literature for the remote sensing practitioner and applied statistician. In the supplementary material, we also describe the steps needed before and after analysis of remote sensing data: the pre-processing and evaluation steps.

    Spatial and machine learning methods of satellite imagery analysis for Sustainable Development Goals

    The United Nations (UN) and World Bank have set Sustainable Development Goals (SDGs), with the aim that countries reach targets related to important aspects of quality of life by 2030. An essential element of sustainable development is achieving social and economic aims to improve human quality of life, while conserving and managing natural resources. Earth observation data, such as satellite imagery data, are increasingly being used for monitoring the SDGs, and statistical machine learning methods are commonly used to analyse these types of data. However, current methods often exclude the spatial information inherent in earth observation data, which can provide useful insights. In this paper we review how spatial information is currently measured for remote sensing data, describe spatial machine learning methods in the literature, and identify opportunities for further development of spatial methods. We also describe a minimum set of requirements to measure SDGs from satellite imagery data.

    Spatial Random Forest (S-RF): A random forest approach for spatially interpolating missing land-cover data with multiple classes

    Land-cover maps are important tools for monitoring large-scale environmental change and can be regularly updated using free satellite imagery data. A key challenge with constructing these maps is missing data in the satellite images on which they are based. To address this challenge, we created a Spatial Random Forest (S-RF) model that can accurately interpolate missing data in satellite images based on a modest training set of observed data in the image of interest. We demonstrate that this approach can be effective with only a minimal number of spatial covariates, namely latitude and longitude. The motivation for only using latitude and longitude in our model is that these covariates are available for all images whether the data are observed or missing due to cloud cover. The S-RF model can flexibly partition these covariates to provide accurate estimates, with easy incorporation of additional covariates to improve estimation if available. The effectiveness of our approach has been previously demonstrated for prediction of two land-cover classes in an Australian case study. In this paper, we extend the method to more than two classes. We demonstrate the performance of the S-RF method at interpolating multiple land-cover classes, using a case study drawn from South America. The results show that the method is best at predicting three land-cover classes, compared with 5 or 10 classes, and that other information is needed to improve performance as the number of classes grows, particularly if the classes are unbalanced. We explore two issues through a sensitivity analysis: the influence of the amount of missing data in the image and the influence of the amount of training data for model development and performance. The results show that the amount of missing data due to cloud cover is influential on model performance for multiple classes. We also found that increasing the amount of training data beyond 100,000 observations had minimal impact on model accuracy. 
Hence, a relatively small amount of observed data is required for training the model, which is beneficial for computation time.

    A decision tree approach for spatially interpolating missing land cover data and classifying satellite images

    Sustainable Development Goals (SDGs) are a set of priorities the United Nations and World Bank have set for countries to reach in order to improve quality of life and environment globally by 2030. Free satellite images have been identified as a key resource that can be used to produce official statistics and analysis to measure progress towards SDGs, especially those that are concerned with the physical environment, such as forest, water, and crops. Satellite images can often be unusable due to missing data from cloud cover, particularly in tropical areas where the deforestation rates are high. There are existing methods for filling in image gaps; however, these are often computationally expensive in image classification or not effective at pixel scale. To address this, we use two machine learning methods—gradient boosted machine and random forest algorithms—to classify the observed and simulated ‘missing’ pixels in satellite images as either grassland or woodland. We also predict a continuous biophysical variable, Foliage Projective Cover (FPC), which was derived from satellite images, and perform accurate binary classification and prediction using only the latitude and longitude of the pixels. We compare the performance of these methods against each other and inverse distance weighted interpolation, which is a well-established spatial interpolation method. We find both of the machine learning methods, particularly random forest, perform fast and accurate classifications of both observed and missing pixels, with up to 0.90 accuracy for the binary classification of pixels as grassland or woodland. The results show that the random forest method is more accurate than inverse distance weighted interpolation and gradient boosted machine for prediction of FPC for observed and missing data. 
Based on the case study results from a sub-tropical site in Australia, we show that our approach provides an efficient alternative for interpolating images and performing land cover classifications.
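
For readers unfamiliar with the baseline named in this abstract, inverse distance weighted (IDW) interpolation estimates a missing pixel as a weighted average of observed pixels, with weights that shrink as distance grows. The sketch below is illustrative only and is not the authors' implementation: the function name `idw_interpolate`, the `power` parameter, and the use of plain Euclidean distance on latitude and longitude are all assumptions made here.

```python
import math


def idw_interpolate(known, target, power=2.0):
    """Estimate the value at `target` from observed points.

    known  : list of ((lat, lon), value) pairs for observed pixels
    target : (lat, lon) of the pixel to fill
    power  : distance exponent; larger values favour nearer pixels
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (lat, lon), value in known:
        # Simplifying assumption: Euclidean distance in degrees.
        distance = math.hypot(lat - target[0], lon - target[1])
        if distance == 0.0:
            # The target coincides with an observed pixel.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one observed point is required")
    return weighted_sum / weight_total


# Two observed pixels with values 0 and 1 (hypothetical data).
observed = [((0.0, 0.0), 0.0), ((0.0, 2.0), 1.0)]
midpoint_estimate = idw_interpolate(observed, (0.0, 1.0))
near_first_estimate = idw_interpolate(observed, (0.0, 0.5))
```

At the midpoint both pixels carry equal weight, while a target closer to the first pixel is pulled towards that pixel's value.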

    Stochastic spatial random forest (SS-RF) for interpolating probabilities of missing land cover data

    Forests are a global environmental priority that need to be monitored frequently and at large scales. Satellite images are a proven, free data source for regular global forest monitoring, but these images often have missing data in tropical regions due to climate-driven persistent cloud cover. Remote sensing and statistical approaches to filling these data gaps exist and can be highly accurate, but the results of any interpolation method are uncertain, and these methods do not provide measures of this uncertainty. We present a new two-step spatial stochastic random forest (SS-RF) method that uses random forest algorithms to construct Beta distributions for interpolating missing data. This method has comparable performance with the traditional remote sensing compositing method, and additionally provides a probability for each interpolated data point. Our results show that the SS-RF method can accurately interpolate missing data and quantify uncertainty, demonstrating its applicability to the challenge of monitoring forest using free and incomplete satellite imagery data. We propose that there is scope for our SS-RF method to be applied to other big data problems where a measurement of uncertainty is needed in addition to estimates.
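
The abstract does not say how the Beta distributions are constructed. As a generic illustration only, and not the authors' procedure, one common way to summarise a set of predicted probabilities for a pixel with a Beta distribution is moment matching: choose the shape parameters so that the Beta mean and variance equal the sample mean and variance. The function name `beta_from_moments` and the example probabilities are hypothetical.

```python
from statistics import mean, pvariance


def beta_from_moments(probabilities):
    """Return (alpha, beta) of the Beta distribution whose mean and
    variance match those of `probabilities` (values strictly in (0, 1))."""
    m = mean(probabilities)
    v = pvariance(probabilities)
    if not 0.0 < m < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("variance must lie in (0, m * (1 - m))")
    # Method of moments: alpha + beta = m(1 - m)/v - 1.
    concentration = m * (1.0 - m) / v - 1.0
    return m * concentration, (1.0 - m) * concentration


# Hypothetical per-model probabilities that one pixel is forest.
pixel_probabilities = [0.2, 0.4, 0.6, 0.8]
alpha, beta = beta_from_moments(pixel_probabilities)
# The fitted Beta mean, alpha / (alpha + beta), equals the sample mean.
fitted_mean = alpha / (alpha + beta)
```

For these four values the sample mean is 0.5 and the variance 0.05, so the matched distribution is approximately Beta(2, 2).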

    Interpolating missing land cover data using stochastic spatial random forests for improved change detection

    Forest cover requires large-scale and frequent monitoring as an indicator of biodiversity and progress towards United Nations and World Bank Sustainable Development Goal 15. Measuring change in forest cover over time is essential for tracking and preserving quality habitats for species around the world. Due to the prohibitive expense and impracticality of mass field data collection to monitor forest cover at regular intervals, satellite images are a key data source for monitoring forest cover globally. A challenge of working with satellite images is missing data due to clouds. Existing methods for interpolating the missing data based on past images, such as compositing, are effective for stable land cover but can be inaccurate for dynamic and substantially changing landscapes. Here we present an adaptation of our recent stochastic spatial random forest (SS-RF) method, which combines observed data from a prior image and modelled estimates of the current image to produce interpolated land cover values and associated probabilities of those values. Results show our SS-RF method accurately detected simulated land cover change under both clear felling (0.83 average overall accuracy) and tree thinning (0.85 average overall accuracy). Our method detected forest cover change substantially more accurately than compositing, offering 39% and 12% increases in average overall accuracy for clear felling and tree thinning simulations respectively. However, when natural fluctuation occurs and there is minimal change in land cover, compositing performs as accurately as our method, or more so. Overall we find that our SS-RF method produces accurate estimates under a range of simulated forest clearing scenarios and is more accurate and robust than compositing when modelling noticeably changing landscapes.