14 research outputs found

    Predictive geochemical mapping using machine learning in western Kenya

    Get PDF
    Digital soil mapping is a cost-effective method for obtaining detailed information regarding the spatial distribution of chemical elements in soils. Machine learning (ML) algorithms such as random forest (RF) models have been developed for such tasks as they are capable of modelling non-linear relationships using a range of datasets and determining the importance of predictor variables, offering multiple benefits to traditional techniques such as kriging. In this study, we describe a framework for spatial prediction based on RF modelling where inverse distance weighted (IDW) predictors are used in conjunction with auxiliary environmental covariates. The model was applied to predict the total concentration (mg kg-1 ) of 56 elements, soil pH and organic matter content, as well as to assess prediction uncertainty using 466 soil samples in western Kenya (Watts et al 2021). The results of iodine (I), selenium (Se), zinc (Zn) and soil pH are highlighted in this work due to their contrasting biogeochemical cycles and widespread dietary deficiencies in sub-Saharan Africa, whilst soil pH was assessed as an important parameter to define soil chemical reactions. Algorithm performance was evaluated to determine the importance of each predictor variable and the model’s response using partial dependence profiles. The accuracy and precision of each RF model were assessed by evaluating the out-of-bag predicted values. The IDW predictor variables had the greatest impact on assessing the distribution of soil properties in the study area, however, the inclusion of auxiliary values did improve model performance for all soil properties. The results presented in this paper highlight the benefits of ML algorithms which can incorporate multiple layers of data for spatial prediction, uncertainty assessment and attributing variable importance. Additional research is now required to ensure health practitioners and the agricommunity utilise the geochemical maps presented here, and the webtool, for assessing the relationship between environmental geochemistry and endemic diseases and preventable micronutrient deficiency

    Predictive geochemical mapping using machine learning in western Kenya

    Get PDF
    Digital soil mapping techniques represent a cost-effective method for obtaining detailed information regarding the spatial distribution of chemical elements in soils. Machine learning (ML) algorithms using random forest (RF) models have been developed for classification, pattern recognition and regression tasks, they are capable of modelling non-linear relationships using a range of datasets, identifying hierarchical relationships, and determining the importance of predictor variables. In this study, we describe a framework for spatial prediction based on RF modelling where inverse distance weighted (IDW) predictors are used in conjunction with ancillary environmental covariates. The model was applied to predict the total concentration (mg kg−1) and assess the prediction uncertainty of 56 elements, soil pH and organic matter content using 466 soil samples in western Kenya; the results of iodine (I), selenium (Se), zinc (Zn) and soil pH are highlighted in this work. These elements were selected due to contrasting biogeochemical cycles and widespread dietary deficiencies in sub-Saharan Africa, whilst soil pH is an important parameter controlling soil chemical reactions. Algorithm performance was evaluated determining the relative importance of each predictor variable and the model's response using partial dependence profiles. The accuracy and precision of each RF model were assessed by evaluating out-of-bag predicted values. The models R2 values range from 0.31 to 0.64 whilst CCC values range from 0.51 to 0.77. The IDW predictor variables had the greatest impact on assessing the distribution of soil properties in the study area, however, the inclusion of ancillary environmental data improved model performance for all soil properties. The results presented in this paper highlight the benefits of ML algorithms which can incorporate multiple layers of data for spatial prediction, uncertainty assessment and attributing variable importance. Additional research is now required to ensure health practitioners and the agri-community utilise the geochemical maps presented here for assessing the relationship between environmental geochemistry, endemic diseases and preventable micronutrient deficiency

    The evolution of non-small cell lung cancer metastases in TRACERx

    Get PDF
    Metastatic disease is responsible for the majority of cancer-related deaths1. We report the longitudinal evolutionary analysis of 126 non-small cell lung cancer (NSCLC) tumours from 421 prospectively recruited patients in TRACERx who developed metastatic disease, compared with a control cohort of 144 non-metastatic tumours. In 25% of cases, metastases diverged early, before the last clonal sweep in the primary tumour, and early divergence was enriched for patients who were smokers at the time of initial diagnosis. Simulations suggested that early metastatic divergence more frequently occurred at smaller tumour diameters (less than 8 mm). Single-region primary tumour sampling resulted in 83% of late divergence cases being misclassified as early, highlighting the importance of extensive primary tumour sampling. Polyclonal dissemination, which was associated with extrathoracic disease recurrence, was found in 32% of cases. Primary lymph node disease contributed to metastatic relapse in less than 20% of cases, representing a hallmark of metastatic potential rather than a route to subsequent recurrences/disease progression. Metastasis-seeding subclones exhibited subclonal expansions within primary tumours, probably reflecting positive selection. Our findings highlight the importance of selection in metastatic clone evolution within untreated primary tumours, the distinction between monoclonal versus polyclonal seeding in dictating site of recurrence, the limitations of current radiological screening approaches for early diverging tumours and the need to develop strategies to target metastasis-seeding subclones before relaps

    Robust estimation of bacterial cell count from optical density

    Get PDF
    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data

    The evolution of non-small cell lung cancer metastases in TRACERx

    Get PDF
    Metastatic disease is responsible for the majority of cancer-related deaths. We report the longitudinal evolutionary analysis of 126 non-small cell lung cancer (NSCLC) tumours from 421 prospectively recruited patients in TRACERx who developed metastatic disease, compared with a control cohort of 144 non-metastatic tumours. In 25% of cases, metastases diverged early, before the last clonal sweep in the primary tumour, and early divergence was enriched for patients who were smokers at the time of initial diagnosis. Simulations suggested that early metastatic divergence more frequently occurred at smaller tumour diameters (less than 8 mm). Single-region primary tumour sampling resulted in 83% of late divergence cases being misclassified as early, highlighting the importance of extensive primary tumour sampling. Polyclonal dissemination, which was associated with extrathoracic disease recurrence, was found in 32% of cases. Primary lymph node disease contributed to metastatic relapse in less than 20% of cases, representing a hallmark of metastatic potential rather than a route to subsequent recurrences/disease progression. Metastasis-seeding subclones exhibited subclonal expansions within primary tumours, probably reflecting positive selection. Our findings highlight the importance of selection in metastatic clone evolution within untreated primary tumours, the distinction between monoclonal versus polyclonal seeding in dictating site of recurrence, the limitations of current radiological screening approaches for early diverging tumours and the need to develop strategies to target metastasis-seeding subclones before relapse
    corecore