127 research outputs found

    Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP

    In the field of landscape epidemiology, machine learning (ML) presents itself as a good alternative for modeling epidemiological risk scenarios. This study aims to break with the "black box" paradigm that underlies the application of automatic learning techniques by using SHAP to determine the contribution of each variable in ML models applied to geospatial health. The case studied is the prevalence of hookworms, intestinal parasites, in Ethiopia, where they are widely distributed; the country bears the third-highest burden of hookworm in Sub-Saharan Africa. XGBoost, a very popular ML model, was used to fit and analyze the data. The Python SHAP library was used to understand the importance of the variables for predictions in the trained model. The contribution of these variables to a particular prediction was described using different types of plot methods. The results show that the ML models are superior to the classical statistical models, not only demonstrating similar results but also explaining, by using the SHAP package, the influence of and interactions between the variables in the generated models. This analysis provides information to help understand the epidemiological problem presented and provides a tool for similar studies. This study was funded by Fundación Mundo Sano and Instituto de Salud Carlos III. The funders had no role in the design of the study or in the collection, analysis and interpretation of the data. C.M.S. and M.N.C. had a PhD scholarship from Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET).
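    As an illustration of the idea behind SHAP, the sketch below computes exact Shapley values for a single prediction of a tiny model by averaging each feature's marginal contribution over all feature orderings. It is not the study's code: the function `toy_risk`, its coefficients, the three feature names and the all-zero baseline are hypothetical, and the SHAP library uses far more efficient estimators for tree models such as XGBoost.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction of a small model.

    A feature that has not yet joined the coalition keeps its baseline value.
    The cost grows factorially with the number of features, so this is only
    practical for a handful of variables.
    """
    n = len(instance)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]          # feature i joins the coalition
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    return [c / factorial(n) for c in contributions]


def toy_risk(x):
    """Hypothetical risk score with one interaction term (illustration only)."""
    temperature, rainfall, elevation = x
    return (0.5 * temperature + 0.2 * rainfall
            + 0.1 * temperature * rainfall - 0.3 * elevation)


instance = [1.0, 2.0, 1.0]   # hypothetical standardized covariates
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_risk, instance, baseline)
print(dict(zip(["temperature", "rainfall", "elevation"], phi)))
```

    The values sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features involved.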

    Global mapping of infectious disease

    The primary aim of this review was to evaluate the state of knowledge of the geographical distribution of all infectious diseases of clinical significance to humans. A systematic review was conducted to enumerate cartographic progress, with respect to the data available for mapping and the methods currently applied. The results helped define the minimum information requirements for mapping infectious disease occurrence, and a quantitative framework for assessing the mapping opportunities for all infectious diseases. This revealed that of 355 infectious diseases identified, 174 (49%) have a strong rationale for mapping and of these only 7 (4%) had been comprehensively mapped. A variety of ambitions, such as the quantification of the global burden of infectious disease, international biosurveillance, assessing the likelihood of infectious disease outbreaks and exploring the propensity for infectious disease evolution and emergence, are limited by these omissions. An overview of the factors hindering progress in disease cartography is provided. It is argued that rapid improvement in the landscape of infectious diseases mapping can be made by embracing non-conventional data sources, automation of geo-positioning and mapping procedures enabled by machine learning and information technology, respectively, in addition to harnessing labour of the volunteer ‘cognitive surplus’ through crowdsourcing.

    Spatial epidemiological approaches to monitor and measure the risk of human leptospirosis


    Applications of generative probabilistic models for information recovery in 1H NMR metabolomics

    Metabolomics is a well-established approach for investigation of the metabolic state of an organism, usually conducted via high-throughput methods and focusing on quantification and identification of small molecules. A popular analytical technique used in metabolomics is 1H NMR spectroscopy. The data obtained in NMR experiments contain a wealth of information on metabolites in a sample and their chemical structure. To help uncover this information and find patterns in the data, statistical and machine learning methods must be applied. The work presented in this thesis demonstrates applications of probabilistic generative modelling, with particular focus on Latent Dirichlet Allocation (LDA), as a tool for information recovery in 1H NMR data sets obtained in metabolomics research. LDA is an example of a topic model. The model is based on a generative process which can be thought of as a source of the data. Topics are latent variables which select co-occurring metabolites in a sample. In turn, NMR spectra can be represented in the latent variable space. We present applications of LDA in three scenarios. (1) How LDA can be used to simulate NMR spectra; such spectra demonstrate that LDA is a valid model for NMR data and also provide synthetic data for evaluation of statistical models. (2) Unsupervised learning with LDA to uncover patterns in the NMR data; we use synthetic and real NMR data with knowledge of key biomarkers from a prior study and conclude that LDA was successful in the recovery of useful topics. (3) Supervised learning with SLDA and combined latent variable models with ElasticNet regression, where we investigate NMR data from the Multi-Ethnic Study of Atherosclerosis (MESA), which is paired with clinical variables such as BMI. The goal was to examine if topics can be informative about clinical outcomes.
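    The generative process mentioned in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the thesis's implementation: it treats each spectral bin as a "word" and each unit of binned intensity as a word occurrence, and the two topics, the Dirichlet parameters and the function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_spectrum(topics, alpha, n_counts, rng):
    """Simulate one binned spectrum with the LDA generative process.

    topics: list of probability distributions over spectral bins.
    alpha:  Dirichlet parameters for the per-sample topic proportions.
    """
    theta = sample_dirichlet(alpha, rng)        # topic proportions for this sample
    n_bins = len(topics[0])
    spectrum = [0] * n_bins
    for _ in range(n_counts):
        k = rng.choices(range(len(topics)), weights=theta)[0]   # pick a topic
        b = rng.choices(range(n_bins), weights=topics[k])[0]    # pick a bin from it
        spectrum[b] += 1
    return theta, spectrum


# Two hypothetical "metabolite" topics over six spectral bins.
topics = [
    [0.6, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5, 0.0],
]
rng = random.Random(0)
theta, spectrum = generate_spectrum(topics, alpha=[1.0, 1.0], n_counts=1000, rng=rng)
print(theta, spectrum)
```

    Inference runs this process in reverse: given many spectra, it estimates the topics and the per-sample proportions, which is the latent representation the abstract refers to.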

    Differential human gut microbiome assemblages during soil-transmitted helminth infections in Indonesia and Liberia.

    BACKGROUND: The human intestine and its microbiota is the most common infection site for soil-transmitted helminths (STHs), which affect the well-being of ~1.5 billion people worldwide. The complex cross-kingdom interactions are not well understood. RESULTS: A cross-sectional analysis identified conserved microbial signatures positively or negatively associated with STH infections across Liberia and Indonesia, and longitudinal samples analysis from a double-blind randomized trial showed that the gut microbiota responds to deworming but does not transition closer to the uninfected state. The microbiomes of individuals able to self-clear the infection had more similar microbiome assemblages compared to individuals who remained infected. One bacterial taxon (Lachnospiraceae) was negatively associated with infection in both countries, and 12 bacterial taxa were significantly associated with STH infection in both countries, including Olsenella (associated with reduced gut inflammation), which also significantly reduced in abundance following clearance of infection. Microbial community gene abundances were also affected by deworming. Functional categories identified as associated with STH infection included arachidonic acid metabolism; arachidonic acid is the precursor for pro-inflammatory leukotrienes that threaten helminth survival, and our findings suggest that some modulation of arachidonic acid activity in the STH-infected gut may occur through the increase of arachidonic acid metabolizing bacteria. CONCLUSIONS: For the first time, we identify specific members of the gut microbiome that discriminate between moderately/heavily STH-infected and non-infected states across very diverse geographical regions using two different statistical methods. We also identify microbiome-encoded biological functions associated with the STH infections, which are associated potentially with STH survival strategies, and changes in the host environment. These results provide a novel insight into the cross-kingdom interactions in the human gut ecosystem by unlocking the microbiome assemblages at taxonomic, genetic, and functional levels so that advances towards key mechanistic studies can be made. Microbiome 2018 Feb 28; 6(1):33