4 research outputs found

    Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP

    In landscape epidemiology, machine learning (ML) is a promising alternative for modeling epidemiological risk scenarios. This study aims to break with the "black box" paradigm that underlies the application of ML techniques by using SHAP to determine the contribution of each variable in ML models applied to geospatial health. The case study is the prevalence of hookworms, a group of intestinal parasites, in Ethiopia, where they are widely distributed; the country bears the third-highest burden of hookworm in Sub-Saharan Africa. XGBoost, a very popular ML model, was used to fit and analyze the data. The Python SHAP library was used to quantify the importance of each variable for the trained model's predictions. The contribution of these variables to individual predictions was described using several types of plots. The results show that the ML models are superior to the classical statistical models: they not only give similar results but also explain, through the SHAP package, the influence of and interactions between the variables in the generated models. This analysis provides information that helps to understand the epidemiological problem presented and offers a tool for similar studies.
    This study was funded by Fundación Mundo Sano and Instituto de Salud Carlos III. The funders had no role in the design of the study or in the collection, analysis and interpretation of the data. C.M.S. and M.N.C. had a PhD scholarship from Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET).
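    The quantity that SHAP attributes to each variable is a Shapley value. The sketch below is not the authors' pipeline and does not use the XGBoost or SHAP libraries: it computes exact Shapley values by brute force for a hypothetical three-variable risk function (`toy_risk`, with made-up coefficients and inputs), only to illustrate what a per-prediction contribution means. Variables outside a coalition are replaced by a baseline value, which is one common convention and an assumption here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction, by enumerating all coalitions.

    Features outside a coalition take their baseline value (an assumed
    convention for "removing" a feature).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score; coefficients are illustrative, not fitted."""
    temperature, soil_ph, vegetation = features
    return (0.02 * temperature
            - 0.05 * soil_ph
            + 0.3 * vegetation * temperature / 30.0)


# Hypothetical household (x) compared against a hypothetical reference site.
x = [28.0, 5.8, 0.7]
baseline = [22.0, 6.5, 0.4]
phi = shapley_values(toy_risk, x, baseline)
for name, value in zip(["temperature", "soil_ph", "vegetation"], phi):
    print(f"{name}: {value:+.4f}")
```

    The contributions sum to the difference between the prediction for `x` and the prediction for the baseline. The SHAP library's plots visualize this same property for each prediction.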

    Comparative whole-genome sequence analysis of Mycobacterium tuberculosis isolated from pulmonary tuberculosis and tuberculous lymphadenitis patients in Northwest Ethiopia

    Background: Tuberculosis (TB), caused by the Mycobacterium tuberculosis complex (MTBC), is a chronic infectious disease with both pulmonary and extrapulmonary forms. This study set out to investigate and compare the genomic diversity and transmission dynamics of Mycobacterium tuberculosis (Mtb) isolates obtained from tuberculous lymphadenitis (TBLN) and pulmonary TB (PTB) cases in Northwest Ethiopia. Methods: A facility-based cross-sectional study was conducted using two groups of samples collected between February 2021 and June 2022 (Group 1) and between June 2020 and June 2022 (Group 2) in Northwest Ethiopia. Deoxyribonucleic acid (DNA) was extracted from 200 heat-inactivated Mtb isolates. Whole-genome sequencing (WGS) was performed on the 161 isolates with ≥1 ng DNA/μl using Illumina NovaSeq 6000 technology. Results: Of the 161 isolates sequenced, 146 Mtb isolates were successfully genotyped into three lineages (L) and 18 sub-lineages. The Euro-American (EA, L4) lineage was the most prevalent (n = 100; 68.5%), followed by Central Asian (CAS, L3, n = 43; 25.3%) and then L7 (n = 3; 2.05%). The L4.2.2.ETH sub-lineage accounted for 19.9% and Haarlem for 13.7%. The phylogenetic tree revealed distinct Mtb clusters between PTB and TBLN isolates, even though there was no difference at the lineage and sub-lineage levels. The clustering rate (CR) and recent transmission index (RTI) for PTB were 30% and 15%, respectively. Similarly, the CR and RTI for TBLN were 31.1% and 18%, respectively. Conclusion and recommendations: PTB and TBLN isolates showed no difference in Mtb lineages or sub-lineages. However, at a threshold of five allelic distances, Mtb isolates obtained from PTB and TBLN formed distinct complexes in the phylogenetic tree, which indicates Mtb genomic variation between the two clinical forms. The high rate of clustering and RTI among TBLN implied that TBLN was likely the result of recent transmission and/or reactivation after short latency. Hence, the high incidence rate of TBLN in the Amhara region could be the result of Mtb genomic diversity and rapid clinical progression from primary infection and/or short latency. To validate this conclusion, a similar community-based study with a large sample size and a better sampling technique is highly desirable. Additionally, analysis of genomic variants other than phylogenetically informative regions could give insightful information. Combined analysis of the host and pathogen genomes (GxG) together with environmental factors (GxGxE) could give comprehensive co-evolutionary information.
    The sample collection was funded by the Institute of Biotechnology, Bahir Dar University through the EN mega project. The Mtb culture and identification-related lab supplies were supported by Amhara Public Health Institute, Bahir Dar, Ethiopia. The whole-genome sequencing (WGS) and publication fee were covered by the National Center of Microbiology, Institute of Carlos III, Madrid, Spain. The International Federation for Clinical Chemistry (IFCC) gave financial support to DM through the IFCC Professional Scientific Exchange Programme (PSEP) for 3 months of WGS laboratory work.
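    The abstract reports a clustering rate and a recent transmission index without stating how they were computed. The sketch below shows one common convention, assumed here rather than taken from the paper: isolates are linked when their pairwise distance is at most a threshold (5, echoing the "five allelic distances" above), clusters are the resulting single-linkage groups, CR is the share of isolates in any cluster, and RTI follows the "n minus one" method, which subtracts one presumed source case per cluster. The distance matrix `toy` is invented for illustration.

```python
def find_clusters(distances, threshold):
    """Single-linkage clusters: i and j are linked if distance <= threshold."""
    n = len(distances)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distances[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups with two or more isolates count as clusters
    return [members for members in groups.values() if len(members) >= 2]


def clustering_metrics(distances, threshold=5):
    """Return (clustering rate, recent transmission index), n-1 method."""
    n = len(distances)
    clusters = find_clusters(distances, threshold)
    clustered = sum(len(c) for c in clusters)
    cr = clustered / n
    rti = (clustered - len(clusters)) / n
    return cr, rti


# Invented pairwise distances for six isolates (symmetric, zero diagonal).
toy = [
    [0, 2, 7, 30, 31, 40],
    [2, 0, 4, 29, 30, 41],
    [7, 4, 0, 28, 29, 39],
    [30, 29, 28, 0, 1, 35],
    [31, 30, 29, 1, 0, 36],
    [40, 41, 39, 35, 36, 0],
]
cr, rti = clustering_metrics(toy, threshold=5)
print(f"CR = {cr:.1%}, RTI = {rti:.1%}")
```

    In the toy matrix, isolates 0 and 2 differ by 7 but still share a cluster because both are within 5 of isolate 1. This chaining is a property of single linkage and one reason the choice of clustering rule matters when comparing reported rates.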

    Mycobacterium tuberculosis Sub-Lineage 4.2.2/SIT149 as Dominant Drug-Resistant Clade in Northwest Ethiopia 2020-2022: In-silico Whole-Genome Sequence Analysis

    Introduction: Drug resistance (DR) in Mycobacterium tuberculosis complex (MTBC) is mainly associated with certain lineages and varies across regions and countries. The Beijing genotype is the leading resistant lineage in Asia and western countries. The M. tuberculosis (Mtb) (sub-)lineages responsible for most drug resistance in Ethiopia are not well described. Hence, this study aimed to identify the leading drug-resistant sub-lineages and to characterize single nucleotide polymorphisms (SNPs) associated with resistance to first-line anti-tuberculosis drugs. Methods: A facility-based cross-sectional study was conducted in 2020-2022 among new and presumptive multidrug-resistant TB (MDR-TB) cases in Northwest Ethiopia. Whole-genome sequencing (WGS) was performed on 161 isolates using Illumina NovaSeq 6000 technology. The SNP mutations associated with drug resistance were identified using the MTBseq and TB-Profiler bioinformatics tools. Results: Of the 146 Mtb isolates that were successfully genotyped, 20 (13.7%) harbored one or more resistance-associated SNPs. L4.2.2.ETH was the leading drug-resistant sub-lineage, accounting for 10/20 (50%) of the resistant Mtb isolates. MDR-TB isolates showed extensive mutations against first-line anti-TB drugs. Ser450Leu (tcg/tTg) for rifampicin (RIF), Ser315Thr (agc/aCc) for isoniazid (INH), Met306Ile (atg/atA(C)) for ethambutol (EMB), and Gly69Asp for streptomycin (STR) were the leading resistance-associated mutations, accounting for 56.5%, 89.5%, 47%, and 29.4%, respectively. The presence of both clustered and non-clustered DR isolates indicated that the epidemic is driven by both new DR development and acquired resistance. Conclusion: The high prevalence of drug-resistant TB due to a geographically restricted sub-lineage (L4.2.2.ETH) indicates ongoing local micro-epidemics. The Mtb drug resistance surveillance system must be improved. Further evolutionary analysis of the L4.2.2.ETH strain is highly desirable to understand the evolutionary forces that have made L4.2.2.ETH a highly drug-resistant and transmissible sub-lineage.
    Sample collection was funded by the Institute of Biotechnology, Bahir Dar University, through the Endalkachew Nibret Mega Project. The Mtb culture and identification-related laboratory supplies were supported by the Amhara Public Health Institute, Bahir Dar, Ethiopia. Whole-genome sequencing was performed with the great support of the National Center of Microbiology, Institute of Carlos III, Madrid, Spain. The International Federation for Clinical Chemistry (IFCC) provided financial support to Daniel Mekonnen through the IFCC Professional Exchange Program (PEP) for a three-month stay in Madrid, Spain, to conduct the WGS analysis.
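    The mutation notation in this abstract pairs an amino-acid change with the underlying codon change, for example Ser450Leu with tcg/tTg. As a minimal sketch, the snippet below rebuilds that notation from the codons using the standard genetic code. It is not part of the study's MTBseq or TB-Profiler workflow, and the helper name `describe_change` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Three-letter names for the residues that appear in the abstract's examples.
THREE_LETTER = {"S": "Ser", "L": "Leu", "T": "Thr", "M": "Met", "I": "Ile"}


def describe_change(ref_codon, alt_codon, position):
    """Format a codon substitution as, e.g., 'Ser450Leu' (hypothetical helper)."""
    ref = CODON_TABLE[ref_codon.upper()]
    alt = CODON_TABLE[alt_codon.upper()]
    return f"{THREE_LETTER.get(ref, ref)}{position}{THREE_LETTER.get(alt, alt)}"


# Codon pairs exactly as written in the abstract.
print(describe_change("tcg", "tTg", 450))
print(describe_change("agc", "aCc", 315))
print(describe_change("atg", "atA", 306))
print(describe_change("atg", "atC", 306))
```

    Both atA and atC encode isoleucine, which is why the abstract can write the ethambutol mutation as atg/atA(C) under a single Met306Ile label.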

    Environmental characteristics around the household and their association with hookworm infection in rural communities from Bahir Dar, Amhara Region, Ethiopia

    Soil-transmitted helminths (STH) are highly prevalent neglected tropical diseases in Ethiopia, where an estimated 26 million people are infected. Geographic Information Systems and Remote Sensing (RS) technologies assist data mapping and analysis, and the prediction of the spatial distribution of infection in relation to environmental variables. The influence of socioeconomic, environmental and soil characteristics on hookworm infection at the individual and household level is explored in order to identify spatial patterns of infection in rural villages from Zenzelema (Amhara region). Inhabitants older than 5 years were recruited in order to assess the presence of STH. Socioeconomic and hookworm infection variables were obtained at the household level, and environmental variables and soil characteristics were obtained using RS. The dominant STH found was hookworm. Individuals who practiced open defecation and those without electricity had a significantly higher number of hookworm eggs in their stool. Additionally, adults showed statistically higher hookworm egg counts than children. Nonetheless, the probability of hookworm infection was determined not by socioeconomic conditions but by environmental characteristics surrounding the households, including a combination of vigorous vegetation and bare soil, high temperatures, and compacted soils (high bulk density) with more acidic pH, given that a pH of 6.0 is optimal for hatching of hookworm eggs. The identification of high-risk environmental areas provides a useful tool for planning, targeting and monitoring of control measures, including not only children but also adults when hookworm is concerned.
    Affiliation: Anegagrie, Melaku. Fundación Mundo Sano; Spain. Instituto de Salud Carlos III; Spain
    Affiliation: Lanfri, Sofía. Fundación Mundo Sano; Argentina. Comisión Nacional de Actividades Espaciales. Instituto de Altos Estudios Espaciales "Mario Gulich"; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
    Affiliation: Aramendia, Aranzazu Amor. Fundación Mundo Sano; Spain. Instituto de Salud Carlos III; Spain
    Affiliation: Scavuzzo, Carlos Matias. Fundación Mundo Sano; Argentina. Comisión Nacional de Actividades Espaciales. Instituto de Altos Estudios Espaciales "Mario Gulich"; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
    Affiliation: Herrador, Zaida. Instituto de Salud Carlos III; Spain
    Affiliation: Benito, Agustín. Instituto de Salud Carlos III; Spain
    Affiliation: Periago, Maria Victoria. Fundación Mundo Sano; Argentin