Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data
Abstract: Crohn Disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome-wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches prompted us to apply them to classify healthy and diseased people according to their genomic information. The Immunochip dataset containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the international Inflammatory Bowel Disease genetic consortium (IIBDGC) has been re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC) statistic. Analysis of the impact of quality control (QC), imputation and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. Conversely, neither the patient/control ratio nor marker preselection or coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results with a maximum AUC of 0.80. GBT methods like XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait. ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods are also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.
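The abstract above compares classifiers by AUC. As a minimal sketch of what that statistic measures, and not the authors' pipeline, the snippet below computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The function name `auc_from_scores` and the toy labels and scores are hypothetical, introduced only for illustration.

```python
def auc_from_scores(labels, scores):
    """Return the AUC for binary labels (1 = case, 0 = control).

    Uses the pairwise definition: the fraction of (case, control) pairs
    in which the case has the higher score, counting ties as one half.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy example (made-up values, not data from the study).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc_from_scores(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of samples, so for a cohort of the size described in the abstract a rank-based implementation from a statistics library would be the practical choice.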
Human candidate gene polymorphisms and risk of severe malaria in children in Kilifi, Kenya: a case-control association study
Background: Human genetic factors are important determinants of malaria risk. We investigated associations between multiple candidate polymorphisms—many related to the structure or function of red blood cells—and risk for severe Plasmodium falciparum malaria and its specific phenotypes, including cerebral malaria, severe malaria anaemia, and respiratory distress. Methods: We did a case-control study in Kilifi County, Kenya. We recruited as cases children presenting with severe malaria to the high-dependency ward of Kilifi County Hospital. We included as controls infants born in the local community between Aug 1, 2006, and Sept 30, 2010, who were part of a genetics study. We tested for associations between a range of candidate malaria-protective genes and risk for severe malaria and its specific phenotypes. We used a permutation approach to account for multiple comparisons between polymorphisms and severe malaria. We judged p values less than 0·005 significant for the primary analysis of the association between candidate genes and severe malaria. Findings: Between June 11, 1995, and June 12, 2008, 2244 children with severe malaria were recruited to the study, and 3949 infants were included as controls. Overall, 263 (12%) of 2244 children with severe malaria died in hospital, including 196 (16%) of 1233 with cerebral malaria. We investigated 121 polymorphisms in 70 candidate severe malaria-associated genes. We found significant associations between risk for severe malaria overall and polymorphisms in 15 genes or locations, of which most were related to red blood cells: ABO, ATP2B4, ARL14, CD40LG, FREM3, INPP4B, G6PD, HBA (both HBA1 and HBA2), HBB, IL10, LPHN2 (also known as ADGRL2), LOC727982, RPS6KL1, CAND1, and GNAS. Combined, these genetic associations accounted for 5·2% of the variance in risk for developing severe malaria among individuals in the general population. 
We confirmed established associations between severe malaria and sickle-cell trait (odds ratio [OR] 0·15, 95% CI 0·11–0·20; p=2·61 × 10⁻⁵⁸), blood group O (0·74, 0·66–0·82; p=6·26 × 10⁻⁸), and –α3·7-thalassaemia (0·83, 0·76–0·90; p=2·06 × 10⁻⁶). We also found strong associations between overall risk of severe malaria and polymorphisms in both ATP2B4 (OR 0·76, 95% CI 0·63–0·92; p=0·001) and FREM3 (0·64, 0·53–0·79; p=3·18 × 10⁻¹⁴). The association with FREM3 could be accounted for by linkage disequilibrium with a complex structural mutation within the glycophorin gene region (comprising GYPA, GYPB, and GYPE) that encodes for the rare Dantu blood group antigen. Heterozygosity for Dantu was associated with risk for severe malaria (OR 0·57, 95% CI 0·49–0·68; p=3·22 × 10⁻¹¹), as was homozygosity (0·26, 0·11–0·62; p=0·002). Interpretation: Both ATP2B4 and the Dantu blood group antigen are associated with the structure and function of red blood cells. ATP2B4 codes for plasma membrane calcium-transporting ATPase 4 (the major calcium pump on red blood cells) and the glycophorins are ligands for parasites to invade red blood cells. Future work should aim at uncovering the mechanisms by which these polymorphisms can result in severe malaria protection and investigate the implications of these associations for wider health. Funding: Wellcome Trust, UK Medical Research Council, European Union, and Foundation for the National Institutes of Health as part of the Bill & Melinda Gates Grand Challenges in Global Health Initiative.
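The findings above are reported as odds ratios with 95% confidence intervals. As a rough illustration of how such figures can arise from a 2×2 case-control table, the sketch below computes an unadjusted odds ratio with a Woolf (log-scale) interval. This is an assumption for illustration only: the abstract does not state how its estimates were derived, they may come from adjusted regression models, and the counts and the function name `odds_ratio_ci` are hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    All four cell counts must be positive. z = 1.96 gives an
    approximate 95% interval.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not taken from the study): carriers of a
# protective variant among 100 cases and 100 controls.
or_est, ci_low, ci_high = odds_ratio_ci(10, 90, 30, 70)
```

An odds ratio below 1 with an interval that excludes 1, as in this toy table, is the pattern the abstract describes for protective variants such as sickle-cell trait.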
Genome-wide and fine-resolution association analysis of malaria in West Africa
We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10⁻⁷ to P = 4 × 10⁻¹⁴, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.