High throughput analysis of epistasis in genome-wide association studies with BiForce
Motivation: Gene–gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) due to the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS. Results: We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case–control) traits. BiForce achieves great computational efficiency by using memory efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce in analysing eight metabolic traits in a GWAS cohort (323 697 SNPs, >4500 individuals) and two disease traits in another (>340 000 SNPs, >1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed analyses of the eight metabolic traits within 1 day, identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits. Availability and implementation: The software is free and can be downloaded from http://bioinfo.utu.fi/BiForce/. Supplementary information: Supplementary data are available at Bioinformatics online.
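The abstract above mentions Boolean bitwise operations and Bonferroni-corrected thresholds for a full pairwise scan. As an illustration only, the following is a minimal sketch of how such counting could work, assuming genotypes coded 0/1/2 and one bitmask per genotype class per SNP. It is not BiForce's actual implementation; the function names `genotype_masks`, `pair_table` and `bonferroni_pair_threshold` are hypothetical, as is the default `alpha` of 0.05.

```python
def genotype_masks(genotypes):
    """Encode one SNP as three bitmasks, one per genotype class (0, 1, 2).

    Bit i of masks[g] is set when individual i carries genotype g.
    (Assumed encoding for illustration, not taken from the paper.)
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def pair_table(masks_a, masks_b):
    """Build the 3x3 joint genotype count table for two SNPs.

    Each cell is the popcount of the bitwise AND of two masks, so no
    per-individual loop is needed once the masks exist.
    """
    return [[bin(ma & mb).count("1") for mb in masks_b] for ma in masks_a]


def bonferroni_pair_threshold(n_snps, alpha=0.05):
    """Per-test significance threshold for a full pairwise scan."""
    n_pairs = n_snps * (n_snps - 1) // 2  # number of unordered SNP pairs
    return alpha / n_pairs


# Toy data: six individuals, two SNPs.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 0, 2, 1, 1, 2]
table = pair_table(genotype_masks(snp_a), genotype_masks(snp_b))
```

In this sketch the contingency table would then feed whatever interaction test is applied to each pair; the statistical test itself is outside the scope of the example.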
Widespread signatures of recent selection linked to nucleosome positioning in the human lineage
In this study we investigated the strengths and modes of selection associated with nucleosome positioning in the human lineage through the comparison of interspecies and intraspecies rates of divergence. We identify significant evidence for both positive and negative selection linked to human nucleosome positioning for the first time, implicating a widespread and important role for DNA sequence in the location of well-positioned nucleosomes. Selection appears to be acting on particular base substitutions to maintain optimum GC compositions in core and linker regions, with, e.g., unexpectedly elevated rates of C→T substitutions during recent human evolution at linker regions 60–90 bp from the nucleosome dyad but significant depletion of the same substitutions within nucleosome core regions. These patterns are strikingly consistent with the known relationships between genomic sequence composition and nucleosome assembly. By stratifying nucleosomes according to the GC content of their genomic neighborhood, we also show that the strength and direction of selection detected is dictated by local GC content. Intriguingly, these signatures of selection are not restricted to nucleosomes in close proximity to exons, suggesting the correct positioning of nucleosomes is not only important in and around coding regions. This analysis provides strong evidence that the genomic sequences associated with nucleosomes are not evolving neutrally, and suggests that underlying DNA sequence is an important factor in nucleosome positioning. Recent signatures of selection linked to genomic features as ubiquitous as the nucleosome have important implications for human genome evolution and disease.
The retroviral proteinase active site and the N-terminus of Ddi1 are required for repression of protein secretion
The Ddi1 protein of the yeast Saccharomyces cerevisiae is involved in numerous interactions with the ubiquitin system, which may be mediated by its N-terminal ubiquitin-like domain and its C-terminal ubiquitin-associated domain. Ddi1 also contains a central region with all the features of a retroviral aspartic proteinase, which was shown to be important in cell-cycle control. Here we demonstrate an additional role for this domain, along with the N-terminal region, in protein secretion. These results further substantiate the hypothesis that Ddi1 functions in vivo as a catalytically active aspartic proteinase.
At-admission prediction of mortality and pulmonary embolism in an international cohort of hospitalised patients with COVID-19 using statistical and machine learning methods
By September 2022, more than 600 million cases of SARS-CoV-2 infection had been reported globally, resulting in over 6.5 million deaths. However, COVID-19 mortality risk estimators are often developed with small, unrepresentative samples and with methodological limitations. It is highly important to develop predictive tools for pulmonary embolism (PE) in COVID-19 patients, as PE is one of the most severe preventable complications of COVID-19. Early recognition can help provide life-saving targeted anti-coagulation therapy right at admission. Using a dataset of more than 800,000 COVID-19 patients from an international cohort, we propose a cost-sensitive gradient-boosted machine learning model that predicts occurrence of PE and death at admission. Logistic regression, Cox proportional hazards models, and Shapley values were used to identify key predictors for PE and death. Our prediction model had test AUROCs of 75.9% and 74.2%, and sensitivities of 67.5% and 72.7%, for PE and all-cause mortality respectively on a highly diverse and held-out test set. The PE prediction model was also evaluated separately on patients in the UK and Spain, with test results of 74.5% AUROC and 63.5% sensitivity, and 78.9% AUROC and 95.7% sensitivity, respectively. Age, sex, region of admission, comorbidities (chronic cardiac and pulmonary disease, dementia, diabetes, hypertension, cancer, obesity, smoking), and symptoms (any, confusion, chest pain, fatigue, headache, fever, muscle or joint pain, shortness of breath) were the most important clinical predictors at admission. Age, overall presence of symptoms, shortness of breath, and hypertension were found to be key predictors for PE using our extreme gradient boosted model. This analysis, based on the largest global dataset to date for this set of problems, can inform hospital prioritisation policy and guide long-term clinical research and decision-making for COVID-19 patients globally.
Our machine learning model developed from an international cohort can serve to better regulate hospital risk prioritisation of at-risk patients. © The Author(s) 2024
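The abstract above reports AUROC and sensitivity on a held-out test set. For readers unfamiliar with these metrics, here is a minimal sketch of how they can be computed from binary labels and predicted scores. The data are made up for illustration, the decision threshold of 0.5 is an assumption, and the helper names `auroc` and `sensitivity` are hypothetical; this is not the authors' evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via its pairwise-ranking interpretation.

    Returns the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """True-positive rate: share of actual positives scored at or above threshold."""
    pos_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not pos_scores:
        raise ValueError("need at least one positive label")
    true_pos = sum(1 for s in pos_scores if s >= threshold)
    return true_pos / len(pos_scores)


# Illustrative data only: 1 = event occurred, 0 = no event.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.3]
example_auroc = auroc(labels, scores)
example_sensitivity = sensitivity(labels, scores, threshold=0.5)
```

The pairwise form is quadratic in sample size, so for a cohort of the scale described a rank-based implementation from an established library would be the practical choice.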
Whole-genome sequencing reveals host factors underlying critical COVID-19
Other funding: Department of Health and Social Care (DHSC); Illumina; LifeArc; Medical Research Council (MRC); UKRI; Sepsis Research (the Fiona Elizabeth Agnew Trust); the Intensive Care Society, Wellcome Trust Senior Research Fellowship (223164/Z/21/Z); BBSRC Institute Program Support Grant to the Roslin Institute (BBS/E/D/20002172, BBS/E/D/10002070, BBS/E/D/30002275); UKRI grants (MC_PC_20004, MC_PC_19025, MC_PC_1905, MRNO2995X/1); UK Research and Innovation (MC_PC_20029); the Wellcome PhD training fellowship for clinicians (204979/Z/16/Z); the Edinburgh Clinical Academic Track (ECAT) programme; the National Institute for Health Research, the Wellcome Trust; the MRC; Cancer Research UK; the DHSC; NHS England; the Smilow family; the National Center for Advancing Translational Sciences of the National Institutes of Health (CTSA award number UL1TR001878); the Perelman School of Medicine at the University of Pennsylvania; National Institute on Aging (NIA U01AG009740); the National Institute on Aging (RC2 AG036495, RC4 AG039029); the Common Fund of the Office of the Director of the National Institutes of Health; NCI; NHGRI; NHLBI; NIDA; NIMH; NINDS.
Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care or hospitalization after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from individuals who are critically ill with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19.
We identify 16 new independent associations, including variants within genes that are involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence that implicates multiple genes in critical disease, including reduced expression of a membrane flippase (ATP11A) and increased expression of a mucin (MUC1). Mendelian randomization provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication; or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between cases of critical illness and population controls is highly efficient for the detection of therapeutically relevant mechanisms of disease.