138 research outputs found
BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues
Abstract
Background
Bisulfite sequencing is widely employed to study the role of DNA methylation in disease; however, the data suffer from biases due to coverage depth variability. Imputation of methylation values at low-coverage sites may mitigate these biases while also identifying important genomic features associated with predictive power.
Results
Here we describe BoostMe, a method for imputing low-quality DNA methylation estimates within whole-genome bisulfite sequencing (WGBS) data. BoostMe uses a gradient boosting algorithm, XGBoost, and leverages information from multiple samples for prediction. We find that BoostMe outperforms existing algorithms in speed and accuracy when applied to WGBS of human tissues. Furthermore, we show that imputation improves concordance between WGBS and the MethylationEPIC array at low WGBS depth, suggesting improved WGBS accuracy after imputation.
Conclusions
Our findings support the use of BoostMe as a preprocessing step for WGBS analysis.
https://deepblue.lib.umich.edu/bitstream/2027.42/143848/1/12864_2018_Article_4766.pd
Modeling islet enhancers using deep learning identifies candidate causal variants at loci associated with T2D and glycemic traits.
Genetic association studies have identified hundreds of independent signals associated with type 2 diabetes (T2D) and related traits. Despite these successes, the identification of specific causal variants underlying a genetic association signal remains challenging. In this study, we describe a deep learning (DL) method to analyze the impact of sequence variants on enhancers. Focusing on pancreatic islets, a T2D-relevant tissue, we show that our model learns islet-specific transcription factor (TF) regulatory patterns and can be used to prioritize candidate causal variants. At 101 genetic signals associated with T2D and related glycemic traits where multiple variants occur in linkage disequilibrium, our method nominates a single causal variant for each association signal, including three variants previously shown to alter reporter activity in islet-relevant cell types. For another signal associated with blood glucose levels, we biochemically test all candidate causal variants from statistical fine-mapping using a pancreatic islet beta cell line and show biochemical evidence of allelic effects on TF binding for the model-prioritized variant. To aid in future research, we publicly distribute our model and islet enhancer perturbation scores across ~67 million genetic variants. We anticipate that DL methods like the one presented in this study will enhance the prioritization of candidate causal variants for functional studies.
Integrative analysis of gene expression, DNA methylation, physiological traits, and genetic variation in human skeletal muscle
We integrate comeasured gene expression and DNA methylation (DNAme) in 265 human skeletal muscle biopsies from the FUSION study with >7 million genetic variants and eight physiological traits: height, waist, weight, waist-hip ratio, body mass index, fasting serum insulin, fasting plasma glucose, and type 2 diabetes. We find hundreds of genes and DNAme sites associated with fasting insulin, waist, and body mass index, as well as thousands of DNAme sites associated with gene expression (eQTM). We find that controlling for heterogeneity in tissue/muscle fiber type reduces the number of physiological trait associations, and that long-range eQTMs (>1 Mb) are reduced when controlling for tissue/muscle fiber type or latent factors. We map genetic regulators (quantitative trait loci; QTLs) of expression (eQTLs) and DNAme (mQTLs). Using Mendelian randomization (MR) and mediation techniques, we leverage these genetic maps to predict 213 causal relationships between expression and DNAme, approximately two-thirds of which predict methylation to causally influence expression. We use MR to integrate FUSION mQTLs, FUSION eQTLs, and GTEx eQTLs for 48 tissues with genetic associations for 534 diseases and quantitative traits. We identify hundreds of genes and thousands of DNAme sites that may drive the reported disease/quantitative trait genetic associations. We identify 300 gene expression MR associations that are present in both FUSION and GTEx skeletal muscle and that show stronger evidence of MR association in skeletal muscle than other tissues, which may partially reflect differences in power across tissues. As one example, we find that increased RXRA muscle expression may decrease lean tissue mass.
Peer reviewed
Genetic regulatory signatures underlying islet gene expression and type 2 diabetes
The majority of genetic variants associated with type 2 diabetes (T2D) are located outside of genes in noncoding regions that may regulate gene expression in disease-relevant tissues, like pancreatic islets. Here, we present the largest integrated analysis to date of high-resolution, high-throughput human islet molecular profiling data to characterize the genome (DNA), epigenome (DNA packaging), and transcriptome (gene expression). We find that T2D genetic variants are enriched in regions of the genome where the transcription factor Regulatory Factor X (RFX) is predicted to bind in an islet-specific manner. Genetic variants that increase T2D risk are predicted to disrupt RFX binding, providing a molecular mechanism to explain how the genome can influence the epigenome, modulating gene expression and ultimately T2D risk.
Interactions between genetic variation and cellular environment in skeletal muscle gene expression
From whole organisms to individual cells, responses to environmental conditions are influenced by genetic makeup, where the effect of genetic variation on a trait depends on the environmental context. RNA-sequencing quantifies gene expression as a molecular trait, and is capable of capturing both genetic and environmental effects. In this study, we explore opportunities of using allele-specific expression (ASE) to discover cis-acting genotype-environment interactions (GxE), that is, genetic effects on gene expression that depend on an environmental condition. Treating 17 common, clinical traits as approximations of the cellular environment of 267 skeletal muscle biopsies, we identify 10 candidate environmental response expression quantitative trait loci (reQTLs) across 6 traits (12 unique gene-environment trait pairs; 10% FDR per trait) including sex, systolic blood pressure, and low-density lipoprotein cholesterol. Although using ASE is in principle a promising approach to detect GxE effects, replication of such signals can be challenging as validation requires harmonization of environmental traits across cohorts and a sufficient sampling of heterozygotes for a transcribed SNP. Comprehensive discovery and replication will require large human transcriptome datasets, or the integration of multiple transcribed SNPs, coupled with standardized clinical phenotyping.
Peer reviewed
The Long-Baseline Neutrino Experiment: Exploring Fundamental Symmetries of the Universe
The preponderance of matter over antimatter in the early Universe, the
dynamics of the supernova bursts that produced the heavy elements necessary for
life and whether protons eventually decay --- these mysteries at the forefront
of particle physics and astrophysics are key to understanding the early
evolution of our Universe, its current state and its eventual fate. The
Long-Baseline Neutrino Experiment (LBNE) represents an extensively developed
plan for a world-class experiment dedicated to addressing these questions. LBNE
is conceived around three central components: (1) a new, high-intensity
neutrino source generated from a megawatt-class proton accelerator at Fermi
National Accelerator Laboratory, (2) a near neutrino detector just downstream
of the source, and (3) a massive liquid argon time-projection chamber deployed
as a far detector deep underground at the Sanford Underground Research
Facility. This facility, located at the site of the former Homestake Mine in
Lead, South Dakota, is approximately 1,300 km from the neutrino source at
Fermilab -- a distance (baseline) that delivers optimal sensitivity to neutrino
charge-parity symmetry violation and mass ordering effects. This ambitious yet
cost-effective design incorporates scalability and flexibility and can
accommodate a variety of upgrades and contributions. With its exceptional
combination of experimental configuration, technical capabilities, and
potential for transformative discoveries, LBNE promises to be a vital facility
for the field of particle physics worldwide, providing physicists from around
the globe with opportunities to collaborate in a twenty- to thirty-year program
of exciting science. In this document we provide a comprehensive overview of
LBNE's scientific objectives, its place in the landscape of neutrino physics
worldwide, the technologies it will incorporate and the capabilities it will
possess.
Comment: Major update of previous version. This is the reference document for
the LBNE science program and current status. Chapters 1, 3, and 9 provide a
comprehensive overview of LBNE's scientific objectives, its place in the
landscape of neutrino physics worldwide, the technologies it will incorporate
and the capabilities it will possess. 288 pages, 116 figures
Despotism and Risk of Infanticide Influence Grizzly Bear Den-Site Selection
Given documented social dominance and intraspecific predation in bear populations, the ideal despotic distribution model and sex hypothesis of sexual segregation predict adult female grizzly bears (Ursus arctos) will avoid areas occupied by adult males to reduce risk of infanticide. Under ideal despotic distribution, juveniles should similarly avoid adult males to reduce predation risk. Den-site selection and use is an important component of grizzly bear ecology and may be influenced by multiple factors, including risk from conspecifics. To test the role of predation risk and the sex hypothesis of sexual segregation, we compared adult female (n = 142), adult male (n = 36), and juvenile (n = 35) den locations in Denali National Park and Preserve, Alaska, USA. We measured elevation, aspect, slope, and dominant land cover for each den site, and used maximum entropy modeling to determine which variables best predicted den sites. We identified the global model as the best-fitting model for adult female (area under curve (AUC) = 0.926) and elevation as the best predictive variable for adult male (AUC = 0.880) den sites. The model containing land cover and elevation best predicted juvenile (AUC = 0.841) den sites. Adult females spatially segregated from adult males, with dens characterized by higher elevations (mean = 1,412 m, SE = 52) and steeper slopes (mean = 21.9°, SE = 1.1) than adult male (elevation: mean = 1,209 m, SE = 76; slope: mean = 15.6°, SE = 1.9) den sites. Juveniles used a broad range of landscape attributes but did not avoid adult male denning areas. Observed spatial segregation by adult females supports the sex hypothesis of sexual segregation and, we suggest, is a mechanism to reduce risk of infanticide. Den-site selection of adult males is likely related to distribution of food resources during spring.
The Amsterdam Declaration on Fungal Nomenclature
The Amsterdam Declaration on Fungal Nomenclature was agreed at an international symposium convened in Amsterdam on 19–20 April 2011 under the auspices of the International Commission on the Taxonomy of Fungi (ICTF). The purpose of the symposium was to address the issue of whether or how the current system of naming pleomorphic fungi should be maintained or changed now that molecular data are routinely available. The issue is urgent as mycologists currently follow different practices, and no consensus was achieved by a Special Committee appointed in 2005 by the International Botanical Congress to advise on the problem. The Declaration recognizes the need for an orderly transition to a single-name nomenclatural system for all fungi, and to provide mechanisms to protect names that otherwise then become endangered. That is, priority should be given to the first described name, except where that is a younger name in general use when the first author to select a name of a pleomorphic monophyletic genus is to be followed, and suggests controversial cases are referred to a body, such as the ICTF, which will report to the Committee for Fungi. If appropriate, the ICTF could be mandated to promote the implementation of the Declaration. In addition, but not forming part of the Declaration, are reports of discussions held during the symposium on the governance of the nomenclature of fungi, and the naming of fungi known only from an environmental nucleic acid sequence in particular. Possible amendments to the Draft BioCode (2011) to allow for the needs of mycologists are suggested for further consideration, and a possible example of how a fungus only known from the environment might be described is presented.
Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure
Heart failure (HF) is a leading cause of morbidity and mortality worldwide. A small proportion of HF cases are attributable to monogenic cardiomyopathies and existing genome-wide association studies (GWAS) have yielded only limited insights, leaving the observed heritability of HF largely unexplained. We report results from a GWAS meta-analysis of HF comprising 47,309 cases and 930,014 controls. Twelve independent variants at 11 genomic loci are associated with HF, all of which demonstrate one or more associations with coronary artery disease (CAD), atrial fibrillation, or reduced left ventricular function, suggesting shared genetic aetiology. Functional analysis of non-CAD-associated loci implicates genes involved in cardiac development (MYOZ1, SYNPO2L), protein homoeostasis (BAG3), and cellular senescence (CDKN1A). Mendelian randomisation analysis supports causal roles for several HF risk factors, and demonstrates CAD-independent effects for atrial fibrillation, body mass index, and hypertension. These findings extend our knowledge of the pathways underlying HF and may inform new therapeutic strategies.