
    Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes

    Structural variants (SVs) are an important source of human genome diversity, but their functional effects are poorly understood. We mapped 61,668 SVs in 613 individuals from the GTEx project and measured their effects on gene expression. We estimate that common SVs are causal at 2.66% of eQTLs, a 10.5-fold enrichment relative to their abundance in the genome. Duplications and deletions were the most impactful variant types, whereas the contribution of mobile element insertions was small (0.12% of eQTLs, 1.9-fold enriched). Multitissue analysis of eQTLs revealed that gene-altering SVs show more constitutive effects than other variant types, with 62.09% of coding SV-eQTLs active in all tissues with eQTL activity, compared with 23.08% of coding SNV- and indel-eQTLs. Noncoding SVs, SNVs and indels show broadly similar patterns. We also identified 539 rare SVs associated with nearby gene expression outliers. Of these, 62.34% are noncoding SVs that affect gene expression but have modest enrichment at regulatory elements, showing that rare noncoding SVs are a major source of gene expression differences but remain difficult to predict from current annotations. Both common and rare SVs often affect the expression of multiple genes: SV-eQTLs affect an average of 1.82 nearby genes, whereas SNV- and indel-eQTLs affect an average of 1.09 genes, and 21.34% of rare expression-altering SVs show effects on two to nine different genes. We also observe significant rare-SV effects on gene expression extending 1 Mb from the SV. This provides a mechanism by which individual SVs may have strong or pleiotropic effects on phenotypic variation.
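    To make the enrichment figures concrete: if "fold enrichment" is read as the share of eQTLs at which a variant class is causal divided by that class's share of all variants (an assumption; the abstract does not state the denominator), the reported numbers imply the approximate abundances below. These are back-of-envelope values derived only from the figures quoted above, not results reported by the study.

```latex
\text{fold enrichment} = \frac{P(\text{causal variant is in class} \mid \text{eQTL})}{P(\text{variant is in class})}
\;\Rightarrow\;
P(\text{SV}) \approx \frac{2.66\%}{10.5} \approx 0.25\%,
\qquad
P(\text{MEI}) \approx \frac{0.12\%}{1.9} \approx 0.063\%
```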

    Polygenic burden has broader impact on health, cognition, and socioeconomic outcomes than most rare and high-risk copy number variants

    Copy number variants (CNVs) are associated with syndromic and severe neurological and psychiatric disorders (SNPDs), such as intellectual disability, epilepsy, schizophrenia, and bipolar disorder. Although considered high-impact, CNVs are also observed in the general population, which presents a diagnostic challenge in evaluating their clinical significance. To estimate the phenotypic differences between CNV carriers and non-carriers regarding general health and well-being, we compared the impact of SNPD-associated CNVs on health, cognition, and socioeconomic phenotypes to the impact of three genome-wide polygenic risk scores (PRSs) in two Finnish cohorts (FINRISK, n = 23,053; NFBC1966, n = 4,895). The focus was on CNV carriers and PRS extremes who do not have an SNPD diagnosis. We identified high-risk CNVs (DECIPHER CNVs, risk gene deletions, or large [>1 Mb] CNVs) in 744 study participants (2.66%), 36 (4.8%) of whom had a diagnosed SNPD. In the remaining 708 unaffected carriers, we observed lower educational attainment (EA; OR = 0.77 [95% CI 0.66-0.89]) and lower household income (OR = 0.77 [0.66-0.89]). Income-associated CNVs also lowered household income (OR = 0.50 [0.38-0.66]), and CNVs with medical consequences lowered subjective health (OR = 0.48 [0.32-0.72]). The impact of PRSs was broader. At the lowest extreme of PRS for EA, we observed lower EA (OR = 0.31 [0.26-0.37]), lower income (OR = 0.66 [0.57-0.77]), lower subjective health (OR = 0.72 [0.61-0.83]), and increased mortality (Cox's HR = 1.55 [1.21-1.98]). PRS for intelligence had a similar impact, whereas PRS for schizophrenia did not affect these traits. We conclude that, for the majority of working-age carriers of high-risk CNVs without an SNPD diagnosis, the CNVs have a modest impact on morbidity and mortality and a limited impact on income and educational attainment compared with being at the extreme end of common genetic variation. Our findings highlight that the contribution of traditional high-risk variants such as CNVs should be analyzed in a broader genetic context rather than evaluated in isolation.

    Binary Interval Search (BITS): A Scalable Algorithm for Counting Interval Intersections

    Motivation: The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g., genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect: that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. Results: We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures such as Graphics Processing Units (GPUs) by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals.
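    The counting idea behind a binary-search approach to interval intersection can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it assumes closed integer intervals (start, end) with start <= end on a single chromosome. A database interval misses a query only if it ends before the query starts or starts after the query ends; those two cases cannot both hold, so sorting the database starts and ends separately lets two binary searches count the complement. The Monte Carlo helper `monte_carlo_p_value` and its `genome_length` parameter are hypothetical: it uses a simple null model that relocates each query uniformly at random while keeping its length, ignoring chromosome boundaries and assembly gaps.

```python
from bisect import bisect_left, bisect_right
import random

Interval = tuple[int, int]  # closed interval (start, end), start <= end


def build_index(db: list[Interval]) -> tuple[list[int], list[int]]:
    """Sort database start and end coordinates independently."""
    starts = sorted(s for s, _ in db)
    ends = sorted(e for _, e in db)
    return starts, ends


def count_intersections(q: Interval, starts: list[int], ends: list[int]) -> int:
    """Count database intervals overlapping the closed query interval q.

    (s, e) misses q = (qs, qe) only if e < qs or s > qe (mutually exclusive), so
    count = N - #(e < qs) - #(s > qe) = #(s <= qe) - #(e < qs).
    """
    qs, qe = q
    return bisect_right(starts, qe) - bisect_left(ends, qs)


def total_intersections(queries: list[Interval], starts: list[int], ends: list[int]) -> int:
    """Sum of per-query counts; each query is independent of the others."""
    return sum(count_intersections(q, starts, ends) for q in queries)


def monte_carlo_p_value(queries: list[Interval], db: list[Interval],
                        genome_length: int, n_sims: int = 1000,
                        seed: int = 0) -> tuple[int, float]:
    """Hypothetical significance test: relocate every query uniformly within
    [0, genome_length), keeping its length, and count how often the simulated
    total is at least the observed total."""
    rng = random.Random(seed)
    starts, ends = build_index(db)
    observed = total_intersections(queries, starts, ends)
    at_least = 0
    for _ in range(n_sims):
        shuffled = []
        for s, e in queries:
            length = e - s
            new_s = rng.randrange(0, genome_length - length)
            shuffled.append((new_s, new_s + length))
        if total_intersections(shuffled, starts, ends) >= observed:
            at_least += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least + 1) / (n_sims + 1)
```

    For example, with `db = [(1, 5), (3, 8), (10, 12), (20, 25)]`, `count_intersections((4, 10), *build_index(db))` returns 3. Sorting costs O(N log N) once, after which each query needs only two O(log N) searches and no shared state, which is why this style of counting maps naturally onto many parallel workers.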

    High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

    The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.

    Epipodial tentacle gene expression and predetermined resilience to summer mortality in the commercially important greenlip abalone, Haliotis laevigata

    "Summer mortality" is a phenomenon, occurring during spikes in warm water temperature, that results in the mass mortality of many ecologically and economically important mollusks such as abalone. This study aimed to determine whether the baseline gene expression of abalone before a laboratory-induced summer mortality event was associated with resilience to summer mortality. Tentacle transcriptomes of 35 greenlip abalone (Haliotis laevigata) were sequenced before the animals were exposed to an increase in water temperature, simulating conditions that have previously resulted in summer mortality. Abalone derived from three source locations with different environmental conditions were categorized as susceptible or resistant to summer mortality depending on whether they died or survived after the water temperature was increased. We detected two genes showing significantly higher expression in resilient abalone relative to susceptible abalone prior to the laboratory-induced summer mortality event. One of these genes was annotated through the NCBI non-redundant protein database using BLASTX to an anemone (Exaiptasia pallida) Transposon Ty3-G Gag Pol polyprotein. Distinct gene expression signatures were also found between resilient and susceptible abalone depending on the population origin, which may suggest divergence in local adaptation mechanisms for resilience. Many of these genes have been suggested to be involved in antioxidant and immune-related functions. The identification of these genes and their functional roles has enhanced our understanding of processes that may contribute to summer mortality in abalone. Our study supports the hypothesis that prestress gene expression signatures are indicative of the likelihood of summer mortality.