160 research outputs found

    Genomic Variation and Its Impact on Gene Expression in Drosophila melanogaster

    Understanding the relationship between genetic and phenotypic variation is one of the great outstanding challenges in biology. To meet this challenge, comprehensive genomic variation maps of human as well as of model organism populations are required. Here, we present a nucleotide resolution catalog of single-nucleotide, multi-nucleotide, and structural variants in 39 Drosophila melanogaster Genetic Reference Panel inbred lines. Using an integrative, local assembly-based approach for variant discovery, we identify more than 3.6 million distinct variants, among which were more than 800,000 unique insertions, deletions (indels), and complex variants (1 to 6,000 bp). While the SNP density is higher near other variants, we find that variants themselves are not mutagenic, nor are regions with high variant density particularly mutation-prone. Rather, our data suggest that the elevated SNP density around variants is mainly due to population-level processes. We also provide insights into the regulatory architecture of gene expression variation in adult flies by mapping cis-expression quantitative trait loci (cis-eQTLs) for more than 2,000 genes. Indels comprise around 10% of all cis-eQTLs and show larger effects than SNP cis-eQTLs. In addition, we identified two-fold more gene associations in males as compared to females and found that most cis-eQTLs are sex-specific, revealing a partial decoupling of the genomic architecture between the sexes as well as the importance of genetic factors in mediating sex-biased gene expression. Finally, we performed RNA-seq-based allelic expression imbalance analyses in the offspring of crosses between sequenced lines, which revealed that the majority of strong cis-eQTLs can be validated in heterozygous individuals.

    Butler enables rapid cloud-based analysis of thousands of human genomes.

    We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.

    Integrative Genomics Identifies the Corepressor SMRT as a Gatekeeper of Adipogenesis through the Transcription Factors C/EBPβ and KAISO

    The molecular role of corepressors is poorly understood. Here, we studied the transcriptional function of the corepressor SMRT during terminal adipogenesis. Genome-wide DNA-binding profiling revealed that this corepressor is predominantly located in active chromatin regions and that most distal SMRT binding events are lost after differentiation induction. Promoter-proximal tethering of SMRT in preadipocytes is primarily mediated by KAISO through the conserved TCTCGCGAGA motif. Further characterization revealed that KAISO, similar to SMRT, accelerates the cell cycle and increases fat accumulation upon knockdown, identifying KAISO as an adipogenic repressor that likely modulates the mitotic clonal expansion phase of this process. SMRT-bound promoter-distal sites tend to overlap with C/EBPβ-bound regions, which become occupied by proadipogenic transcription factors after SMRT clearance. This reveals a role for SMRT in masking enhancers from proadipogenic factors in preadipocytes. Finally, we identified SMRT as an adipogenic gatekeeper as it directly fine-tunes transcription of pro- and antiadipogenic genes.

    Enterosignatures define common bacterial guilds in the human gut microbiome

    The human gut microbiome composition is generally in a stable dynamic equilibrium, but it can deteriorate into dysbiotic states detrimental to host health. To disentangle the inherent complexity and capture the ecological spectrum of microbiome variability, we used 5,230 gut metagenomes to characterize signatures of bacteria commonly co-occurring, termed enterosignatures (ESs). We find five generalizable ESs dominated by either Bacteroides, Firmicutes, Prevotella, Bifidobacterium, or Escherichia. This model confirms key ecological characteristics known from previous enterotype concepts, while enabling the detection of gradual shifts in community structures. Temporal analysis implies that the Bacteroides-associated ES is “core” in the resilience of westernized gut microbiomes, while combinations with other ESs often complement the functional spectrum. The model reliably detects atypical gut microbiomes correlated with adverse host health conditions and/or the presence of pathobionts. ESs provide an interpretable and generic model that enables an intuitive characterization of gut microbiome composition in health and disease.

    Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data

    Motivation: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. Results: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. Availability: The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    Assessing the gene regulatory landscape in 1,188 human tumors

    Cancer is characterised by somatic genetic variation, but the effect of the majority of non-coding somatic variants and the interface with the germline genome are still unknown. We analysed the whole genome and RNA-seq data from 1,188 human cancer patients as provided by the Pan-cancer Analysis of Whole Genomes (PCAWG) project to map cis expression quantitative trait loci of somatic and germline variation and to uncover the causes of allele-specific expression patterns in human cancers. The availability of the first large-scale dataset with both whole genome and gene expression data enabled us to uncover the effects of the non-coding variation on cancer. In addition to confirming known regulatory effects, we identified novel associations between somatic variation and expression dysregulation, in particular in distal regulatory elements. Finally, we uncovered links between somatic mutational signatures and gene expression changes, including TERT and LMO2, and we explained the inherited risk factors in APOBEC-related mutational processes. This work represents the first large-scale assessment of the effects of both germline and somatic genetic variation on gene expression in cancer and creates a valuable resource cataloguing these effects.

    A yeast one-hybrid and microfluidics-based pipeline to map mammalian gene regulatory networks

    The comprehensive mapping of gene promoters and enhancers has significantly improved our understanding of how the mammalian regulatory genome is organized. An important challenge is to elucidate how these regulatory elements contribute to gene expression by identifying their trans-regulatory inputs. Here, we present the generation of a mouse-specific transcription factor (TF) open-reading frame clone library and its implementation in yeast one-hybrid assays to enable large-scale protein–DNA interaction detection with mouse regulatory elements. Once specific interactions are identified, we then use a microfluidics-based method to validate and precisely map them within the respective DNA sequences. Using well-described regulatory elements as well as orphan enhancers, we show that this cross-platform pipeline characterizes known and uncovers many novel TF–DNA interactions. In addition, we provide evidence that several of these novel interactions are relevant in vivo and aid in elucidating the regulatory architecture of enhancers.

    A leukemia-protective germline variant mediates chromatin module formation via transcription factor nucleation

    Non-coding variants coordinate transcription factor (TF) binding and chromatin mark enrichment changes over regions spanning >100 kb. These molecularly coordinated regions are named "variable chromatin modules" (VCMs), providing a conceptual framework of how regulatory variation might shape complex traits. To better understand the molecular mechanisms underlying VCM formation, here, we mechanistically dissect a VCM-modulating noncoding variant that is associated with reduced chronic lymphocytic leukemia (CLL) predisposition and disease progression. This common, germline variant constitutes a 5-bp indel that controls the activity of an AXIN2 gene-linked VCM by creating a MEF2 binding site, which, upon binding, activates a super-enhancer-like regulatory element. This triggers a large change in TF binding activity and chromatin state at an enhancer cluster spanning >150 kb, coinciding with subtle, long-range chromatin compaction and robust AXIN2 up-regulation. Our results support a model in which the indel acts as an AXIN2 VCM-activating TF nucleation event, which modulates CLL pathology.