118 research outputs found

    Butler enables rapid cloud-based analysis of thousands of human genomes.

    We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.

    Genomic Variation and Its Impact on Gene Expression in Drosophila melanogaster

    Understanding the relationship between genetic and phenotypic variation is one of the great outstanding challenges in biology. To meet this challenge, comprehensive genomic variation maps of human as well as of model organism populations are required. Here, we present a nucleotide-resolution catalog of single-nucleotide, multi-nucleotide, and structural variants in 39 Drosophila melanogaster Genetic Reference Panel inbred lines. Using an integrative, local assembly-based approach for variant discovery, we identify more than 3.6 million distinct variants, among which were more than 800,000 unique insertions, deletions (indels), and complex variants (1 to 6,000 bp). While the SNP density is higher near other variants, we find that variants themselves are not mutagenic, nor are regions with high variant density particularly mutation-prone. Rather, our data suggest that the elevated SNP density around variants is mainly due to population-level processes. We also provide insights into the regulatory architecture of gene expression variation in adult flies by mapping cis-expression quantitative trait loci (cis-eQTLs) for more than 2,000 genes. Indels comprise around 10% of all cis-eQTLs and show larger effects than SNP cis-eQTLs. In addition, we identified twofold more gene associations in males than in females and found that most cis-eQTLs are sex-specific, revealing a partial decoupling of the genomic architecture between the sexes as well as the importance of genetic factors in mediating sex-biased gene expression. Finally, we performed RNA-seq-based allelic expression imbalance analyses in the offspring of crosses between sequenced lines, which revealed that the majority of strong cis-eQTLs can be validated in heterozygous individuals.
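The abstract does not say which statistic was used for the allelic expression imbalance analysis. A common choice, assumed here, is an exact two-sided binomial test of reference versus alternative allele RNA-seq read counts at a heterozygous site against an expected 50:50 ratio. The sketch below is illustrative only; the function names and the 0.05 threshold are hypothetical and not taken from the study.

```python
from math import exp, lgamma, log


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Log-probability of i successes in n trials (log space avoids overflow)."""
    return lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log(1 - p)


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every outcome
    that is no more likely than the observed count k."""
    probs = [exp(log_binom_pmf(i, n, p)) for i in range(n + 1)]
    observed = probs[k]
    # Relative tolerance so outcomes tied with the observed one (up to rounding) are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def allelic_imbalance(ref_reads: int, alt_reads: int, alpha: float = 0.05):
    """Return (reference allele fraction, p-value, imbalanced?) for one heterozygous site."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    p_value = binomial_two_sided_p(ref_reads, n)
    return ref_reads / n, p_value, p_value < alpha
```

For example, 30 reference and 10 alternative reads give a reference fraction of 0.75 and a p-value well below 0.05, whereas a 20:20 split gives a p-value of 1. In practice, a study would also correct for multiple testing across sites and for reference-mapping bias.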

    Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data

    Motivation: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. Results: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. Availability: The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter. Supplementary information: Supplementary data are available at Bioinformatics online.
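A minimal sketch of the filtering idea, assuming each read is summarized by a (start position, strand) key and that reads sharing a key are clonal copies. It flags a site if overall read complexity (unique keys divided by total reads) falls below a threshold, or if the clonal fraction differs between the two alleles by Fisher's exact test. This is not the absfilter implementation, which relies on library clonality simulations; the thresholds and function names are hypothetical.

```python
from collections import Counter
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= observed * (1 + 1e-7)))


def clonal_counts(reads):
    """reads: list of (start, strand) keys. Clonal reads are the extra copies
    beyond the first read at each key. Returns (total, clonal)."""
    counts = Counter(reads)
    total = sum(counts.values())
    return total, total - len(counts)


def is_amplification_biased(ref_reads, alt_reads, min_complexity=0.5, alpha=0.05):
    """Hypothetical filter: low read complexity OR allele-skewed clonality."""
    ref_total, ref_clonal = clonal_counts(ref_reads)
    alt_total, alt_clonal = clonal_counts(alt_reads)
    total = ref_total + alt_total
    if total == 0:
        raise ValueError("no reads at this site")
    # Read complexity: fraction of reads that are not clonal copies.
    complexity = (total - ref_clonal - alt_clonal) / total
    # Does one allele carry a disproportionate share of clonal reads?
    p_value = fisher_exact_two_sided(ref_clonal, ref_total - ref_clonal,
                                     alt_clonal, alt_total - alt_clonal)
    return complexity < min_complexity or p_value < alpha
```

A site where all 20 reference reads start at the same position but the 20 alternative reads are all distinct is flagged: complexity is still above 0.5, but the clonal excess is confined to one allele.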

    A yeast one-hybrid and microfluidics-based pipeline to map mammalian gene regulatory networks

    The comprehensive mapping of gene promoters and enhancers has significantly improved our understanding of how the mammalian regulatory genome is organized. An important challenge is to elucidate how these regulatory elements contribute to gene expression by identifying their trans-regulatory inputs. Here, we present the generation of a mouse-specific transcription factor (TF) open-reading frame clone library and its implementation in yeast one-hybrid assays to enable large-scale protein–DNA interaction detection with mouse regulatory elements. Once specific interactions are identified, we then use a microfluidics-based method to validate and precisely map them within the respective DNA sequences. Using well-described regulatory elements as well as orphan enhancers, we show that this cross-platform pipeline characterizes known and uncovers many novel TF–DNA interactions. In addition, we provide evidence that several of these novel interactions are relevant in vivo and aid in elucidating the regulatory architecture of enhancers.

    A leukemia-protective germline variant mediates chromatin module formation via transcription factor nucleation

    Non-coding variants coordinate transcription factor (TF) binding and chromatin mark enrichment changes over regions spanning >100 kb. These molecularly coordinated regions are named "variable chromatin modules" (VCMs), providing a conceptual framework of how regulatory variation might shape complex traits. To better understand the molecular mechanisms underlying VCM formation, here, we mechanistically dissect a VCM-modulating noncoding variant that is associated with reduced chronic lymphocytic leukemia (CLL) predisposition and disease progression. This common, germline variant constitutes a 5-bp indel that controls the activity of an AXIN2 gene-linked VCM by creating a MEF2 binding site, which, upon binding, activates a super-enhancer-like regulatory element. This triggers a large change in TF binding activity and chromatin state at an enhancer cluster spanning >150 kb, coinciding with subtle, long-range chromatin compaction and robust AXIN2 up-regulation. Our results support a model in which the indel acts as an AXIN2 VCM-activating TF nucleation event, which modulates CLL pathology.

    Cancer risk and tumour spectrum in 172 patients with a germline SUFU pathogenic variation: a collaborative study of the SIOPE Host Genome Working Group

    Background: Little is known about the risks associated with germline SUFU pathogenic variants (PVs), which are known to cause a cancer predisposition syndrome. Methods: To study tumour risks, we analysed data from a large cohort combining 45 unpublished patients with a germline SUFU PV and 127 previously published patients. To reduce the ascertainment bias due to index patient selection, the risk of tumours was evaluated in relatives with a SUFU PV (89 patients) using the Nelson-Aalen estimator. Results: Overall, 117/172 (68%) SUFU PV carriers developed at least one tumour: medulloblastoma (MB) (86 patients), basal cell carcinoma (BCC) (25 patients), meningioma (20 patients) and gonadal tumours (11 patients). Thirty-three of them (28%) had multiple tumours. Median ages at diagnosis of MB, gonadal tumour, first BCC and first meningioma were 1.5, 14, 40 and 44 years, respectively. Follow-up data were available for 160 patients (137 remained alive and 23 died). The cumulative incidence of tumours in relatives was 14.4% (95% CI 6.8 to 21.4), 18.2% (95% CI 9.7 to 25.9) and 44.1% (95% CI 29.7 to 55.5) at the age of 5, 20 and 50 years, respectively. The cumulative risk of MB, gonadal tumour, BCC and meningioma at age 50 years was 13.3% (95% CI 6 to 20.1), 4.6% (95% CI 0 to 9.7), 28.5% (95% CI 13.4 to 40.9) and 5.2% (95% CI 0 to 12), respectively. Sixty-four different PVs were reported across the entire SUFU gene and were inherited in 73% of cases in which inheritance could be evaluated. Conclusion: Germline SUFU PV carriers have a lifelong increased risk of tumours, with a spectrum dominated by MB before the age of 5, gonadal tumours during adolescence, and BCC and meningioma in adulthood, justifying fine-tuned surveillance programmes. Peer reviewed.
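For readers unfamiliar with the Nelson-Aalen estimator, the sketch below computes the cumulative hazard H(t) as the sum, over event ages t_i <= t, of d_i / n_i, where d_i is the number of first tumours at age t_i and n_i the number of carriers still at risk just before t_i. It then converts H(t) to a cumulative risk with 1 - exp(-H(t)). That conversion is one common convention and is assumed here; the study's exact procedure, censoring rules, and confidence intervals are not reproduced, and the toy ages in the usage check are invented.

```python
from math import exp


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard.

    times:  age at first tumour, or age at last follow-up if censored
    events: 1 if a tumour occurred at that age, 0 if censored
    Returns [(t, H(t))] at each distinct age with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    hazard = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = leaving_at_t = 0
        # Group tied ages: everyone with age == t is still at risk just before t.
        while i < len(data) and data[i][0] == t:
            events_at_t += data[i][1]
            leaving_at_t += 1
            i += 1
        if events_at_t:
            hazard += events_at_t / at_risk
            curve.append((t, hazard))
        at_risk -= leaving_at_t
    return curve


def cumulative_risk(curve, age):
    """Convert the step-function H(age) to a cumulative risk via 1 - exp(-H)."""
    hazard = 0.0
    for t, h in curve:
        if t > age:
            break
        hazard = h
    return 1.0 - exp(-hazard)
```

With invented ages [1, 2, 2, 3, 5] and events [1, 1, 0, 1, 0], the hazard steps to 0.2, 0.45 and 0.95 at ages 1, 2 and 3. The censored carrier at age 2 still counts as at risk at that age.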

    Upfront Biology-Guided Therapy in Diffuse Intrinsic Pontine Glioma: Therapeutic, Molecular, and Biomarker Outcomes from PNOC003

    PURPOSE PNOC003 is a multicenter precision medicine trial for children and young adults with newly diagnosed diffuse intrinsic pontine glioma (DIPG). PATIENTS AND METHODS Patients (3-25 years) were enrolled on the basis of imaging consistent with DIPG. Biopsy tissue was collected for whole-exome and mRNA sequencing. After radiotherapy (RT), patients were assigned up to four FDA-approved drugs based on molecular tumor board recommendations. H3K27M-mutant circulating tumor DNA (ctDNA) was longitudinally measured. Tumor tissue and matched primary cell lines were characterized using whole-genome sequencing and DNA methylation profiling. When applicable, results were verified in an independent cohort from the Children's Brain Tumor Network (CBTN). RESULTS Of 38 patients enrolled, 28 patients (median age 6 years, 10 females) were reviewed by the molecular tumor board. Of those, 19 followed treatment recommendations. Median overall survival (OS) was 13.1 months [95% confidence interval (CI), 11.2-18.4] with no difference between patients who followed recommendations and those who did not. H3K27M-mutant ctDNA was detected at baseline in 60% of cases tested and was associated with response to RT and survival. Eleven cell lines were established, showing 100% fidelity to the key somatic driver gene alterations of the primary tumor. In H3K27-altered DIPGs, TP53 mutations were associated with worse OS (TP53mut 11.1 mo; 95% CI, 8.7-14; TP53wt 13.3 mo; 95% CI, 11.8-NA; P = 3.4e-2), genome instability (P = 3.1e-3), and RT resistance (P = 6.4e-4). The CBTN cohort confirmed an association between TP53 mutation status, genome instability, and clinical outcome. CONCLUSIONS Upfront treatment-naïve biopsy provides insight into clinically relevant molecular alterations and prognostic biomarkers for H3K27-altered DIPGs.

    Systematic Inference of Copy-Number Genotypes from Personal Genome Sequencing Data Reveals Extensive Olfactory Receptor Gene Content Diversity

    Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignment of integer locus copy-numbers (i.e., copy-number genotypes), which for example enables discrimination of homozygous from heterozygous CNVs, has remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth of coverage of high-throughput DNA sequencing reads and can incorporate paired-end and breakpoint-junction CNV-analysis approaches to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low coverage. The inferred copy-number genotypes were highly concordant with our qPCR experiments (Pearson correlation coefficient 0.94) and with the published results of two microarray platforms (95–99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes across the >800 CNV-enriched human olfactory receptor (OR) gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of copy-numbers across individuals, with zero to two copies at some OR loci and two to nine copies at others. Among genetic variants affecting OR loci, we identified deleterious variants, including CNVs and SNPs affecting ∼15% and ∼20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying that the OR repertoire needs revision for future functional studies. CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.
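CopySeq's statistical framework is described in the paper itself. The sketch below only illustrates the underlying depth-of-coverage idea under simple assumptions. It treats the read count in a locus as Poisson-distributed with a mean that scales linearly with copy number, and `diploid_expected` (a hypothetical input, e.g. estimated from genome-wide coverage) gives the expected count for two copies. It ignores paired-end and breakpoint evidence as well as GC and mappability corrections. A zero-copy locus gets a small hypothetical floor, `zero_floor`, to absorb mismapped reads.

```python
from math import lgamma, log


def poisson_loglik(k: int, lam: float) -> float:
    """Log-probability of observing k reads under Poisson(lam)."""
    return k * log(lam) - lam - lgamma(k + 1)


def infer_copy_number(observed_reads: int, diploid_expected: float,
                      max_cn: int = 10, zero_floor: float = 0.05) -> int:
    """Return the integer copy number 0..max_cn that maximizes the Poisson
    likelihood, assuming expected reads = cn / 2 * diploid_expected."""
    best_cn, best_ll = 0, float("-inf")
    for cn in range(max_cn + 1):
        # A zero-copy locus still attracts a few mismapped reads, so floor its mean.
        lam = max(cn, zero_floor) / 2 * diploid_expected
        ll = poisson_loglik(observed_reads, lam)
        if ll > best_ll:
            best_cn, best_ll = cn, ll
    return best_cn
```

With an expected diploid count of 100 reads, observing 0, 48, 100 and 250 reads yields genotypes of 0, 1, 2 and 5 copies. This integer assignment is what distinguishes a homozygous deletion (0) from a heterozygous one (1).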

    Genomic and transcriptomic changes complement each other in the pathogenesis of sporadic Burkitt lymphoma

    Burkitt lymphoma (BL) is the most common B-cell lymphoma in children. Within the International Cancer Genome Consortium (ICGC), we performed whole-genome and transcriptome sequencing of 39 sporadic BL. Here, we unravel the interaction of structural, mutational, and transcriptional changes, which contribute to MYC oncogene dysregulation together with the pathognomonic IG-MYC translocation. Moreover, by mapping IGH translocation breakpoints, we provide evidence that the precursor of at least a subset of BL is a B cell poised to express IGHA. We describe the landscape of mutations, structural variants, and mutational processes, and identify a series of driver genes in the pathogenesis of BL, which can be targeted by various mechanisms, including IG-non-MYC translocations, germline and somatic mutations, fusion transcripts, and alternative splicing.