133 research outputs found

    Discovering cancer-associated transcripts by RNA sequencing

    High-throughput sequencing of poly-adenylated RNA (RNA-Seq) in human cancers shows remarkable potential to identify uncharacterized aspects of tumor biology, including gene fusions with therapeutic significance and disease markers such as long non-coding RNA (lncRNA) species. However, the analysis of RNA-Seq data places unprecedented demands upon computational infrastructures and algorithms, requiring novel bioinformatics approaches. To meet these demands, we present two new open-source software packages, ChimeraScan and AssemblyLine, designed to detect gene fusion events and novel lncRNAs, respectively. RNA-Seq studies utilizing ChimeraScan led to discoveries of new families of recurrent gene fusions in breast cancers and solitary fibrous tumors. Further, ChimeraScan was one of the key components of the repertoire of computational tools utilized in data analysis for MI-ONCOSEQ, a clinical sequencing initiative to identify potentially informative and actionable mutations in cancer patients’ tumors. AssemblyLine, by contrast, reassembles RNA sequencing data into full-length transcripts ab initio. In head-to-head analyses AssemblyLine compared favorably to existing ab initio approaches and unveiled abundant novel lncRNAs, including antisense and intronic lncRNAs disregarded by previous studies. Moreover, we used AssemblyLine to define the prostate cancer transcriptome from a large patient cohort and discovered myriad lncRNAs, including 121 prostate cancer-associated transcripts (PCATs) that could potentially serve as novel disease markers. Functional studies of two PCATs, PCAT-1 and SChLAP1, revealed cancer-promoting roles for these lncRNAs. PCAT-1, a lncRNA expressed from chromosome 8q24, promotes cell proliferation and represses the tumor suppressor BRCA2. SChLAP1, located in a chromosome 2q31 ‘gene desert’, independently predicts poor patient outcomes, including metastasis and cancer-specific mortality. Mechanistically, SChLAP1 antagonizes the genome-wide localization and regulatory functions of the SWI/SNF chromatin-modifying complex. Collectively, this work demonstrates the utility of ChimeraScan and AssemblyLine as open-source bioinformatics tools. Our applications of ChimeraScan and AssemblyLine led to the discovery of new classes of recurrent and clinically informative gene fusions, and established a prominent role for lncRNAs in coordinating aggressive prostate cancer, respectively. We expect that the methods and findings described herein will establish a precedent for RNA-Seq-based studies in cancer biology and assist the research community at large in making similar discoveries.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/120814/1/mkiyer_1.pd

    Whole-genome doubling drives oncogenic loss of chromatin segregation.

    Whole-genome doubling (WGD) is a recurrent event in human cancers and it promotes chromosomal instability and acquisition of aneuploidies [1-8]. However, the three-dimensional organization of chromatin in WGD cells and its contribution to oncogenic phenotypes are currently unknown. Here we show that in p53-deficient cells, WGD induces loss of chromatin segregation (LCS). This event is characterized by reduced segregation between short and long chromosomes, A and B subcompartments and adjacent chromatin domains. LCS is driven by the downregulation of CTCF and H3K9me3 in cells that bypassed activation of the tetraploid checkpoint. Longitudinal analyses revealed that LCS primes genomic regions for subcompartment repositioning in WGD cells. This results in chromatin and epigenetic changes associated with oncogene activation in tumours ensuing from WGD cells. Notably, subcompartment repositioning events were largely independent of chromosomal alterations, which indicates that these were complementary mechanisms contributing to tumour development and progression. Overall, LCS initiates chromatin conformation changes that ultimately result in oncogenic epigenetic and transcriptional modifications, which suggests that chromatin evolution is a hallmark of WGD-driven cancer.

    Construction of a duck whole genome radiation hybrid panel: an aid for NGS whole genome assembly and a contribution to avian comparative maps

    Duck is a very important agronomic species in France, especially for the foie gras industry, which accounts for more than 75% of worldwide production. Moreover, duck is also a scientific model for avian influenza research, as waterfowl are a natural reservoir for influenza viruses and carry them asymptomatically. The work presented here is part of the international effort to study the duck genome, which includes genome sequencing, EST sequencing, and SNP detection and mapping. The long-term goal for INRA is to acquire the genomic knowledge needed for fine mapping of QTL and for identifying genes involved in the expression of agronomic traits. A panel of 90 radiation hybrids (RH panel) was produced by fusing irradiated duck donor cells with hamster recipient cells. To avoid large-scale culture of the cell clones, PCR genotyping methods using whole-genome amplification (WGA) and/or reduced reaction volumes were tested, and the first two chromosome maps were built this way. We also used PCR genotyping to check the quality of the duck genome scaffold assemblies, which had been produced by Illumina next-generation sequencing at the Beijing Genome Institute (BGI, China). Finally, to cover the whole genome, we performed low-pass sequencing (0.1X depth) of the hybrids, which allows maps to be built faster than by PCR. These maps allow the detection of chromosomal rearrangements between the chicken and duck genomes, which diverged 80 million years ago.

    Increased mutation and gene conversion within human segmental duplications

    Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have ‘relocated’ on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].
    We thank T. Brown for help in editing this manuscript, P. Green for valuable suggestions, and R. Seroussi and his staff for their generous donation of time and resources. This work was supported in part by grants from the US National Institutes of Health (NIH 5R01HG002385, 5U01HG010971 and 1U01HG010973 to E.E.E.; K99HG011041 to P.H.; and F31AI150163 to W.S.D.). W.S.D. was supported in part by a Fellowship in Understanding Dynamic and Multi-scale Systems from the James S. McDonnell Foundation. E.E.E. is an investigator of the Howard Hughes Medical Institute (HHMI). This article is subject to HHMI’s Open Access to Publications policy. HHMI laboratory heads have previously granted a nonexclusive CC BY 4.0 licence to the public and a sublicensable licence to HHMI in their research articles. Pursuant to those licences, the author-accepted manuscript of this article can be made freely available under a CC BY 4.0 licence immediately on publication.
    Peer Reviewed
    Article signed by 19 authors: Mitchell R. Vollger, Philip C. Dishuck, William T. Harvey, William S. DeWitt, Xavi Guitart, Michael E. Goldberg, Allison N. Rozanski, Julian Lucas, Mobin Asri, Human Pangenome Reference Consortium, Katherine M. Munson, Alexandra P. Lewis, Kendra Hoekzema, Glennis A. Logsdon, David Porubsky, Benedict Paten, Kelley Harris, PingHsun Hsieh & Evan E. Eichler
    Postprint (published version)

    Initial sequencing and analysis of the human genome

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/62798/1/409860a0.pd

    A draft human pangenome reference

    Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals.