15 research outputs found

    Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis.

    Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus, CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis.

    Localized hypermutation and hypomutation in the genomes of human somatic cells

    Somatic cells accumulate mutations in their genome from a set of exogenous and endogenous processes. The interplay of DNA lesions and the DNA repair mechanisms in each cell shapes the genetic mosaicism that composes an adult tissue. Although many of these alterations have a neutral effect, some can eventually impede the correct physiological function of the tissue, causing cancer and other diseases such as clonal hematopoiesis and repeat expansion disorders. Understanding the molecular mechanisms by which these somatic mutations are generated can therefore help in the prevention and treatment of such diseases, and can help in understanding the DNA replication and repair processes operative in human cells. In this thesis, we explore somatic mutation distributions from several perspectives, focusing on the genomic features that modulate the local rate at which they accumulate. First, we systematically study the mechanisms that generate APOBEC mutations in tumor samples; we describe a new mechanism of diffuse mutation clusters that are enriched in gene-rich domains of the human genome, consistent with DNA repair-mediated mutagenesis. Next, we study the somatic mutation signatures in a wide range of healthy human tissues and compare them with their corresponding cancer types, reporting broad similarities. We also study the sub-gene-resolution heterogeneity in mutation rates, revealing a gradient of mutation rate along the gene body and its interaction with other functional elements such as promoters, enhancers, and loop anchors. Lastly, we detect and characterize distal mutation clusters in trans-interacting chromatin loci, which suggests three-dimensionally acting mutagenesis mechanisms active in human cells. Overall, the studies in this thesis highlight the variable accumulation of DNA mutations from endogenous sources along the human genome, elucidate the mechanisms by which they originate, and underline their impact on the accumulation of mutations in functional elements.

    Passenger mutations accurately classify human tumors.

    Determining the cancer type and molecular subtype has important clinical implications. The primary site is, however, unknown for some malignancies discovered in the metastatic stage. Moreover, liquid biopsies may be used to screen for tumoral DNA, which upon detection needs to be assigned to a site of origin. Classifiers based on genomic features are a promising approach to prioritize the tumor anatomical site, type and subtype. We examined the predictive ability of causal (driver) somatic mutations in this task, comparing it against global patterns of non-selected (passenger) mutations, including features based on regional mutation density (RMD). In the task of distinguishing 18 cancer types, the driver mutations (mutated oncogenes or tumor suppressors, pathways and hotspots) classified 36% of the patients to the correct cancer type. In contrast, the features based on passenger mutations did so at 92% accuracy, with similar contribution from the RMD and the trinucleotide mutation spectra. The RMD and the spectra covered distinct sets of patients with predictions. In particular, introducing the RMD features into a combined classification model increased the fraction of diagnosed patients by 50 percentage points (at 20% FDR). Furthermore, RMD was able to discriminate molecular subtypes and/or anatomical site of six major cancers. The advantage of passenger mutations was upheld under high rates of false negative mutation calls and with exome sequencing, even though overall accuracy decreased. We suggest whole genome sequencing is valuable for classifying tumors because it captures global patterns emanating from mutational processes, which are informative of the underlying tumor biology.
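
    As a rough illustration of what a regional mutation density feature could look like, the sketch below bins mutations into fixed-size genomic windows and reports the fraction of a sample's mutations falling in each window. The abstract does not state how RMD was computed; the 1 Mb window, the normalization to fractions, and the function name `regional_mutation_density` are assumptions made here for illustration only.

```python
from collections import Counter


def regional_mutation_density(mutations, window=1_000_000):
    """Fraction of a sample's mutations in each (chromosome, window) bin.

    `mutations` is an iterable of (chromosome, position) pairs.
    The 1 Mb default window is an assumption, not taken from the abstract.
    """
    # Count mutations per fixed-size window on each chromosome.
    counts = Counter((chrom, pos // window) for chrom, pos in mutations)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize so that samples with different mutation burdens are comparable.
    return {key: n / total for key, n in counts.items()}


sample = [("chr1", 100), ("chr1", 1_500_000), ("chr1", 1_700_000), ("chr2", 5)]
features = regional_mutation_density(sample)
```

    A vector of such per-window fractions could then serve as input to any standard classifier alongside other features.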

    LncATLAS database for subcellular localization of long noncoding RNAs

    The subcellular localization of long noncoding RNAs (lncRNAs) holds valuable clues to their molecular function. However, measuring localization of newly discovered lncRNAs involves time-consuming and costly experimental methods. We have created "lncATLAS," a comprehensive resource of lncRNA localization in human cells based on RNA-sequencing data sets. Altogether, 6768 GENCODE-annotated lncRNAs are represented across various compartments of 15 cell lines. We introduce relative concentration index (RCI) as a useful measure of localization derived from ensemble RNA-seq measurements. LncATLAS is accessible through an intuitive and informative webserver, from which lncRNAs of interest are accessed using identifiers or names. Localization is presented across cell types and organelles, and may be compared to the distribution of all other genes. Publication-quality figures and raw data tables are automatically generated with each query, and the entire data set is also available to download. LncATLAS makes lncRNA subcellular localization data available to the widest possible number of researchers. It is available at lncatlas.crg.eu.
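
    To make the idea of a localization index concrete, the sketch below computes a log2 ratio of a gene's expression between two compartments. The abstract does not give the RCI formula; the log2 cytoplasm-to-nucleus ratio, the optional pseudocount, and the function name `relative_concentration_index` are assumptions made here for illustration.

```python
import math


def relative_concentration_index(cyto_expr, nuc_expr, pseudocount=0.0):
    """log2 ratio of cytoplasmic to nuclear expression (assumed definition).

    Positive values indicate cytoplasmic enrichment, negative values
    nuclear enrichment. Returns None when the ratio is undefined.
    """
    num = cyto_expr + pseudocount
    den = nuc_expr + pseudocount
    if num <= 0 or den <= 0:
        # Undetected in one compartment and no pseudocount: ratio undefined.
        return None
    return math.log2(num / den)


cytoplasmic = relative_concentration_index(8.0, 2.0)
nuclear = relative_concentration_index(1.0, 4.0)
```

    Under this assumed definition a value of 0 would mean equal concentration in both compartments.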

    Unique genomic features and deeply-conserved functions of long non-coding RNAs in the Cancer LncRNA Census (CLC)

    Long non-coding RNAs (lncRNAs) that drive tumorigenesis are a growing focus of cancer genomics studies. To facilitate further discovery, we have created the “Cancer LncRNA Census” (CLC), a manually-curated and strictly-defined compilation of lncRNAs with causative roles in cancer. CLC has two principal applications: first, as a resource for training and benchmarking de novo identification methods; and second, as a dataset for studying the fundamental properties of these genes. CLC Version 1 comprises 122 lncRNAs implicated in 29 distinct cancers. LncRNAs are included based on functional or genetic evidence for causative roles in cancer progression. All belong to the GENCODE reference annotation, to enable integration across projects and datasets. For each entry, the evidence type, biological activity (oncogene or tumour suppressor), source reference and cancer type are recorded. Supporting its usefulness, CLC genes are significantly enriched amongst de novo predicted driver genes from PCAWG. CLC genes are distinguished from other lncRNAs by a series of features consistent with biological function, including gene length, high expression and sequence conservation of both exons and promoters. We identify a trend for CLC genes to be co-localised with known protein-coding cancer genes along the human genome. Finally, by integrating data from transposon-mutagenesis functional screens, we show that mouse orthologues of CLC genes tend also to be cancer genes. Thus, CLC represents a valuable resource for research into long non-coding RNAs in cancer. Their evolutionary and genomic properties have implications for understanding disease mechanisms and point to conserved functions across ~80 million years of evolution.

    LncATLAS database for subcellular localization of long noncoding RNAs

    The subcellular localization of long noncoding RNAs (lncRNAs) holds valuable clues to their molecular function. However, measuring localization of newly discovered lncRNAs involves time-consuming and costly experimental methods. We have created "lncATLAS," a comprehensive resource of lncRNA localization in human cells based on RNA-sequencing data sets. Altogether, 6768 GENCODE-annotated lncRNAs are represented across various compartments of 15 cell lines. We introduce relative concentration index (RCI) as a useful measure of localization derived from ensemble RNA-seq measurements. LncATLAS is accessible through an intuitive and informative webserver, from which lncRNAs of interest are accessed using identifiers or names. Localization is presented across cell types and organelles, and may be compared to the distribution of all other genes. Publication-quality figures and raw data tables are automatically generated with each query, and the entire data set is also available to download. LncATLAS makes lncRNA subcellular localization data available to the widest possible number of researchers. It is available at lncatlas.crg.eu. We also acknowledge the support of the Spanish Ministry of Economy and Competitiveness, “Centro de Excelencia Severo Ochoa 2013-2017,” SEV-2012-0208. R.J. was supported by Ramón y Cajal RYC-2011-08851. This research was partly supported by the NCCR “RNA & Disease” funded by the Swiss National Science Foundation.

    Whole genome DNA sequencing provides an atlas of somatic mutagenesis in healthy human cells and identifies a tumor-prone cell type

    Background: The lifelong accumulation of somatic mutations underlies age-related phenotypes and cancer. Mutagenic forces are thought to shape the genome of aging cells in a tissue-specific way. Whole genome analyses of somatic mutation patterns, based on both types and genomic distribution of variants, can shed light on specific processes active in different human tissues and their effect on the transition to cancer. Results: To analyze somatic mutation patterns, we compile a comprehensive genetic atlas of somatic mutations in healthy human cells. High-confidence variants are obtained from newly generated and publicly available whole genome DNA sequencing data from single non-cancer cells, clonally expanded in vitro. To enable a well-controlled comparison of different cell types, we obtain single genome data (92% mean coverage) from multi-organ biopsies from the same donors. These data show multiple cell types that are protected from mutagens and display a stereotyped mutation profile, despite their origin from different tissues. Conversely, the same tissue harbors cells with distinct mutation profiles associated with different differentiation states. Analyses of mutation rate in the coding and non-coding portions of the genome identify a cell type bearing a unique mutation pattern characterized by mutation enrichment in active chromatin, regulatory, and transcribed regions. Conclusions: Our analysis of normal cells from healthy donors identifies a somatic mutation landscape that enhances the risk of tumor transformation in a specific cell population from the kidney proximal tubule. This unique pattern is characterized by a high rate of mutation accumulation during adult life and specific targeting of expressed genes and regulatory regions.