10 research outputs found

    Computational approaches for identifying cancer driver events

    The transformation of a normal cell into a cancer cell involves the accumulation of somatic DNA alterations that confer growth and survival advantages. These genomic alterations differ in pattern and size and comprise single nucleotide variants (SNVs), small insertions or deletions (indels), structural variations (SVs) and foreign DNA insertions such as viral DNA. Cancer genomes typically harbor numerous such changes, of which only a small fraction are driver events that are positively selected for during the evolution of the tumor. High-throughput sequencing has enabled systematic mapping of somatic DNA alterations across thousands of tumor genomes. Mutations in particular have been thoroughly explored in this type of data, and this has implicated many new genes in tumor development. However, our knowledge remains more limited when it comes to the contribution of SVs to cancer. In the present thesis, we made use of publicly available cancer genomics data to gain further insight into the role of structural genomic alterations in tumor development. Viruses cause 10-15% of all human cancers through multiple mechanisms, one of which is structural genomic change caused by viral DNA being integrated into the human genome. Thus, in the first study, we performed an unbiased screen for viral genomic integrations into cancer genomes. We developed a computational pipeline using RNA-Seq data from ~4500 tumors across 19 different cancer types to detect viral integrations. We found that recurrent events typically involved known cancer genes and were associated with altered gene expression. SVs can lead to copy number amplification of specific cancer driver genes, as well as the formation of fusion oncogenes, but their importance in cancer beyond these types of events is underexplored.
    We mapped SVs to the human genome using whole genome sequencing data from 600 tumors across 18 different cancer types and investigated the global relationship between SVs and mRNA changes. We found that such events often contribute to altered gene expression in human tumors, but we were not able to detect novel recurrent driver events. To increase the cohort size, we used a larger but lower-resolution and more limited dataset, comprising microarray-based DNA copy number profiles from ~10,000 tumors across 32 cancer types, with the aim of identifying recurrent SV driver events in tumors. Specifically, we investigated SVs predicted to result in promoter substitution events, a known mechanism for gene activation in cancer, and found several recurrent activating events with potential cancer driver roles. Notable among our findings across the studies were human papillomavirus integrations in RAD51B and ERBB2 and gene fusions involving NFE2L2, TIAM2 and SCARB1, all of which are known cancer genes. Taken together, massive amounts of genomic and transcriptomic sequencing data allowed us to comprehensively map viral integrations and structural variations in cancer, which led to the identification of several genes with potential roles in tumor development.

    Limited evidence for evolutionarily conserved targeting of long non-coding RNAs by microRNAs

    BACKGROUND: Long non-coding RNAs (lncRNAs) are emerging as important regulators of cell physiology, but it is not yet known to what extent lncRNAs have evolved to be targeted by microRNAs. Comparative genomics has previously revealed widespread evolutionarily conserved microRNA targeting of protein-coding mRNAs, and here we applied a similar approach to lncRNAs. FINDINGS: We used a map of putative microRNA target sites in lncRNAs where site conservation was evaluated based on 46 vertebrate species. We compared observed target site frequencies to those obtained with a random model, at variable prediction stringencies. While conserved sites were not present above random expectation in intergenic lncRNAs overall, we observed a marginal over-representation of highly conserved 8-mer sites in a small subset of cytoplasmic lncRNAs (12 sites in 8 lncRNAs at 56% false discovery rate, P = 0.10). CONCLUSIONS: Evolutionary conservation in lncRNAs is generally low but patch-wise high, and these patches could, in principle, harbor conserved target sites. However, while our analysis efficiently detected conserved targeting of mRNAs, it provided only limited and marginally significant support for conserved microRNA-lncRNA interactions. We conclude that conserved microRNA-lncRNA interactions could not be reliably detected with our methodology.
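The comparison of observed site counts against a random model can be illustrated with a small sketch. It assumes, without confirmation from the abstract, that the false discovery rate is estimated as the mean random count divided by the observed count and that the P-value is the fraction of random trials reaching the observed count. The function name and the random counts are hypothetical; only the observed count of 12 comes from the text.

```python
def empirical_fdr_and_p(observed, random_counts):
    """Estimate FDR and an empirical P-value from counts under a random model.

    Assumed definitions (not taken from the source):
      FDR = mean(random counts) / observed count
      P   = fraction of random trials with count >= observed
    """
    if observed <= 0 or not random_counts:
        raise ValueError("need a positive observed count and at least one random trial")
    mean_random = sum(random_counts) / len(random_counts)
    fdr = min(1.0, mean_random / observed)
    p_value = sum(1 for c in random_counts if c >= observed) / len(random_counts)
    return fdr, p_value

# Observed conserved 8-mer sites (from the abstract) and made-up random-model counts.
observed_sites = 12
random_site_counts = [4, 6, 8, 6, 12, 5, 7, 6]  # hypothetical toy values
fdr, p_value = empirical_fdr_and_p(observed_sites, random_site_counts)
```

A high FDR under these definitions means that the random model alone yields a large share of the observed sites, which is the sense in which the reported over-representation is marginal.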

    Simultaneous DNA and RNA Mapping of Somatic Mitochondrial Mutations across Diverse Human Cancers

    Somatic mutations in the nuclear genome are required for tumor formation, but the functional consequences of somatic mitochondrial DNA (mtDNA) mutations are less understood. Here we identify somatic mtDNA mutations across 527 tumors and 14 cancer types, using an approach that takes advantage of evidence from both genomic and transcriptomic sequencing. We find that there is selective pressure against deleterious coding mutations, supporting the view that functional mitochondria are required in tumor cells, and also observe a strong mutational strand bias, compatible with endogenous replication-coupled errors as the major source of mutations. Interestingly, while allelic ratios in general were consistent in RNA compared to DNA, some mutations in tRNAs displayed strong allelic imbalances caused by accumulation of unprocessed tRNA precursors. The effect was explained by altered secondary structure, demonstrating that correct tRNA folding is a major determinant for processing of polycistronic mitochondrial transcripts. Additionally, the data suggest that tRNA clusters are preferably processed in the 3′ to 5′ direction. Our study gives insights into mtDNA function in cancer and answers questions regarding mitochondrial tRNA biogenesis that are difficult to address in controlled experimental systems.

    Frameshift indels show reduced heteroplasmy levels indicative of negative selection.

    Cumulative distributions of mutant allele frequencies (heteroplasmy levels) for different mutational categories. Frameshift indels showed significantly reduced levels of heteroplasmy and never reached above 85%. A similar trend, although non-significant, was seen for nonsense (stop-inducing) mutations. In contrast, D-loop mutations, which in general should be better tolerated, showed significantly elevated levels. P-values were calculated using the two-sample Kolmogorov-Smirnov test, comparing the tumor set of interest to remaining samples. Missense PP2 refers to a subset of missense mutations predicted to be “probably damaging” by PolyPhen-2 [29].
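The two-sample Kolmogorov-Smirnov comparison named in the caption can be sketched as follows. This is a minimal illustration that computes only the KS statistic D (the largest gap between two empirical cumulative distributions); the P-value would in practice come from a statistics library. The function name and the heteroplasmy values are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical heteroplasmy levels (mutant allele fractions), for illustration only.
frameshift_levels = [0.05, 0.10, 0.20, 0.30]
other_levels = [0.40, 0.60, 0.80, 0.95]
d_stat = ks_statistic(frameshift_levels, other_levels)
```

A value of D close to 1 indicates that the two cumulative curves are almost completely separated, while a value near 0 indicates nearly identical distributions.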

    Proposed model of mt-tRNA processing in light of observed RNA species in human cancers.

    The normal processing cascade is depicted (left-hand side). The data in Fig 4 suggest that proper folding of the pre-tRNAs is important for RNase P/Z processing. Structure-disrupting mutations in tRNA-Ile allow for normal processing of tRNA-Met, but leave polyA+ processing-intermediate products with mutation-bearing tRNA-Ile and antisense tRNA-Gln sequences on the 3’ end of ND1 (middle). This implies that the antisense tRNA-Gln is not a substrate for tRNA processing endonucleases. Structure-disrupting mutations in the tRNA-Met gene lead to the accumulation of intermediates in which the antisense tRNA-Gln and mutation-bearing tRNA-Met sequences remain attached to the 5’ end of the ND2 gene (right-hand side). Processing of the wild-type tRNA-Ile still occurs but at reduced efficiency, consistent with a model whereby processing of a multi-tRNA cluster occurs preferably in the 3’ to 5’ direction.

    Comparison of allelic ratios in DNA and RNA reveals allelic imbalances consistent with impaired tRNA processing.

    (a) Scatter plot of allele frequencies (heteroplasmy levels) in DNA vs. RNA for all 616 mutations (r = 0.91). 15 mutations with marked accumulation in polyA+ RNA relative to DNA (frequency difference > 0.3) are indicated in red. 12 of these 15 mutations were in tRNA regions (numbered 1–12 in superscript), indicative of impaired processing of the polyA+ precursor RNA to a mature polyA- tRNA. (b) tRNA mutations accumulated in polyA+ RNA (red in panel a) showed elevated predicted RNA structural impact, determined using the RNAsnp tool [35,36], compared to other tRNA mutations (P = 0.038, Wilcoxon rank sum test). The comparison was based on 9 inhibiting and 9 non-inhibiting mutations (cases where the wild-type sequence failed to fold into a tRNA-like structure were excluded). The dotted line indicates an RNAsnp P-value (structural impact score) of 0.2. (c) Example RNAsnp result for a U to C mutation in position 37 of the mitochondrial isoleucine tRNA. The dot-plot shows the ensemble base-pair probabilities of the wild-type (upper triangle) and mutant (lower triangle) sequences, with the altered local region indicated in gray. Wild-type and mutant minimum free energy structures are shown (altered local region in color). (d) Normalized RNA read coverage, showing relative (per-tumor normalized) polyA+ expression levels across the mitochondrial genome in mutated tRNA regions for the 12 tRNA mutations indicated in panel a (each identifiable by a superscript number). Mutated cases (yellow) are compared to controls (green, median of all non-mutated cases). Gene strand orientation is indicated by arrows (right-facing: L-strand). Mutated positions are indicated by triangles. Sample IDs are shown in gray. tRNA genes are referred to as “TX”, where X = the single-letter amino acid code.
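The panel (a) analysis, correlating DNA and RNA allele frequencies and flagging mutations that accumulate in RNA, can be sketched as below. This is an illustration under stated assumptions: the record layout, field names, function names and example values are all hypothetical, and only the 0.3 frequency-difference threshold comes from the caption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rna_accumulated(mutations, threshold=0.3):
    """Return mutations whose RNA allele frequency exceeds the DNA frequency by more than threshold."""
    return [m for m in mutations if m["rna_af"] - m["dna_af"] > threshold]

# Hypothetical mutation records, for illustration only.
mutations = [
    {"pos": 4300, "region": "tRNA", "dna_af": 0.20, "rna_af": 0.75},
    {"pos": 9000, "region": "coding", "dna_af": 0.50, "rna_af": 0.52},
    {"pos": 150, "region": "D-loop", "dna_af": 0.90, "rna_af": 0.88},
]
flagged = rna_accumulated(mutations)
r = pearson_r([m["dna_af"] for m in mutations], [m["rna_af"] for m in mutations])
```

Counting how many flagged records fall in tRNA regions would then correspond to the "12 of these 15" tally reported in the caption.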

    Overview of mtDNA mutational signatures.

    Substitution patterns (mutational signatures) are shown for each cancer type, with each substitution class labeled by the pyrimidine of the Watson-Crick pair [31] but with sense and antisense patterns shown separately to reveal strand biases. Bars indicate enrichment relative to the expected frequency (observed/expected ratio) for all possible substitutions, taking into account the nucleotide composition of mtDNA and assuming equal probability for all three substitutions. L, light strand; H, heavy strand.
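The observed/expected ratio described in the caption can be written out as a short sketch. It assumes that the expected count for a substitution from a reference base is the total number of mutations, times the fraction of the genome made up of that base, divided by three (equal probability for the three possible alternate bases). The function name and the toy counts are hypothetical, and the sketch handles a single strand only.

```python
def substitution_enrichment(observed_counts, base_counts):
    """Observed/expected ratio per substitution class.

    observed_counts: {(ref, alt): number of observed substitutions}
    base_counts:     {base: number of occurrences in the reference sequence}
    """
    total_mutations = sum(observed_counts.values())
    total_bases = sum(base_counts.values())
    enrichment = {}
    for (ref, alt), observed in observed_counts.items():
        # Expected count: share of the genome that is `ref`, split over 3 alternate bases.
        expected = total_mutations * (base_counts[ref] / total_bases) / 3
        enrichment[(ref, alt)] = observed / expected
    return enrichment

# Toy composition and counts, for illustration only (not mtDNA values).
base_counts = {"A": 30, "C": 30, "G": 10, "T": 30}
observed_counts = {("G", "A"): 6, ("T", "C"): 3, ("A", "C"): 1}
ratios = substitution_enrichment(observed_counts, base_counts)
```

Correcting for composition matters here because a rare reference base can show a strong enrichment even when its raw substitution count is modest.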

    Mutational density across the mitochondrial genome.

    Inward-facing thick bars indicate the number of mutations per 331 nt segment (50 segments), with substitutions and indels shown in gray and orange, respectively. 616 somatic mutations, identified across 527 tumors, are shown. Outward-facing thin bars indicate individual recurrently mutated positions (≥ 2 tumors).
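The binning behind this figure can be sketched as follows. The sketch assumes 1-based positions on the 16,569 bp human mitochondrial reference and, since 50 segments of 331 nt cover only 16,550 positions, it folds the remaining positions into the last segment; how the original figure handled that remainder is not stated. Function names and example positions are hypothetical.

```python
from collections import Counter

MT_GENOME_LENGTH = 16569  # human mitochondrial reference length in bp
SEGMENT_SIZE = 331
N_SEGMENTS = 50

def segment_counts(positions):
    """Count mutations per 331 nt segment (1-based positions)."""
    counts = [0] * N_SEGMENTS
    for pos in positions:
        if not 1 <= pos <= MT_GENOME_LENGTH:
            raise ValueError(f"position out of range: {pos}")
        # Assumption: positions past the 50th full segment go into the last bin.
        index = min((pos - 1) // SEGMENT_SIZE, N_SEGMENTS - 1)
        counts[index] += 1
    return counts

def recurrent_positions(per_tumor_positions, min_tumors=2):
    """Positions mutated in at least min_tumors tumors (one set of positions per tumor)."""
    tally = Counter(pos for tumor in per_tumor_positions for pos in set(tumor))
    return sorted(pos for pos, n in tally.items() if n >= min_tumors)

# Hypothetical positions, for illustration only.
counts = segment_counts([1, 331, 332, 16569])
recurrent = recurrent_positions([{100, 200}, {200, 300}, {200}])
```

Converting each tumor's positions to a set before tallying ensures that a position is counted once per tumor, which matches the "≥ 2 tumors" definition of recurrence.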