8 research outputs found

    Characterizing Mutational Signatures in Human Cancer Cell Lines Reveals Episodic APOBEC Mutagenesis.

    Multiple signatures of somatic mutations have been identified in cancer genomes. Exome sequences of 1,001 human cancer cell lines and 577 xenografts revealed most common mutational signatures, indicating past activity of the underlying processes, usually in appropriate cancer types. To investigate ongoing patterns of mutational-signature generation, cell lines were cultured for extended periods and subsequently DNA sequenced. Signatures of discontinued exposures, including tobacco smoke and ultraviolet light, were not generated in vitro. Signatures of normal and defective DNA repair and replication continued to be generated at roughly stable mutation rates. Signatures of APOBEC cytidine deaminase DNA-editing exhibited substantial fluctuations in mutation rate over time with episodic bursts of mutations. The initiating factors for the bursts are unclear, although retrotransposon mobilization may contribute. The examined cell lines constitute a resource of live experimental models of mutational processes, which potentially retain patterns of activity and regulation operative in primary human cancers.
    This work was supported by Wellcome grants 098051 and 206194; Cancer Research UK Grand Challenge Award C98/A24032 to L.B.A. and B.O.; the Li Ka Shing Foundation and National Institute for Health Research Oxford Biomedical Research Centre to D.C.W.; ED481A-2016/151 from Xunta de Galicia to B.R.–M

    Relative consistency of projective reconstructions obtained from an image pair.

    Tools and techniques for assessing the functional relevance of genomic loci (Genomik lokasyonların fonksiyonel ilgililiklerinin değerlendirilmesi için araçlar ve teknikler).

    Genomic studies identify genomic loci representing genetic variations, transcription factor occupancy, or histone modification through next generation sequencing (NGS) technologies. Interpreting these loci requires evaluating them with known genomic and epigenomic annotations. In this thesis, we develop tools and techniques to assess the functional relevance of a set of genomic intervals. Towards this goal, we first introduce Genomic Loci ANnotation and Enrichment Tool (GLANET) as a comprehensive annotation and enrichment analysis tool. The input query to GLANET is a set of genomic intervals. GLANET annotates and performs enrichment analysis on these loci with a rich library that includes: (i) gene-centric regions that encompass their non-coding neighborhood, (ii) a large collection of regulatory regions from ENCODE, and (iii) gene sets derived from pathways. As a key feature, users can easily extend this library with new gene sets and genomic intervals. GLANET implements a sampling-based enrichment test that can account for GC content and/or mappability biases inherent to NGS technologies, and that shows high statistical power and a well-controlled Type-I error rate. Other key features of GLANET include assessing the impact of single nucleotide variants on transcription factor binding sites when the input consists of SNPs only, and gene set enrichment analysis that is not only exon based but also regulation based, by considering introns and proximal regions of genes in a gene set. GLANET also allows joint enrichment analysis for TF binding sites and KEGG pathways. With this option, users can evaluate whether the input set is enriched concurrently with binding sites of TFs and the genes within a KEGG pathway. This joint enrichment analysis provides a detailed functional interpretation of the input loci. As a second contribution, we designed novel data-driven computational experiments for assessing the power and Type-I error of enrichment procedures. The data-driven computational experiments make detailed quantitative comparisons of GLANET with other tools possible. Our results on these computational experiments showcase GLANET’s unique capabilities as well as its robustness, speed and accuracy. Finally, as a third contribution, we present an efficient algorithmic solution for finding common overlapping intervals over n interval sets. Our strategy constructs one segment tree for each interval set as the first step and then converts each segment tree to an indexed segment tree forest by cutting the tree at a certain depth. Experiments on real data show that this data structure decreases the search time. This novel representation also enables parallel computations on each segment tree in the forest. We also extend this solution to solve the problem of finding at least k common overlapping intervals over n interval sets. The tools and techniques developed herein will hopefully expedite genomic research and help improve our understanding of the molecular biology of the cell and the mechanisms underlying diseases. Ph.D. - Doctoral Progra
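The idea of a sampling-based enrichment test mentioned in the abstract above can be illustrated with a minimal sketch. This is not GLANET's implementation: the function names, the single linear "genome", and the uniform placement of sampled intervals are all assumptions made for illustration, and the sketch does not match GC content or mappability as GLANET is described as doing.

```python
import random


def count_overlapping(query, annotation):
    """Count query intervals that overlap at least one annotation interval.

    Intervals are half-open (start, end) tuples on one hypothetical chromosome.
    """
    return sum(
        any(q_start < a_end and a_start < q_end for a_start, a_end in annotation)
        for q_start, q_end in query
    )


def sampling_enrichment_pvalue(query, annotation, genome_length,
                               n_samples=1000, seed=0):
    """Empirical p-value for enrichment of query overlaps with annotation.

    Assumption: null intervals keep the query lengths but are placed
    uniformly at random, with no GC or mappability matching.
    """
    rng = random.Random(seed)
    observed = count_overlapping(query, annotation)
    at_least_as_extreme = 0
    for _ in range(n_samples):
        sampled = []
        for q_start, q_end in query:
            length = q_end - q_start
            start = rng.randrange(0, genome_length - length + 1)
            sampled.append((start, start + length))
        if count_overlapping(sampled, annotation) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_samples + 1)
    return observed, p_value


annotation = [(1000, 1100)]
query = [(1010, 1020), (1030, 1040), (1050, 1060)]
observed, p_value = sampling_enrichment_pvalue(query, annotation, 100_000)
```

Here all three query intervals fall inside the single annotated region, so random placements on a much longer sequence should rarely match the observed count and the empirical p-value should be small.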

    JOA: Joint Overlap Analysis of multiple genomic interval sets

    Background: Next-generation sequencing (NGS) technologies have produced large volumes of genomic data. One common operation on heterogeneous genomic data is genomic interval intersection. Most of the existing tools impose restrictions such as not allowing nested intervals or requiring intervals to be sorted when finding overlaps in two or more interval sets.
    Results: We proposed segment tree (ST) and indexed segment tree forest (ISTF) based solutions for intersection of multiple genomic interval sets in parallel. We developed these methods as a tool, Joint Overlap Analysis (JOA), which takes n interval sets and finds overlapping intervals with no constraints on the given intervals. The proposed indexed segment tree forest is a novel composite data structure, which leverages indexing and the natural binning of a segment tree. We also presented construction and search algorithms for this novel data structure. We compared JOA ST and JOA ISTF with each other, and with other interval intersection tools, to verify JOA's correctness and to show that it attains comparable execution times.
    Conclusions: We implemented JOA in Java using the fork/join framework, which speeds up parallel processing by taking advantage of all available processor cores. We compared JOA ST with JOA ISTF and showed that the segment tree and indexed segment tree forest methods are comparable with each other in terms of execution time and memory usage. We also carried out an execution time comparison for JOA and other tools and demonstrated that JOA has comparable execution time and is able to further reduce its running time by using more processors per node. JOA can be run using its GUI or as a command line tool. JOA is available with source code at https://github.com/burcakotlu/JOA/. A user manual is provided at https://joa.readthedocs.or
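To make the problem statement concrete, the sketch below finds the regions covered by at least k of n interval sets. It deliberately uses a simple sweep-line over sorted endpoints rather than the segment tree or indexed segment tree forest described in the abstract, and it is not taken from the JOA source code. The function name `common_regions` and the half-open interval convention are assumptions. Like the tool described above, it accepts unsorted and nested intervals.

```python
def common_regions(interval_sets, k=None):
    """Return maximal half-open regions covered by at least k of the sets.

    interval_sets: list of lists of (start, end) tuples, unsorted and
    possibly nested. With k=None, a region must be covered by every set.
    """
    n = len(interval_sets)
    if k is None:
        k = n
    events = []
    for set_index, intervals in enumerate(interval_sets):
        for start, end in intervals:
            if start < end:
                events.append((start, 1, set_index))
                events.append((end, -1, set_index))
    # At equal positions, handle starts before ends so that adjacent
    # intervals of one set do not split a region; zero-length regions
    # created by merely touching intervals are discarded below.
    events.sort(key=lambda event: (event[0], -event[1]))

    active = [0] * n        # open intervals per set
    covered_sets = 0        # number of sets with at least one open interval
    regions = []
    region_start = None
    for position, delta, set_index in events:
        was_common = covered_sets >= k
        if delta == 1:
            if active[set_index] == 0:
                covered_sets += 1
            active[set_index] += 1
        else:
            active[set_index] -= 1
            if active[set_index] == 0:
                covered_sets -= 1
        is_common = covered_sets >= k
        if is_common and not was_common:
            region_start = position
        elif was_common and not is_common:
            if position > region_start:
                regions.append((region_start, position))
            region_start = None
    return regions


sets = [[(1, 10), (3, 5)], [(4, 12)], [(8, 20)]]
all_three = common_regions(sets)          # regions shared by all sets
at_least_two = common_regions(sets, k=2)  # regions shared by 2 or more sets
```

The sweep costs O(m log m) for m intervals in total and recomputes everything per query, whereas a tree-based index such as the one the abstract describes can be built once and searched repeatedly.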

    Functional enrichment analysis of deregulated long non-coding RNAs in cancer based on their genomic neighbors

    The dysregulation of long non-coding RNA (lncRNA) expression has been implicated in cancer. Since most lncRNAs are not well characterized functionally, investigating a set of perturbed lncRNAs is challenging. Existing methods that inspect lncRNAs functionally rely on the coexpressed coding genes, which are far better characterized functionally. LncRNAs are known to act as transcriptional regulators; they may activate or repress the coding genes in their neighborhood on the genome. Based on this, in this work, we aim to analyze the deregulated lncRNAs in cancer by taking into account their ability to regulate nearby loci on the genome. We perform functional analysis on differentially expressed lncRNAs for 28 different cancers considering their adjacent coding genes. We identify that some deregulated lncRNAs are cancer-specific, but a substantial number of lncRNAs are shared across cancers. Next, we assess the similarities of the cancer types based on the functional enrichment of the deregulated lncRNA sets. We find that some cancers are very similar in the functions and biological processes related to the deregulated lncRNAs. We observe that some of the cancers for which we find similarity can be linked through primary and metastatic site relations. We investigate the similarity of enriched functional terms for the deregulated lncRNAs and the mRNAs. We further assess the similarity of the enriched functions to the functions and processes in which the known cancer driver genes take part. We believe that our methodology helps to understand the functional impact of lncRNAs in cancer

    Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor.

    Mutational signature analysis is commonly performed in cancer genomic studies. Here, we present SigProfilerExtractor, an automated tool for de novo extraction of mutational signatures, and benchmark it against another 13 bioinformatics tools by using 34 scenarios encompassing 2,500 simulated signatures found in 60,000 synthetic genomes and 20,000 synthetic exomes. For simulations with 5% noise, reflecting high-quality datasets, SigProfilerExtractor outperforms other approaches by elucidating between 20% and 50% more true-positive signatures while yielding 5-fold fewer false-positive signatures. Applying SigProfilerExtractor to 4,643 whole-genome- and 19,184 whole-exome-sequenced cancers reveals four novel signatures. Two of the signatures are confirmed in independent cohorts, and one of these signatures is associated with tobacco smoking. In summary, this report provides a reference tool for analysis of mutational signatures, a comprehensive benchmarking of bioinformatics tools for extracting signatures, and several novel mutational signatures, including one putatively attributed to direct tobacco smoking mutagenesis in bladder tissues.