45 research outputs found

    Syöpägenetiikan tutkimus uusien sekvensointimenetelmien aikakaudella (Cancer genetics research in the era of new sequencing methods)

    Research in cancer genetics aims to detect genetic causes of the excessive growth of cells, which may subsequently form a tumor and further develop into cancer. The Human Genome Project succeeded in mapping the majority of the human DNA sequence, which enabled modern sequencing technologies to emerge, namely next-generation sequencing (NGS). The new era of disease genetics research shifted DNA analyses from the laboratory to computer screens. Since then, the massive growth of sequencing data has facilitated the detection of novel disease-causing mutations and thus improved the screening and medical treatment of cancer. However, the exponential growth of sequencing data brought new challenges for computing. The sheer size of the data is not only expensive to store and maintain, but also highly demanding to process and analyze. Moreover, not only has the amount of sequencing data increased, but new kinds of functional genomics data, which are instrumental in figuring out the consequences of detected mutations, have also emerged. To this end, continuous software development has become essential to enable the utilization of all produced research data, new and old. This thesis describes software for the analysis and visualization of NGS data (publication I) that allows the integration of genomic data from various sources. The software, BasePlayer, was designed to meet the need for efficient and user-friendly methods for analyzing and visualizing massive variant data and various other types of genomic data. To this end, we developed a multi-purpose tool for the analysis of genomic data, such as DNA, RNA, ChIP-seq, and DNase. The capabilities of BasePlayer in the detection of putatively causative variants and data visualization have already been used in over twenty scientific publications. The applicability of the software is demonstrated in this thesis with two distinct analysis cases (publications II and III).
The second study considered somatic mutations in colorectal cancer (CRC) genomes. We were able to identify distinct mutation patterns at the CTCF/Cohesin binding sites (CBSs) by analyzing whole-genome sequencing (WGS) data with BasePlayer. The sites were observed to be frequently mutated in CRC, especially in samples with a specific mutational signature. However, the source of the mutation accumulation remained unclear. In contrast, a subset of samples with an ultra-mutator phenotype, caused by a defective polymerase epsilon (POLE) gene, exhibited an inverse pattern at CBSs. We detected the same signal in other, predominantly gastrointestinal, cancers as well. However, we were not able to measure changes in gene expression at mutated sites, so the role of the CBS mutations in tumorigenesis remains to be elucidated. The third study considered esophageal squamous cell carcinoma (ESCC), and the objective was to detect predisposing mutations using the Finnish Cancer Registry (FCR) data. We performed clustering analysis for the FCR data, with additional information obtained from the Population Information System of Finland. We detected an enrichment of ESCC in the Karelia region and were able to collect and sequence 30 formalin-fixed paraffin-embedded (FFPE) samples from the region. We reported several candidate genes, out of which EP300 and DNAH9 were considered the most interesting. The study not only reported putative genes predisposing to ESCC but also worked as a proof of concept for the feasibility of conducting genetic research utilizing both clustering of the FCR data and FFPE exome sequencing in such studies.
The aim of cancer genetics research is to find the underlying causes of excessive cell growth, which can lead to tumor formation and progress further into cancer. The large-scale Human Genome Project, which aimed to determine the entire human DNA sequence (the genome), was largely completed at the turn of the millennium. Determining whole genomes enabled the development and adoption of second-generation sequencing methods (next-generation sequencing, NGS). This started a new era, particularly in disease genetics, and moved analyses from laboratories to computer screens. The vast amounts of data produced by NGS methods accelerated new genetic discoveries but also brought new challenges, particularly for biological computing, that is, bioinformatics. In addition to the volume of data, the number of different data types has grown and continues to grow; processing all the produced data into an analyzable form requires highly efficient computers and algorithms. Moreover, combining (integrating) many different samples and data types into a meaningful whole requires flexibility and efficiency from analysis software, especially in studies of regulatory regions. Continuous development of bioinformatics software is therefore of paramount importance so that researchers can make the best possible use of all the data produced. This thesis presents BasePlayer, software developed for large-scale sequence data analysis and visualization (publication I). BasePlayer combines, in a graphical user interface, the features needed for genetic analysis, data integration, and visualization. For example, the software allows hundreds of tumor samples to be examined simultaneously, which makes it possible to identify predisposing or cancer-driving mutations in gene regulatory regions. BasePlayer has already been used in more than twenty scientific publications, two of which are included in this thesis (publications II and III). In the second publication, BasePlayer was used to search for cancer-driving mutations in gene regulatory regions in more than two hundred colorectal cancer samples. Using whole-genome sequencing data, we observed that in some samples mutations had accumulated abundantly, particularly at cohesin binding sites. Cohesin is involved in several important functions related to, among other things, DNA structure and gene regulation. We also observed a depletion of mutations in the same regions in samples that were ultra-mutated (a hundredfold mutation count compared with average colorectal tumors). The accumulation of mutations was also observed in other cancers, particularly those of the gastrointestinal tract. The role of the observed phenomenon in tumor development, however, remains to be determined. The third study searched for gene mutations predisposing to esophageal cancer. Using the Finnish Cancer Registry and the Population Information System, we searched for regions where, based on surnames, esophageal cancer occurred significantly more often than average. A significantly enriched region was found in the area of ceded Karelia, from which we were able to collect and sequence 30 archived tissue samples. We used BasePlayer's sample comparison features to detect variants enriched in the patients compared with the general population. The most interesting results concerned rare variants in the EP300 and DNAH9 genes. In addition to reporting possible new susceptibility genes, this work showed that disease clusters such as the one described can be found using the cancer registry, and that archived tissue material is usable in sequencing-based studies of this kind.

    Germline mutations in young non-smoking women with lung adenocarcinoma

    Objectives: Although the primary cause of lung cancer is smoking, a considerable proportion of all lung cancers occur in never smokers. Gender influences the risk and characteristics of lung cancer, and women are over-represented among never smokers with the disease. Young age at onset and lack of established environmental risk factors suggest genetic predisposition. In this study, we used population-based sampling of young patients to discover candidate predisposition variants for lung adenocarcinoma in never-smoking women. Materials and methods: We employed archival normal tissue material from 21 never-smoker women who had been diagnosed with lung adenocarcinoma before the age of 45, and exome sequenced their germline DNA. Results and conclusion: Potentially pathogenic variants were found in eight Cancer Gene Census germline genes: BRCA1, BRCA2, ERCC4, EXT1, HNF1A, PTCH1, SMARCB1 and TP53. The variants in TP53, BRCA1, and BRCA2 are likely to have contributed to the early onset lung cancer in the respective patients (3/21 or 14%). This supports the notion that lung adenocarcinoma can be a component of certain cancer predisposition syndromes. Fifteen genes displayed potentially pathogenic mutations in at least two patients: ABCC10, ATP7B, CACNA1S, CFTR, CLIP4, COL6A1, COL6A6, GCN1, GJB6, RYR1, SCN7A, SEC24A, SP100, TEN and USH2A. Four patients showed a mutation in COL6A1, three in CLIP4 and two in the rest of the genes. Some of these candidate genes may explain a subset of female lung adenocarcinoma. Peer reviewed

    Mutational processes of distinct POLE exonuclease domain mutants drive an enrichment of a specific TP53 mutation in colorectal cancer

    Author summary: Cancer arises through the accumulation of somatic mutations. The way that these somatic mutations form can vary greatly in different cancers. One of the most mutagenic processes that have been identified is caused by mutations within a replicative DNA polymerase known as Polymerase Epsilon (POLE). Cancers with such mutations present with hundreds of thousands of somatic mutations in their genome. Previous cancer genomics studies have identified a number of mutation hotspots in POLE; however, how these different POLE mutants affect mutation distribution has not been studied. Here, we describe the genome-wide mutation profiles of distinct POLE mutant cancers. We find that different mutants indeed result in different mutation profiles and that this can be explained by the different fidelities of these mutants in replicating specific DNA sequences. Significantly, these differences have important implications in cancer formation, as we found that a POLE mutation is strongly associated with a specific truncation of the TP53 cancer driver gene. This study furthers our understanding of the POLE mutagenic process in cancer and provides important insights into carcinogenesis in cancers with such mutations. Peer reviewed

    Novel germline variant in the histone demethylase and transcription regulator KDM4C induces a multi-cancer phenotype

    Background: Genes involved in epigenetic regulation are central for chromatin structure and gene expression. Specific mutations in these might promote carcinogenesis in several tissue types. Methods: We used exome, whole-genome and Sanger sequencing to detect rare variants shared by seven affected individuals in a striking early-onset multi-cancer family. The only variant that segregated with malignancy resided in the histone demethylase gene KDM4C. Consequently, we went on to study the epigenetic landscape of the mutation carriers with ATAC, ChIP (chromatin immunoprecipitation) and RNA-sequencing from lymphoblastoid cell lines to identify possible pathogenic effects. Results: A novel variant in KDM4C, encoding an H3K9me3 histone demethylase and transcription regulator, was found to segregate with malignancy in the family. Based on Roadmap Epigenomics Project data, differentially accessible chromatin regions between the variant carriers and controls are enriched in normally H3K9me3-marked chromatin. We could not detect a difference in global H3K9 trimethylation levels. However, carriers of the variant seemed to have more trimethylated H3K9 at transcription start sites. Pathway analyses of ChIP-seq and differential gene expression data suggested that genes regulated through the KDM4C interaction partner EZH2 and its interaction partner PLZF are aberrantly expressed in mutation carriers. Conclusions: The apparent dysregulation of H3K9 trimethylation and KDM4C-associated genes in lymphoblastoid cells supports the hypothesis that the KDM4C variant is causative of the multi-cancer susceptibility in the family. As the variant is ultrarare, located in the conserved catalytic JmjC domain and predicted pathogenic by the majority of available in silico tools, further studies on the role of KDM4C in cancer predisposition are warranted. Peer reviewed

    Exome sequencing reveals candidate mutations implicated in sinonasal carcinoma and malignant transformation of sinonasal inverted papilloma

    We explored somatic mutations in dysplastic sinonasal inverted papilloma (SNIP), SNIP with concomitant sinonasal squamous cell carcinoma (SNSCC), and SNSCC without preceding SNIP. Ten SNIP and SNSCC samples were analyzed with exome sequencing and tested for human papillomavirus. The identified mutations were compared to the most frequently mutated genes in head and neck squamous cell carcinoma (HNSCC) in the COSMIC database. Exome sequencing data were also analyzed for mutations not previously linked to SNSCC. Seven of the most commonly mutated genes in HNSCC and SNSCC in COSMIC harbored mutations in our data. In addition, we identified mutations in 23 genes that are likely to contribute to SNIP and SNSCC oncogenesis. Peer reviewed

    Detection of subclonal L1 transductions in colorectal cancer by long-distance inverse-PCR and Nanopore sequencing

    Long interspersed nuclear elements-1 (L1s) are a large family of retrotransposons. Retrotransposons are repetitive sequences that are capable of autonomous mobility via a copy-and-paste mechanism. In most copy events, only the L1 sequence is inserted; however, L1s can also mobilize the flanking non-repetitive region by a process known as 3' transduction. L1 insertions can contribute to genome plasticity and cause potentially tumorigenic genomic instability. However, detecting the activity of a particular source L1 and identifying new insertions stemming from it is a challenging task with current methodological approaches. We developed a long-distance inverse PCR (LDI-PCR) based approach to monitor the mobility of active L1 elements based on their 3' transduction activity. LDI-PCR requires no prior knowledge of the insertion target region. By applying LDI-PCR in conjunction with Nanopore sequencing (Oxford Nanopore Technologies) on one L1 reported to be particularly active in human cancer genomes, we detected 14 out of 15 3' transductions previously identified by whole genome sequencing in two different colorectal tumour samples. In addition, we discovered 25 novel highly subclonal insertions. Furthermore, the long sequencing reads produced by LDI-PCR/Nanopore sequencing enabled the identification of both the 5' and 3' junctions and revealed detailed insertion sequence information. Peer reviewed

    Next-generation sequencing in a large pedigree segregating visceral artery aneurysms suggests potential role of COL4A1/COL4A2 in disease etiology

    Background: Visceral artery aneurysms (VAAs) can be fatal if ruptured. Although a relatively rare incident, it carries a contemporary mortality rate of approximately 12%. VAAs have multiple possible causes, one of which is genetic predisposition. Here, we present a striking family with seven individuals affected by VAAs, and one individual affected by a visceral artery pseudoaneurysm. Methods: We exome sequenced the affected family members and the parents of the proband to find a possible underlying genetic defect. As exome sequencing did not reveal any feasible protein-coding variants, we combined whole-genome sequencing of two individuals with linkage analysis to find a plausible non-coding culprit variant. Variants were ranked by the deep learning framework DeepSEA. Results: Two of seven top-ranking variants, NC_000013.11:g.108154659C>T and NC_000013.11:g.110409638C>T, were found in all VAA-affected individuals, but not in the individual affected by the pseudoaneurysm. The second variant is in a candidate cis-regulatory element in the fourth intron of COL4A2, proximal to COL4A1. Conclusions: As type IV collagens are essential for the stability and integrity of the vascular basement membrane and involved in vascular disease, we conclude that COL4A1 and COL4A2 are strong candidates for VAA susceptibility genes. Peer reviewed

    Colibactin DNA-damage signature indicates mutational impact in colorectal cancer

    The mucosal epithelium is a common target of damage by chronic bacterial infections and the accompanying toxins, and most cancers originate from this tissue. We investigated whether colibactin, a potent genotoxin(1) associated with certain strains of Escherichia coli(2), creates a specific DNA-damage signature in infected human colorectal cells. Notably, the genomic contexts of colibactin-induced DNA double-strand breaks were enriched for an AT-rich hexameric sequence motif, associated with distinct DNA-shape characteristics. A survey of somatic mutations at colibactin target sites of several thousand cancer genomes revealed notable enrichment of this motif in colorectal cancers. Moreover, the exact double-strand-break loci corresponded with mutational hot spots in cancer genomes, reminiscent of a trinucleotide signature previously identified in healthy colorectal epithelial cells(3). The present study provides evidence for the etiological role of colibactin in human cancer. A DNA-damage signature induced by colibactin, a toxin expressed by some strains of Escherichia coli, is enriched in human colorectal cancers. Peer reviewed

    No evidence of EMAST in whole genome sequencing data from 248 colorectal cancers

    Microsatellite instability (MSI) is caused by defective DNA mismatch repair (MMR), and manifests as accumulation of small insertions and deletions (indels) in short tandem repeats of the genome. Another form of repeat instability, elevated microsatellite alterations at selected tetranucleotide repeats (EMAST), has been suggested to occur in 50% to 60% of colorectal cancers (CRCs), of which approximately one quarter are accounted for by MSI. Unlike for MSI, there is no consensus on the criteria for defining EMAST. EMAST CRCs have been suggested to form a distinct subset of CRCs that has been linked to a higher tumor stage, chronic inflammation, and poor prognosis. EMAST CRCs not exhibiting MSI have been proposed to show instability of di- and trinucleotide repeats in addition to tetranucleotide repeats, but to lack instability of mononucleotide repeats. However, previous studies on EMAST have been based on targeted analysis of small sets of marker repeats, often in relatively few samples. To gain insight into tetranucleotide instability on a genome-wide level, we utilized whole genome sequencing data from 227 microsatellite stable (MSS) CRCs, 18 MSI CRCs, 3 POLE-mutated CRCs, and their corresponding normal samples. As expected, we observed tetranucleotide instability in all MSI CRCs, accompanied by instability of mono-, di-, and trinucleotide repeats. Among MSS CRCs, some tumors displayed more microsatellite mutations than others as a continuum, and no distinct subset of tumors with the previously proposed molecular characteristics of EMAST could be observed. Our results suggest that tetranucleotide repeat mutations in non-MSI CRCs represent stochastic mutation events rather than define a distinct CRC subclass. Peer reviewed