Massive NGS data analysis reveals hundreds of potential novel gene fusions in human cell lines
Background:
Gene fusions derive from chromosomal rearrangements, and the resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognoses and, in some cases, they can provide specific drug targets. So far, much effort has been devoted to studying gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive Next Generation Sequencing dataset for all the existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events.
Results:
In our work, we extensively reanalyzed 935 paired-end RNA-seq experiments downloaded from "The Cancer Cell Line Encyclopedia" repository, aiming to identify novel putative cell-line-specific gene fusion events in human malignancies. The bioinformatics analysis was performed by running four different gene fusion detection algorithms. The results were further prioritized by a Bayesian classifier that performs an in silico validation. The collection of fusion events supported by all of the prediction tools yields a robust set of ∼1,700 in silico predicted novel candidates suitable for downstream analyses. Given the large amount of data and information produced, the computational results have been organized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, which is further integrated with validated data from other well-known repositories. Using the intuitive query forms, users can easily access, navigate, filter and select the putative gene fusions for further validation and study. They can also find suitable experimental models for a given fusion of interest.
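The consensus step described above (keeping only the fusion events supported by all of the detection tools) can be illustrated with a minimal sketch. The tool names, the gene-pair representation, and the example calls are assumptions made for illustration; the abstract does not specify how the calls were matched across tools.

```python
from typing import Dict, Set, Tuple

# A fusion is represented here as (5' partner gene, 3' partner gene).
# This keying is an assumption; a real pipeline might also match on breakpoints.
Fusion = Tuple[str, str]


def consensus_fusions(calls_by_tool: Dict[str, Set[Fusion]]) -> Set[Fusion]:
    """Return the fusions reported by every tool in calls_by_tool."""
    if not calls_by_tool:
        return set()
    return set.intersection(*calls_by_tool.values())


# Hypothetical tool names and calls, for illustration only.
calls = {
    "tool_a": {("BCR", "ABL1"), ("EML4", "ALK"), ("GENE1", "GENE2")},
    "tool_b": {("BCR", "ABL1"), ("EML4", "ALK")},
    "tool_c": {("BCR", "ABL1"), ("GENE1", "GENE2")},
    "tool_d": {("BCR", "ABL1"), ("EML4", "ALK")},
}
consensus = consensus_fusions(calls)
```

Requiring agreement among all tools trades sensitivity for specificity: a candidate missed by any single caller is dropped, which is consistent with the abstract's description of the resulting set as robust.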
Conclusions:
We believe that the LiGeA resource can not only represent the first compendium of both known and putative novel gene fusion events across the catalog of human malignant cell lines, but also become a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets.
TumorFusions: an integrative resource for cancer-associated transcript fusions.
Gene fusion represents a class of molecular aberrations in cancer and has been exploited for therapeutic purposes. In this paper we describe TumorFusions, a data portal that catalogues 20 731 gene fusions detected in 9966 well-characterized cancer samples and 648 normal specimens from The Cancer Genome Atlas (TCGA). The portal spans 33 cancer types in TCGA. Fusion transcripts were identified via a uniform pipeline, including filtering against a list of 3838 transcript fusions detected in a panel of 648 non-neoplastic samples. Fusions were mapped to somatic DNA rearrangements identified using whole genome sequencing data from 561 cancer samples as a means of validation. We observed that 65% of transcript fusions were associated with a chromosomal alteration, which is annotated in the portal. Other features of the portal include links to SNP array-based copy number levels and mutational patterns, exon and transcript level expressions of the partner genes, and a network-based centrality score for prioritizing functional fusions. Our portal aims to be a broadly applicable and user-friendly resource for cancer gene annotation and is publicly available at http://www.tumorfusions.org.
Nucleic Acids Res 2018 Jan 4; 46(D1):D1144-D1149
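The filtering step mentioned in this abstract, in which candidate fusions are removed if they also appear in a panel of non-neoplastic samples, might look like the following sketch. The function name, the gene-pair keying, and the placeholder gene names are hypothetical; the actual pipeline may compare fusions on different criteria.

```python
from typing import Iterable, List, Set, Tuple

Fusion = Tuple[str, str]  # (5' partner gene, 3' partner gene); an assumed keying


def filter_against_normals(
    candidates: Iterable[Fusion], normal_panel: Iterable[Fusion]
) -> List[Fusion]:
    """Drop candidates that were also detected in the normal panel."""
    seen_in_normals: Set[Fusion] = set(normal_panel)
    return [fusion for fusion in candidates if fusion not in seen_in_normals]


# Placeholder data for illustration only.
normal_panel = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")]
tumor_calls = [("TMPRSS2", "ERG"), ("GENE_A", "GENE_B"), ("EML4", "ALK")]
kept = filter_against_normals(tumor_calls, normal_panel)
```

Converting the normal panel to a set keeps each membership check constant-time, which matters when the panel holds thousands of entries.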
Driver Fusions and Their Implications in the Development and Treatment of Human Cancers.
Gene fusions represent an important class of somatic alterations in cancer. We systematically investigated fusions in 9,624 tumors across 33 cancer types using multiple fusion calling tools. We identified a total of 25,664 fusions, with a 63% validation rate. Integration of gene expression, copy number, and fusion annotation data revealed that fusions involving oncogenes tend to exhibit increased expression, whereas fusions involving tumor suppressors have the opposite effect. For fusions involving kinases, we found 1,275 with an intact kinase domain, the proportion of which varied significantly across cancer types. Our study suggests that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of them. Finally, we identified druggable fusions involving genes such as TMPRSS2, RET, FGFR3, ALK, and ESR1 in 6.0% of cases, and we predicted immunogenic peptides, suggesting that fusions may provide leads for targeted drug and immune therapy.
Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas.
DNA damage repair (DDR) pathways modulate cancer risk, progression, and therapeutic response. We systematically analyzed somatic alterations to provide a comprehensive view of DDR deficiency across 33 cancer types. Mutations with accompanying loss of heterozygosity were observed in over 1/3 of DDR genes, including TP53 and BRCA1/2. Other prevalent alterations included epigenetic silencing of the direct repair genes EXO5, MGMT, and ALKBH3 in ∼20% of samples. Homologous recombination deficiency (HRD) was present at varying frequency in many cancer types, most notably ovarian cancer. However, in contrast to ovarian cancer, HRD was associated with worse outcomes in several other cancers. Protein structure-based analyses allowed us to predict functional consequences of rare, recurrent DDR mutations. A new machine-learning-based classifier developed from gene expression data allowed us to identify alterations that phenocopy deleterious TP53 mutations. These frequent DDR gene alterations in many human cancers have functional consequences that may determine cancer progression and guide therapy.
Genomic basis for RNA alterations in cancer
Transcript alterations often result from somatic changes in cancer genomes. Various forms of RNA alterations have been described in cancer, including overexpression, altered splicing and gene fusions; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.
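One way to picture how a gene fusion can be associated with a structural variant is to check whether both fusion breakpoints lie near the two ends of a DNA rearrangement. The sketch below is only an illustration of that idea: the data types, the 100 kb window, and the coordinates are assumptions, not the method used in the study.

```python
from typing import List, NamedTuple


class Breakpoint(NamedTuple):
    chrom: str
    pos: int


class Junction(NamedTuple):
    """Two joined positions: a fusion junction or a structural variant."""
    left: Breakpoint
    right: Breakpoint


def near(a: Breakpoint, b: Breakpoint, window: int) -> bool:
    """True if both breakpoints are on the same chromosome within window bp."""
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= window


def has_sv_support(fusion: Junction, svs: List[Junction], window: int = 100_000) -> bool:
    """True if any SV has both ends near the fusion breakpoints, in either order."""
    for sv in svs:
        same = near(fusion.left, sv.left, window) and near(fusion.right, sv.right, window)
        swapped = near(fusion.left, sv.right, window) and near(fusion.right, sv.left, window)
        if same or swapped:
            return True
    return False


def supported_fraction(fusions: List[Junction], svs: List[Junction], window: int = 100_000) -> float:
    """Fraction of fusions with at least one supporting SV."""
    if not fusions:
        return 0.0
    return sum(has_sv_support(f, svs, window) for f in fusions) / len(fusions)


# Made-up coordinates for illustration only.
svs = [Junction(Breakpoint("chr2", 1_050_000), Breakpoint("chr5", 7_020_000))]
fusions = [
    Junction(Breakpoint("chr5", 7_000_000), Breakpoint("chr2", 1_000_000)),  # near the SV ends
    Junction(Breakpoint("chr9", 500_000), Breakpoint("chr22", 900_000)),     # no nearby SV
]
fraction = supported_fraction(fusions, svs)
```

A generous window is used here because RNA-level junctions fall at exon boundaries, whereas the underlying DNA breakpoints usually lie within introns. A 'bridged' fusion, which involves a third genomic location, would need chained matching that this sketch does not attempt.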
The role of 3D human genome architecture in mutability - from predicting penetrance/gene fusions to discovering novel schizophrenia-associated variants
We have become very familiar with the genome being represented as a one dimensional sequence of the four nucleobases – cytosine, guanine, adenine and thymine. However, in reality this chain folds and is densely packed into the nucleus of eukaryotic cells in a three-dimensional (3D) setting, meaning that pairs of otherwise remote areas of the genome can come into close proximity in 3D space. It is thought that the expression of target genes is influenced by remotely acting regulatory elements, such as enhancers, which are often located several kilobases away from the genes they target.
In our studies we hypothesised that communication between widely spaced genomic elements is facilitated by the spatial organisation of chromosomes, which brings genes and their regulatory elements into close spatial proximity. We explored this hypothesis in three distinct contexts: (1) reduced/incomplete penetrance, where disease genotypes do not always induce the expected phenotype; (2) gene fusion events, known to be frequent in cancer; (3) schizophrenia, a complex brain disorder. Whilst previous studies acknowledged the role of polygenic activity in these genetic diseases and phenomena, they did not integrate this idea into existing detection/prediction techniques. Our analysis addressed this oversight by transforming traditionally one-dimensional studies into a contextually relevant, 3D setting.
We utilised data describing the 3D structure of the human genome, alongside prior knowledge of various diseases and genetic phenomena, to predict novel genomic regions of association. Our approaches incorporated network, statistical and computational methods to identify where these regions of interest lie. Identified regions were investigated further to ascertain biological properties, such as an enriched presence of mutations, functionally relevant genes, regulatory elements, or all of the above. Whilst existing approaches tend to fixate on only these static properties, our studies also focused on the communication of otherwise remote regions by creating 3D interaction networks that describe the spatial proximities of genomic fragments. The most important units of such networks were identified via centrality measures and statistical testing, followed by subsequent biological interrogation of so-called candidate regions. This method ultimately confirmed whether regions were genuinely disease-associated via polygenic activity, or not.
A total of 35 novel schizophrenia candidate regions were identified using our approach, 22 of which contained polymorphisms with prior schizophrenia association; most variants found were shown to influence gene expression specifically in brain tissues. We were also successful in showing that cancer-causing gene fusion events are catalysed by paired fusion gene-containing fragments (of lengths 1 megabase and 100 kilobases) sharing small 3D neighbourhoods, particularly for genes residing on different chromosomes. Our transformation of existing approaches into 3D studies has therefore elucidated features and properties of genetic disease and cancer that were otherwise unknown or overlooked.
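The network step described in this abstract, in which genomic fragments become nodes, spatial contacts become edges, and the most important fragments are ranked by a centrality measure, can be sketched as follows. Degree centrality is used here as one simple example; the abstract does not say which centrality measures were applied, and the fragment labels and contacts are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def degree_centrality(contacts: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalised degree centrality for an undirected contact network.

    Each contact is a pair of fragment labels observed in 3D proximity.
    The score is a fragment's number of distinct neighbours divided by (n - 1).
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in contacts:
        if a == b:
            continue  # ignore self-contacts
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical fragments, labelled "chromosome:bin index".
contacts = [
    ("chr1:0", "chr1:1"),
    ("chr1:0", "chr2:5"),
    ("chr1:0", "chr3:2"),
    ("chr2:5", "chr3:2"),
]
scores = degree_centrality(contacts)
most_central = max(scores, key=scores.get)
```

In this toy network the fragment `chr1:0` touches every other fragment and therefore ranks highest. In a real analysis the ranking would be followed by the statistical testing and biological interrogation that the abstract describes.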