105 research outputs found

    BlackOPs: Increasing confidence in variant detection through mappability filtering

    Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklist

    BlackOPs: increasing confidence in variant detection through mappability filtering

    Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing

    Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context

    Long noncoding RNAs (lncRNAs) are commonly dysregulated in tumors, but only a handful are known to play pathophysiological roles in cancer. We inferred lncRNAs that dysregulate cancer pathways, oncogenes, and tumor suppressors (cancer genes) by modeling their effects on the activity of transcription factors, RNA-binding proteins, and microRNAs in 5,185 TCGA tumors and 1,019 ENCODE assays. Our predictions included hundreds of candidate onco- and tumor-suppressor lncRNAs (cancer lncRNAs) whose somatic alterations account for the dysregulation of dozens of cancer genes and pathways in each of 14 tumor contexts. To demonstrate proof of concept, we showed that perturbations targeting OIP5-AS1 (an inferred tumor suppressor) and TUG1 and WT1-AS (inferred onco-lncRNAs) dysregulated cancer genes and altered proliferation of breast and gynecologic cancer cells. Our analysis indicates that, although most lncRNAs are dysregulated in a tumor-specific manner, some, including OIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergistically dysregulate cancer pathways in multiple tumor contexts

    Differential Pathogenesis of Lung Adenocarcinoma Subtypes Involving Sequence Mutations, Copy Number, Chromosomal Instability, and Methylation

    Lung adenocarcinoma (LAD) has extreme genetic variation among patients, which is currently not well understood, limiting progress in therapy development and research. LAD intrinsic molecular subtypes are a validated stratification of naturally occurring gene expression patterns and encompass different functional pathways and patient outcomes. Patients may have incurred different mutations and alterations that led to the different subtypes. We hypothesized that the LAD molecular subtypes co-occur with distinct mutations and alterations in patient tumors. The LAD molecular subtypes (Bronchioid, Magnoid, and Squamoid) were tested for association with gene mutations and DNA copy number alterations using statistical methods and published cohorts (n = 504). A novel validation cohort (n = 116) was assayed and interrogated to confirm subtype-alteration associations. Gene mutation rates (EGFR, KRAS, STK11, TP53), chromosomal instability, regional copy number, and genome-wide DNA methylation were significantly different among tumors of the molecular subtypes. Secondary analyses compared subtypes by integrated alterations and patient outcomes. Tumors having integrated alterations in the same gene associated with the subtypes, e.g. mutation, deletion and underexpression of STK11 with Magnoid, and mutation, amplification, and overexpression of EGFR with Bronchioid. The subtypes also associated with tumors having concurrent mutant genes, such as KRAS-STK11 with Magnoid. Patient overall survival, cisplatin plus vinorelbine therapy response and predicted gefitinib sensitivity were significantly different among the subtypes. The lung adenocarcinoma intrinsic molecular subtypes co-occur with grossly distinct genomic alterations and with patient therapy response. These results advance the understanding of lung adenocarcinoma etiology and nominate patient subgroups for future evaluation of treatment response

    Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas

    Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified. In contrast, the MYC antagonists MGA and MNT were the most frequently mutated or deleted members, proposing a role as tumor suppressors. MYC alterations were mutually exclusive with PIK3CA, PTEN, APC, or BRAF alterations, suggesting that MYC is a distinct oncogenic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such as immune response and growth factor signaling; chromatin, translation, and DNA replication/repair were conserved pan-cancer. This analysis reveals insights into MYC biology and is a reference for biomarkers and therapeutics for cancers with alterations of MYC or the PMN

    Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas

    This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smokin

    Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

    Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives with insights into the tumor-immune microenvironment

    A Gene Catalogue of the Euchromatic Male-Specific Region of the Horse Y Chromosome: Comparison with Human and Other Mammals

    Studies of the Y chromosome in primates, rodents and carnivores provide compelling evidence that the male specific region of Y (MSY) contains functional genes, many of which have specialized roles in spermatogenesis and male fertility. Little similarity, however, has been found between the gene content and sequence of MSY in different species. This hinders the discovery of species-specific male fertility genes and limits our understanding of MSY evolution in mammals. Here, a detailed MSY gene catalogue was developed for the horse – an odd-toed ungulate. Using direct cDNA selection from horse testis, and sequence analysis of Y-specific BAC clones, 37 horse MSY genes/transcripts were identified. The genes were mapped to the MSY BAC contig map, characterized for copy number, analyzed for transcriptional profiles by RT-PCR, examined for the presence of ORFs, and compared to other mammalian orthologs. We demonstrate that the horse MSY harbors 20 X-degenerate genes with known orthologs in other eutherian species. The remaining 17 genes are acquired or novel and have so far been identified only in the horse or donkey Y chromosomes. Notably, 3 transcripts were found in the heterochromatic part of the Y. We show that despite substantial differences between the sequence, gene content and organization of horse and other mammalian Y chromosomes, the functions of MSY genes are predominantly related to testis and spermatogenesis. Altogether, 10 multicopy genes with testis-specific expression were identified in the horse MSY, and considered likely candidate genes for stallion fertility. The findings establish an important foundation for the study of Y-linked genetic factors governing fertility in stallions, and improve our knowledge of the evolutionary processes that have shaped Y chromosomes in different mammalian lineages

    Correction: Molecular Subtypes in Head and Neck Cancer Exhibit Distinct Patterns of Chromosomal Gain and Loss of Canonical Cancer Genes

    Head and neck squamous cell carcinoma (HNSCC) is a frequently fatal heterogeneous disease. Beyond the role of human papilloma virus (HPV), no validated molecular characterization of the disease has been established. Using an integrated genomic analysis and validation methodology, we confirm four molecular classes of HNSCC (basal, mesenchymal, atypical, and classical) consistent with signatures established for squamous carcinoma of the lung, including deregulation of the KEAP1/NFE2L2 oxidative stress pathway, differential utilization of the lineage markers SOX2 and TP63, and preference for the oncogenes PIK3CA and EGFR. For potential clinical use, the signatures are complementary to classification by HPV infection status as well as the putative high risk marker CCND1 copy number gain. A molecular etiology for the subtypes is suggested by statistically significant chromosomal gains and losses and differential cell of origin expression patterns. Model systems representative of each of the four subtypes are also presented