
    Improved indel detection in DNA and RNA via realignment with ABRA2

    Motivation: Genomic variant detection from next-generation sequencing has become an essential component of research and clinical diagnosis in both cancer and Mendelian disorders. Insertions and deletions (indels) are a common source of variation and frequently affect function, which makes their detection vitally important. While substantial effort has gone into detecting indels from DNA, there is still room for improvement. Further, detection of indels from RNA-Seq data has largely been an afterthought and represents another critical area for variant detection. Results: We present ABRA2, a redesign of the original ABRA implementation that supports realignment of both RNA and DNA short reads. The process improves accuracy and scalability, including support for human whole genomes. Results demonstrate substantial improvement in indel detection for a variety of data types, including those not previously supported by ABRA. Further, ABRA2 broadly improves variant calling accuracy across a wide range of post-processing workflows, including whole genomes, targeted exomes and transcriptome sequencing.

    The Iterative Signature Algorithm for the analysis of large scale gene expression data

    We present a new approach for the analysis of genome-wide expression data. Our method is designed to overcome the limitations of traditional techniques when applied to large-scale data. Rather than allotting each gene to a single cluster, we assign both genes and conditions to context-dependent and potentially overlapping transcription modules. We provide a rigorous definition of a transcription module as the object to be retrieved from the expression data. We establish an efficient algorithm that searches for the modules encoded in the data by iteratively refining sets of genes and conditions until they match this definition. Each iteration involves a linear map, induced by the normalized expression matrix, followed by the application of a threshold function. We argue that our method is in fact a generalization of Singular Value Decomposition, which corresponds to the special case where no threshold is applied. We show analytically that for noisy expression data our approach leads to better classification due to the implementation of the threshold. This result is confirmed by numerical analyses based on in-silico expression data. We briefly discuss results obtained by applying our algorithm to expression data from the yeast S. cerevisiae. Comment: LaTeX, 36 pages, 8 figures
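    The iteration described above (a linear map followed by a threshold, repeated until the gene and condition sets stop changing) can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes a single genes × conditions matrix that is already normalized (separate row and column normalizations are not reproduced), and the threshold rule used here, keeping only entries whose z-score within the score vector reaches t, is a hypothetical stand-in. Passing None for both thresholds removes the threshold step, so the loop reduces to power iteration and converges to the leading singular vectors, matching the SVD special case described in the abstract.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Row-major matrix times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def unit(v: Vector) -> Vector:
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def apply_threshold(v: Vector, t: Optional[float]) -> Vector:
    """Hypothetical threshold: zero entries whose z-score within v is below t.

    t=None means no threshold, which turns the loop into plain power iteration.
    """
    if t is None:
        return list(v)
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in v) / n)
    if sd == 0.0:
        return [0.0] * n  # no entry stands out from the rest
    return [x if (x - mean) / sd >= t else 0.0 for x in v]


def isa_module(
    e: Matrix,  # genes x conditions, assumed already normalized
    seed: Vector,  # initial gene scores
    t_genes: Optional[float],
    t_conds: Optional[float],
    max_iter: int = 100,
    tol: float = 1e-9,
) -> Tuple[Vector, Vector]:
    """Alternate gene -> condition -> gene maps until the gene scores converge."""
    et = transpose(e)
    genes = unit(seed)
    conds: Vector = [0.0] * len(et)
    for _ in range(max_iter):
        conds = unit(apply_threshold(mat_vec(et, genes), t_conds))
        new_genes = unit(apply_threshold(mat_vec(e, conds), t_genes))
        if max(abs(a - b) for a, b in zip(new_genes, genes)) < tol:
            return new_genes, conds
        genes = new_genes
    return genes, conds
```

For example, a toy matrix in which genes 0 to 2 are high only in conditions 0 and 1 recovers that block as a module when seeded with a uniform gene vector and thresholds of 0.5.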

    Genetic determinants of the molecular portraits of epithelial cancers

    The ability to characterize and predict tumor phenotypes is crucial to precision medicine. In this study, we present an integrative computational approach that uses a genome-wide association analysis and an Elastic Net prediction method to analyze the relationship between DNA copy number alterations and an archive of gene expression signatures. Across breast cancers, we can quantitatively predict the levels of many gene signatures within individual tumors with high accuracy from DNA copy number features alone, including proliferation status and estrogen-signaling pathway activity. We can also predict many other key phenotypes, including intrinsic molecular subtypes, estrogen receptor status, and TP53 mutation. We also applied this approach to TCGA Pan-Cancer data, identifying signatures that are repeatedly predictable across tumor types, including immune features in lung squamous and basal-like breast cancers. These Elastic Net DNA predictors could also be called from DNA-based gene panels, facilitating their use as biomarkers to guide therapeutic decision making.

    High reproducibility using sodium hydroxide-stripped long oligonucleotide DNA microarrays

    Recently, long oligonucleotide (60- to 70-mer) microarrays for two-color experiments have been developed and are gaining widespread use. In addition, when mRNA from tissue sources is limited, RNA amplification can be and is being used to produce sufficient quantities of cRNA for microarray hybridization. Taking advantage of the selective degradation of RNA under alkaline conditions, we have developed a method to "strip" glass-based oligonucleotide microarrays that use fluorescent RNA in the hybridization, while leaving the DNA oligonucleotide probes intact and usable for a second experiment. Replicate microarray experiments conducted using stripped arrays showed high reproducibility; however, we found that arrays could be stripped and reused only once without compromising data quality. The intraclass correlation (ICC) between a virgin array and a stripped array hybridized with the same sample ranged from 0.90 to 0.98, which is comparable to the ICC of two virgin arrays hybridized with the same sample. Using this method, once-stripped oligonucleotide microarrays are usable and reliable, and help reduce costs.

    Amplification of SOX4 promotes PI3K/Akt signaling in human breast cancer

    Purpose: The PI3K/Akt signaling axis contributes to the dysregulation of many dominant features of breast cancer, including cell proliferation, survival, metabolism, motility, and genomic instability. While multiple studies have demonstrated that basal-like or triple-negative breast tumors have uniformly high PI3K/Akt activity, the genomic alterations that mediate dysregulation of this pathway in this subset of highly aggressive breast tumors remain to be determined. Methods: In this study, we present an integrated genomic analysis that uses a PI3K gene expression signature as a framework to analyze orthogonal genomic data from human breast tumors, including RNA expression, DNA copy number alterations, and protein expression. By combining these data with a genome-wide RNA-mediated interference screen in human breast cancer cell lines, we identified essential genetic drivers of PI3K/Akt signaling. Results: Our in silico analyses identified SOX4 amplification as a novel modulator of PI3K/Akt signaling in breast cancers, and in vitro studies confirmed its role in regulating Akt phosphorylation. Conclusions: Taken together, these data establish a role for SOX4-mediated PI3K/Akt signaling in breast cancer and suggest that SOX4 may represent a novel therapeutic target and/or biomarker for current PI3K family therapies.

    A pan-cancer analysis of the frequency of DNA alterations across cell cycle activity levels

    Pan-cancer genomic analyses based on the magnitude of pathway activity are currently lacking. Focusing on the cell cycle, we examined DNA mutations and chromosome arm-level aneuploidy within tumours with low, intermediate and high cell-cycle activity in 9515 patients across 32 tumour types. Boxplots showed that cell-cycle activity varied broadly across and within all cancers. TP53 and PIK3CA mutations were common in all cell cycle score (CCS) tertiles, but their frequency increased with cell-cycle activity (P < 0.001). Mutations in BRAF and gains in 16p were less frequent in CCS High tumours (P < 0.001). In Kaplan–Meier analysis, patients whose tumours were CCS Low had a longer Progression Free Interval (PFI) than those with CCS Intermediate or High tumours (P < 0.001), and this difference remained significant in multivariable analysis (CCS Low as reference; CCS Intermediate: HR = 1.37, 95% CI 1.17–1.60; CCS High: HR = 1.54, 95% CI 1.29–1.84). These results demonstrate that whilst similar DNA alterations can be found at all cell-cycle activity levels, some notable exceptions exist. Moreover, independent prognostic information can be derived on a pan-cancer level from a simple measure of cell-cycle activity.

    Anti-PD-1 Checkpoint Therapy Can Promote the Function and Survival of Regulatory T Cells

    We have previously shown in a model of claudin-low breast cancer that regulatory T cells (Tregs) are increased in the tumor microenvironment (TME) and express high levels of PD-1. In mouse models and patients with triple-negative breast cancer, one postulated cause of the limited activity of anti-PD-1 therapy is the activation of PD-1-expressing Tregs in the TME. We hypothesized that the expression of PD-1 on Tregs would enhance their suppressive function and worsen antitumor immunity during PD-1 blockade. To evaluate this, we isolated Tregs from claudin-low tumors and functionally evaluated them ex vivo. Using RNA sequencing, we compared the transcriptional profiles of Tregs isolated from tumor-bearing mice with or without anti-PD-1 therapy. Several genes associated with survival and proliferation pathways, for example Jun, Fos, and Bcl2, were significantly upregulated in Tregs exposed to anti-PD-1 treatment. Based on these data, we hypothesized that anti-PD-1 treatment induces a prosurvival phenotype in Tregs. Indeed, Tregs exposed to PD-1 blockade had significantly higher levels of Bcl-2 expression, which led to increased protection from glucocorticoid-induced apoptosis. In addition, we found in vitro and in vivo that Tregs in the presence of anti-PD-1 proliferated more than control Tregs. PD-1 blockade significantly increased the suppressive activity of Tregs at biologically relevant Treg/Tnaive cell ratios. Altogether, we show that PD-1 blockade increases the proliferation, protection from apoptosis, and suppressive capabilities of Tregs, thus enhancing immunosuppression in the TME.

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing the major challenges and statistical limitations encountered while integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program led to the TCGA Clinical Data Resource, which provides recommendations on clinical outcome endpoint usage for 33 cancer types.

    MultiK: an automated tool to determine optimal cluster numbers in single-cell RNA sequencing data

    Single-cell RNA sequencing (scRNA-seq) provides new opportunities to characterize cell populations, typically through some type of clustering analysis. Estimating the optimal cluster number (K) is a crucial but often overlooked step. Our approach improves upon most current scRNA-seq clustering methods by providing an objective estimate of the number of groups from a multi-resolution perspective. MultiK is a tool for the objective selection of insightful Ks and achieves high robustness through a consensus clustering approach. We demonstrate that MultiK identifies reproducible groups in scRNA-seq data, thus providing an objective means of estimating the number of possible groups or cell-type populations present.
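    The consensus clustering idea mentioned above can be illustrated with a consensus matrix: across repeated clustering runs on subsamples, record how often each pair of cells lands in the same cluster when both were sampled. The sketch below assumes the runs have already been produced by some clustering method (labels are given directly, with None for cells left out of a subsample) and scores stability with the proportion of ambiguous clustering (PAC), the fraction of cell pairs whose co-clustering frequency lies strictly between two cutoffs. It is a generic illustration of consensus clustering, not MultiK's implementation; the function names and the 0.1/0.9 cutoffs are hypothetical choices, and the abstract does not state which stability score MultiK uses.

```python
from typing import List, Optional

Labels = List[Optional[int]]  # one cluster label per cell; None = not sampled


def consensus_matrix(runs: List[Labels]) -> List[List[float]]:
    """Fraction of runs in which cells i and j co-cluster, among runs sampling both."""
    n = len(runs[0])
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            if labels[i] is None:
                continue
            for j in range(n):
                if labels[j] is None:
                    continue
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    # Pairs never sampled together get 0.0 (no evidence of co-clustering).
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def pac(cm: List[List[float]], lower: float = 0.1, upper: float = 0.9) -> float:
    """Proportion of off-diagonal pairs with ambiguous consensus (lower < c < upper)."""
    n = len(cm)
    pairs = [cm[i][j] for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(1 for c in pairs if lower < c < upper) / len(pairs)
```

A low PAC means cells are almost always either grouped together or kept apart, so a candidate K whose runs yield a low PAC is more stable; repeating this for several values of K gives one way to compare them.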

    Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V'DJer

    Motivation: B-cell receptor (BCR) repertoire profiling is an important tool for understanding the biology of diverse immunologic processes. Current methods for analyzing adaptive immune receptor repertoires depend on PCR amplification of VDJ rearrangements followed by long-read amplicon sequencing spanning the VDJ junctions. While this approach has proven effective, it is frequently not feasible due to cost or limited sample material. Additionally, many existing datasets include short-read RNA sequencing data but not PCR-amplified BCR data. Results: We present V'DJer, an assembly-based method that reconstructs adaptive immune receptor repertoires from short-read RNA sequencing data. This method captures expressed BCR loci from a standard RNA-seq assay. We applied this method to 473 melanoma samples from The Cancer Genome Atlas and demonstrate V'DJer's ability to accurately reconstruct BCR repertoires from short-read mRNA-seq data.