221 research outputs found
Improved indel detection in DNA and RNA via realignment with ABRA2
Motivation: Genomic variant detection from next-generation sequencing has become established as an extremely important component of research and clinical diagnoses in both cancer and Mendelian disorders. Insertions and deletions (indels) are a common source of variation and can frequently impact functionality, thus making their detection vitally important. While substantial effort has gone into detecting indels from DNA, there is still opportunity for improvement. Further, detection of indels from RNA-Seq data has largely been an afterthought and offers another critical area for variant detection. Results: We present here ABRA2, a redesign of the original ABRA implementation that offers support for realignment of both RNA and DNA short reads. The process results in improved accuracy and scalability including support for human whole genomes. Results demonstrate substantial improvement in indel detection for a variety of data types, including those that were not previously supported by ABRA. Further, ABRA2 results in broad improvements to variant calling accuracy across a wide range of post-processing workflows including whole genomes, targeted exomes and transcriptome sequencing
The Iterative Signature Algorithm for the analysis of large scale gene expression data
We present a new approach for the analysis of genome-wide expression data.
Our method is designed to overcome the limitations of traditional techniques,
when applied to large-scale data. Rather than allotting each gene to a single
cluster, we assign both genes and conditions to context-dependent and
potentially overlapping transcription modules. We provide a rigorous definition
of a transcription module as the object to be retrieved from the expression
data. We establish an efficient algorithm that searches for the modules encoded in
the data by iteratively refining sets of genes and conditions until they match
this definition. Each iteration involves a linear map, induced by
the normalized expression matrix, followed by the application of a threshold
function. We argue that our method is in fact a generalization of Singular
Value Decomposition, which corresponds to the special case where no threshold
is applied. We show analytically that for noisy expression data our approach
leads to better classification due to the implementation of the threshold. This
result is confirmed by numerical analyses based on in-silico expression data.
We briefly discuss results obtained by applying our algorithm to expression
data from the yeast S. cerevisiae.
Comment: LaTeX, 36 pages, 8 figures
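The iteration described in this abstract (a linear map induced by the normalized expression matrix, followed by a threshold) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the one-sided z-score threshold, the default threshold values, and the convergence test on the selected sets are all assumptions made for illustration. With the threshold removed, the same loop reduces to a power iteration, which is consistent with the abstract's remark about Singular Value Decomposition.

```python
import statistics


def zscore_rows(matrix):
    """Scale each row to mean 0 and unit standard deviation (constant rows become zeros)."""
    out = []
    for row in matrix:
        mu = statistics.fmean(row)
        sd = statistics.pstdev(row) or 1.0
        out.append([(x - mu) / sd for x in row])
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def threshold(scores, t):
    """Keep scores lying more than t standard deviations above the mean; zero the rest."""
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        return [0.0] * len(scores)
    return [s if (s - mu) / sd > t else 0.0 for s in scores]


def support(scores):
    """Indices with a non-zero score, i.e. the currently selected set."""
    return {i for i, s in enumerate(scores) if s != 0.0}


def isa(expr, seed_genes, t_gene=1.0, t_cond=1.0, max_iter=100):
    """Refine a seed gene set into a (genes, conditions) module.

    expr is a genes x conditions matrix given as a list of lists.
    """
    n_genes, n_conds = len(expr), len(expr[0])
    by_gene = zscore_rows(expr)                        # each gene normalized across conditions
    by_cond = transpose(zscore_rows(transpose(expr)))  # each condition normalized across genes
    g = [1.0 if i in seed_genes else 0.0 for i in range(n_genes)]
    genes, conds = set(seed_genes), set()
    for _ in range(max_iter):
        # Linear map from gene scores to condition scores, then threshold.
        c = threshold(
            [sum(g[i] * by_cond[i][j] for i in range(n_genes)) for j in range(n_conds)],
            t_cond,
        )
        # Linear map from condition scores back to gene scores, then threshold.
        g = threshold(
            [sum(by_gene[i][j] * c[j] for j in range(n_conds)) for i in range(n_genes)],
            t_gene,
        )
        new_genes, new_conds = support(g), support(c)
        if new_genes == genes and new_conds == conds:
            break  # the selected sets no longer change
        genes, conds = new_genes, new_conds
    return genes, conds
```

In this sketch the score vectors are not rescaled between iterations; only the selected sets are compared, so a production version would likely normalize the vectors at each step.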
Genetic determinants of the molecular portraits of epithelial cancers
The ability to characterize and predict tumor phenotypes is crucial to precision medicine. In this study, we present an integrative computational approach using a genome-wide association analysis and an Elastic Net prediction method to analyze the relationship between DNA copy number alterations and an archive of gene expression signatures. Across breast cancers, we are able to quantitatively predict many gene signature levels within individual tumors with high accuracy based upon DNA copy number features alone, including proliferation status and estrogen-signaling pathway activity. We can also predict many other key phenotypes, including intrinsic molecular subtypes, estrogen receptor status, and TP53 mutation. This approach is also applied to TCGA Pan-Cancer data, identifying repeatedly predictable signatures across tumor types, including immune features in lung squamous and basal-like breast cancers. These Elastic Net DNA predictors could also be called from DNA-based gene panels, thus facilitating their use as biomarkers to guide therapeutic decision making.
High reproducibility using sodium hydroxide-stripped long oligonucleotide DNA microarrays
Recently, long oligonucleotide (60- to 70-mer) microarrays for two-color experiments have been developed and are gaining widespread use. In addition, when there is limited availability of mRNA from tissue sources, RNA amplification can be, and is being, used to produce sufficient quantities of cRNA for microarray hybridization. Taking advantage of the selective degradation of RNA under alkaline conditions, we have developed a method to "strip" glass-based oligonucleotide microarrays that use fluorescent RNA in the hybridization, while leaving the DNA oligonucleotide probes intact and usable for a second experiment. Replicate microarray experiments conducted using stripped arrays showed high reproducibility; however, we found that arrays could only be stripped and reused once without compromising data quality. The intraclass correlation (ICC) between a virgin array and a stripped array hybridized with the same sample ranged from 0.90 to 0.98, which is comparable to the ICC of two virgin arrays hybridized with the same sample. Using this method, once-stripped oligonucleotide microarrays are usable and reliable, and help to reduce costs.
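The abstract reports reproducibility as an intraclass correlation but does not say which ICC form was used. As a hedged illustration only, the sketch below computes the one-way random-effects ICC, often written ICC(1,1), for probes measured on k arrays. The function name `icc_oneway` and the choice of this particular form are assumptions, not details taken from the paper.

```python
def icc_oneway(rows):
    """One-way random-effects ICC for n targets (rows), each measured k times.

    rows: list of equal-length lists, e.g. one row per probe with one value per array.
    """
    n, k = len(rows), len(rows[0])
    grand_mean = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    # Between-target and within-target sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for r, m in zip(rows, row_means) for x in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

For a virgin-versus-stripped comparison, each row would hold the two intensity values of one probe, so k = 2; values near 1 indicate that most of the variance lies between probes rather than between arrays.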
Amplification of SOX4 promotes PI3K/Akt signaling in human breast cancer
Purpose: The PI3K/Akt signaling axis contributes to the dysregulation of many dominant features in breast cancer including cell proliferation, survival, metabolism, motility, and genomic instability. While multiple studies have demonstrated that basal-like or triple-negative breast tumors have uniformly high PI3K/Akt activity, genomic alterations that mediate dysregulation of this pathway in this subset of highly aggressive breast tumors remain to be determined. Methods: In this study, we present an integrated genomic analysis based on the use of a PI3K gene expression signature as a framework to analyze orthogonal genomic data from human breast tumors, including RNA expression, DNA copy number alterations, and protein expression. In combination with data from a genome-wide RNA-mediated interference screen in human breast cancer cell lines, we identified essential genetic drivers of PI3K/Akt signaling. Results: Our in silico analyses identified SOX4 amplification as a novel modulator of PI3K/Akt signaling in breast cancers and in vitro studies confirmed its role in regulating Akt phosphorylation. Conclusions: Taken together, these data establish a role for SOX4-mediated PI3K/Akt signaling in breast cancer and suggest that SOX4 may represent a novel therapeutic target and/or biomarker for current PI3K family therapies
A pan-cancer analysis of the frequency of DNA alterations across cell cycle activity levels
Pan-cancer genomic analyses based on the magnitude of pathway activity are currently lacking. Focusing on the cell cycle, we examined the DNA mutations and chromosome arm-level aneuploidy within tumours with low, intermediate and high cell-cycle activity in 9515 pan-cancer patients with 32 different tumour types. Boxplots showed that cell-cycle activity varied broadly across and within all cancers. TP53 and PIK3CA mutations were common in all cell cycle score (CCS) tertiles but with increasing frequency as cell-cycle activity levels increased (P < 0.001). Mutations in BRAF and gains in 16p were less frequent in CCS High tumours (P < 0.001). In Kaplan–Meier analysis, patients whose tumours were CCS Low had a longer Progression Free Interval (PFI) relative to Intermediate or High (P < 0.001) and this significance remained in multivariable analysis (CCS Low: reference; CCS Intermediate: HR = 1.37, 95% CI 1.17–1.60; CCS High: HR = 1.54, 95% CI 1.29–1.84). These results demonstrate that whilst similar DNA alterations can be found at all cell-cycle activity levels, some notable exceptions exist. Moreover, independent prognostic information can be derived on a pan-cancer level from a simple measure of cell-cycle activity.
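The Kaplan–Meier analysis mentioned in this abstract rests on the product-limit estimator. The sketch below shows that estimator for a single group, for example one CCS tertile. The function name, the input layout, and the encoding of events (1 for an observed progression, 0 for a censored patient) are assumptions for illustration and do not reproduce the study's actual pipeline.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient.
    events: 1 if the event (e.g. progression) was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # everyone leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Running the function once per tertile gives the curves that a log-rank test or a Cox model, as in the multivariable analysis above, would then compare.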
Anti-PD-1 Checkpoint Therapy Can Promote the Function and Survival of Regulatory T Cells
We have previously shown in a model of claudin-low breast cancer that regulatory T cells (Tregs) are increased in the tumor microenvironment (TME) and express high levels of PD-1. In mouse models and patients with triple-negative breast cancer, it is postulated that one cause for the lack of activity of anti-PD-1 therapy is the activation of PD-1-expressing Tregs in the TME. We hypothesized that the expression of PD-1 on Tregs would lead to enhanced suppressive function of Tregs and worsen antitumor immunity during PD-1 blockade. To evaluate this, we isolated Tregs from claudin-low tumors and functionally evaluated them ex vivo. We compared transcriptional profiles of Tregs isolated from tumor-bearing mice with or without anti-PD-1 therapy using RNA sequencing. We found several genes associated with survival and proliferation pathways; for example, Jun, Fos, and Bcl2 were significantly upregulated in Tregs exposed to anti-PD-1 treatment. Based on these data, we hypothesized that anti-PD-1 treatment of Tregs results in a prosurvival phenotype. Indeed, Tregs exposed to PD-1 blockade had significantly higher levels of Bcl-2 expression, and this led to increased protection from glucocorticoid-induced apoptosis. In addition, we found in vitro and in vivo that Tregs in the presence of anti-PD-1 proliferated more than control Tregs. PD-1 blockade significantly increased the suppressive activity of Tregs at biologically relevant Treg/Tnaive cell ratios. Altogether, we show that this immunotherapy blockade increases proliferation, protection from apoptosis, and suppressive capabilities of Tregs, thus leading to enhanced immunosuppression in the TME.
An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics
For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types
MultiK: an automated tool to determine optimal cluster numbers in single-cell RNA sequencing data
Single-cell RNA sequencing (scRNA-seq) provides new opportunities to characterize cell populations, typically accomplished through some type of clustering analysis. Estimation of the optimal cluster number (K) is a crucial but often ignored step. Our approach improves most current scRNA-seq cluster methods by providing an objective estimation of the number of groups using a multi-resolution perspective. MultiK is a tool for objective selection of insightful Ks and achieves high robustness through a consensus clustering approach. We demonstrate that MultiK identifies reproducible groups in scRNA-seq data, thus providing an objective means of estimating the number of possible groups or cell-type populations present.
Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V'DJer
Motivation: B-cell receptor (BCR) repertoire profiling is an important tool for understanding the biology of diverse immunologic processes. Current methods for analyzing adaptive immune receptor repertoires depend upon PCR amplification of VDJ rearrangements followed by long-read amplicon sequencing spanning the VDJ junctions. While this approach has proven to be effective, it is frequently not feasible due to cost or limited sample material. Additionally, there are many existing datasets where short-read RNA sequencing data are available but PCR-amplified BCR data are not. Results: We present here V'DJer, an assembly-based method that reconstructs adaptive immune receptor repertoires from short-read RNA sequencing data. This method captures expressed BCR loci from a standard RNA-seq assay. We applied this method to 473 melanoma samples from The Cancer Genome Atlas and demonstrate V'DJer's ability to accurately reconstruct BCR repertoires from short-read mRNA-seq data.
