80 research outputs found

    Accurate Profiling of Microbial Communities from Massively Parallel Sequencing using Convex Optimization

    We describe the Microbial Community Reconstruction (MCR) Problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper-bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with $\sim 10^6$ species). We show using numerical simulations that for realistic scenarios, where the microbial communities are sparse, our algorithm gives solutions with high accuracy, both in terms of obtaining accurate frequency, and in terms of species phylogenetic resolution. Comment: To appear in SPIRE 1
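    The convex formulation described in this abstract can be illustrated with a toy sketch. It is not the authors' algorithm: the matrix `A` (probability of each read sequence given each species), the vector `y` (observed read frequencies), and the projected-gradient solver are all assumptions made here for illustration. The sketch minimizes the squared error between `A x` and `y` over the probability simplex, which is a convex problem.

```python
# Toy sketch (hypothetical data and solver, not the paper's implementation):
# estimate species frequencies x by minimizing ||A x - y||^2
# subject to x >= 0 and sum(x) = 1.

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(vi - theta, 0.0) for vi in v]


def reconstruct(A, y, steps=2000):
    """Projected gradient descent on the least-squares objective."""
    n_reads, n_species = len(A), len(A[0])
    # Step size 1 / (2 * ||A||_F^2) is no larger than 1 / Lipschitz constant.
    frob_sq = sum(a * a for row in A for a in row)
    lr = 1.0 / (2.0 * frob_sq)
    x = [1.0 / n_species] * n_species  # start from the uniform community
    for _ in range(steps):
        residual = [
            sum(A[i][j] * x[j] for j in range(n_species)) - y[i]
            for i in range(n_reads)
        ]
        grad = [
            2.0 * sum(A[i][j] * residual[i] for i in range(n_reads))
            for j in range(n_species)
        ]
        x = project_to_simplex([x[j] - lr * grad[j] for j in range(n_species)])
    return x


# Rows: distinct read sequences; columns: species (each column sums to 1).
A = [
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5],
]
true_x = [0.7, 0.3, 0.0]  # sparse community: the third species is absent
y = [sum(A[i][j] * true_x[j] for j in range(3)) for i in range(4)]

estimate = reconstruct(A, y)
print([round(v, 4) for v in estimate])
```

    In this noiseless toy case the columns of `A` are linearly independent, so the frequencies are identifiable and the estimate approaches `true_x`. With a finite number of reads, `y` would be noisy and the estimate only approximate.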

    CTCF genetic alterations in endometrial carcinoma are pro-tumorigenic

    CTCF is a haploinsufficient tumour suppressor gene with diverse normal functions in genome structure and gene regulation. However, the mechanism by which CTCF haploinsufficiency contributes to cancer development is not well understood. CTCF is frequently mutated in endometrial cancer. Here we show that most CTCF mutations effectively result in CTCF haploinsufficiency through nonsense-mediated decay of mutant transcripts, or loss-of-function missense mutation. Conversely, we identified a recurrent CTCF mutation K365T, which alters a DNA binding residue, and acts as a gain-of-function mutation enhancing cell survival. CTCF genetic deletion occurs predominantly in poor prognosis serous subtype tumours, and this genetic deletion is associated with poor overall survival. In addition, we have shown that CTCF haploinsufficiency also occurs in poor prognosis endometrial clear cell carcinomas and has some association with endometrial cancer relapse and metastasis. Using shRNA targeting CTCF to recapitulate CTCF haploinsufficiency, we have identified a novel role for CTCF in the regulation of cellular polarity of endometrial glandular epithelium. Overall, we have identified two novel pro-tumorigenic roles (promoting cell survival and altering cell polarity) for genetic alterations of CTCF in endometrial cancer.

    Integrated analysis of RNA and DNA from the phase III trial CALGB 40601 identifies predictors of response to trastuzumab-based neoadjuvant chemotherapy in HER2-positive breast cancer

    Purpose: Response to a complex trastuzumab-based regimen is affected by multiple features of the tumor and its microenvironment. Developing a predictive algorithm is key to optimizing HER2-targeting therapy. Experimental Design: We analyzed 137 pretreatment tumors with mRNA-seq and DNA exome sequencing from CALGB 40601, a neoadjuvant phase III trial of paclitaxel plus trastuzumab with or without lapatinib in stage II to III HER2-positive breast cancer. We adopted an Elastic Net regularized regression approach that controls for covarying features within high-dimensional data. First, we applied 517 known gene expression signatures to develop an Elastic Net model to predict pCR, which we validated on 143 samples from four independent trials. Next, we performed integrative analyses incorporating clinicopathologic information with somatic mutation status, DNA copy number alterations (CNA), and gene signatures. Results: The Elastic Net model using only gene signatures predicted pCR in the validation sets (AUC = 0.76). Integrative analyses showed that models containing gene signatures, clinical features, and DNA information were better pCR predictors than models containing a single data type. Frequently selected variables from the multiplatform models included amplifications of chromosome 6p, TP53 mutation, HER2-enriched subtype, and immune signatures. Variables predicting resistance included Luminal/ER+ features. Conclusions: Models using RNA only, as well as integrated RNA and DNA models, can predict pCR with improved accuracy over clinical variables. Somatic DNA alterations (mutation, CNAs), tumor molecular subtype (HER2E, Luminal), and the microenvironment (immune cells) were independent predictors of response to trastuzumab and paclitaxel-based regimens. This highlights the complexity of predicting response in HER2-positive breast cancer.
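    For readers unfamiliar with the Elastic Net approach named in this abstract, its usual objective for a binary outcome such as pCR can be written as below. The abstract does not give the exact loss or parameterization, so the logistic loss and the symbols here are assumptions: $y_i \in \{0, 1\}$ is the outcome for tumor $i$, $x_i$ is its feature vector (for example, gene signature scores), $\lambda \ge 0$ sets the overall penalty strength, and $\alpha \in [0, 1]$ mixes the lasso and ridge terms.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    -\frac{1}{n} \sum_{i=1}^{n}
      \Big[ y_i \,(\beta_0 + x_i^{\top}\beta)
            - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
    + \lambda \Big[ \alpha \,\lVert \beta \rVert_1
            + \tfrac{1-\alpha}{2} \,\lVert \beta \rVert_2^2 \Big]
```

    The $\ell_1$ term drives many coefficients to exactly zero, which selects variables, while the $\ell_2$ term tends to keep groups of correlated features together. That matches the abstract's stated motivation of controlling for covarying features in high-dimensional data.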

    Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes

    Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole-genome sequencing to perform an unbiased comprehensive screen to discover the somatic mutations in a sample from an individual with sAML and genotyped the loci containing these mutations in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (Ser34) in U2AF1 was recurrently present in 13 out of 150 (8.7%) subjects with de novo MDS, and we found suggestive evidence of an increased risk of progression to sAML associated with this mutation. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3' end of introns, and the alterations in U2AF1 are located in highly conserved zinc fingers of this protein. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This previously unidentified, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types

    Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

    A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved

    Driver Fusions and Their Implications in the Development and Treatment of Human Cancers.

    Gene fusions represent an important class of somatic alterations in cancer. We systematically investigated fusions in 9,624 tumors across 33 cancer types using multiple fusion calling tools. We identified a total of 25,664 fusions, with a 63% validation rate. Integration of gene expression, copy number, and fusion annotation data revealed that fusions involving oncogenes tend to exhibit increased expression, whereas fusions involving tumor suppressors have the opposite effect. For fusions involving kinases, we found 1,275 with an intact kinase domain, the proportion of which varied significantly across cancer types. Our study suggests that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of them. Finally, we identified druggable fusions involving genes such as TMPRSS2, RET, FGFR3, ALK, and ESR1 in 6.0% of cases, and we predicted immunogenic peptides, suggesting that fusions may provide leads for targeted drug and immune therapy

    The western painted turtle genome, a model for the evolution of extreme physiological adaptations in a slowly evolving lineage

    Background: We describe the genome of the western painted turtle, Chrysemys picta bellii, one of the most widespread, abundant, and well-studied turtles. We place the genome into a comparative evolutionary context, and focus on genomic features associated with tooth loss, immune function, longevity, sex differentiation and determination, and the species' physiological capacities to withstand extreme anoxia and tissue freezing. Results: Our phylogenetic analyses confirm that turtles are the sister group to living archosaurs, and demonstrate an extraordinarily slow rate of sequence evolution in the painted turtle. The ability of the painted turtle to withstand complete anoxia and partial freezing appears to be associated with common vertebrate gene networks, and we identify candidate genes for future functional analyses. Tooth loss shares a common pattern of pseudogenization and degradation of tooth-specific genes with birds, although the rate of accumulation of mutations is much slower in the painted turtle. Genes associated with sex differentiation generally reflect phylogeny rather than convergence in sex determination functionality. Among gene families that demonstrate exceptional expansions or show signatures of strong natural selection, immune function and musculoskeletal patterning genes are consistently over-represented. Conclusions: Our comparative genomic analyses indicate that common vertebrate regulatory networks, some of which have analogs in human diseases, are often involved in the western painted turtle's extraordinary physiological capacities. As these regulatory pathways are analyzed at the functional level, the painted turtle may offer important insights into the management of a number of human health disorders.