70 research outputs found

    MethylPurify: tumor purity deconvolution and differential methylation detection from single tumor DNA methylomes

    We propose a statistical algorithm, MethylPurify, that uses regions with bisulfite reads showing discordant methylation levels to infer tumor purity from tumor samples alone. MethylPurify can identify differentially methylated regions (DMRs) from individual tumor methylome samples, without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. From patient data, MethylPurify gave satisfactory DMR calls from tumor methylome samples alone, and revealed DMRs potentially missed by tumor-to-normal comparison due to tumor heterogeneity. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0419-x) contains supplementary material, which is available to authorized users.

    Integrative single-cell meta-analysis reveals disease-relevant vascular cell states and markers in human atherosclerosis

    Coronary artery disease (CAD) is characterized by atherosclerotic plaque formation in the arterial wall. CAD progression involves complex interactions and phenotypic plasticity among vascular and immune cell lineages. Single-cell RNA-seq (scRNA-seq) studies have highlighted lineage-specific transcriptomic signatures, but human cell phenotypes remain controversial. Here, we perform an integrated meta-analysis of 22 scRNA-seq libraries to generate a comprehensive map of human atherosclerosis with 118,578 cells. Besides characterizing granular cell-type diversity and communication, we leverage this atlas to provide insights into smooth muscle cell (SMC) modulation. We integrate genome-wide association study data and uncover a critical role for modulated SMC phenotypes in CAD, myocardial infarction, and coronary calcification. Finally, we identify fibromyocyte/fibrochondrogenic SMC markers (LTBP1 and CRTAC1) as proxies of atherosclerosis progression and validate these through omics and spatial imaging analyses. Altogether, we create a unified atlas of human atherosclerosis informing cell state-specific mechanistic and translational studies of cardiovascular diseases.

    Transcriptional Regulation of Rod Photoreceptor Homeostasis Revealed by In Vivo NRL Targetome Analysis

    Stringent control of homeostasis is critical for the functional maintenance and survival of neurons. In the mammalian retina, the basic motif leucine zipper transcription factor NRL determines rod versus cone photoreceptor cell fate and activates the expression of many rod-specific genes. Here, we report an integrated analysis of the NRL-centered gene regulatory network by coupling chromatin immunoprecipitation followed by high-throughput sequencing (ChIP–Seq) data from Illumina and ABI platforms with global expression profiling and in vivo knockdown studies. We identified approximately 300 direct NRL target genes. Of these, 22 NRL targets are associated with human retinal dystrophies, whereas 95 mapped to regions of as yet uncloned retinal disease loci. In silico analysis of NRL ChIP–Seq peak sequences revealed an enrichment of distinct sets of transcription factor binding sites. Specifically, we discovered that genes involved in photoreceptor function include binding sites for both NRL and the homeodomain protein CRX. Evaluation of 26 ChIP–Seq regions validated their enhancer functions in reporter assays. In vivo knockdown of 16 NRL target genes resulted in death or abnormal morphology of rod photoreceptors, suggesting their importance in maintaining retinal function. We also identified the histone demethylase Kdm5b as a novel secondary node in the NRL transcriptional hierarchy. Exon array analysis of flow-sorted photoreceptors in which Kdm5b was knocked down by shRNA indicated its role in regulating rod-expressed genes. Our studies identify candidate genes for retinal dystrophies, define cis-regulatory module(s) for photoreceptor-expressed genes and provide a framework for decoding transcriptional regulatory networks that dictate rod homeostasis.

    Partitioning Heritability of Regulatory and Cell-Type-Specific Variants across 11 Common Diseases

    Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs ($h_g^2$) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of $h_g^2$ from imputed SNPs (5.1× enrichment; $p = 3.7 \times 10^{-17}$) and 38% (SE = 4%) of $h_g^2$ from genotyped SNPs (1.6× enrichment; $p = 1.0 \times 10^{-4}$). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of $h_g^2$ despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.

    DARDN: A Deep-Learning Approach for CTCF Binding Sequence Classification and Oncogenic Regulatory Feature Discovery

    Characterization of gene regulatory mechanisms in cancer is a key task in cancer genomics. CCCTC-binding factor (CTCF), a DNA binding protein, exhibits specific binding patterns in the genome of cancer cells and has a non-canonical function to facilitate oncogenic transcription programs by cooperating with transcription factors bound at flanking distal regions. Identification of DNA sequence features from a broad genomic region that distinguish cancer-specific CTCF binding sites from regular CTCF binding sites can help find oncogenic transcription factors in a cancer type. However, the presence of long DNA sequences without localization information makes it difficult to perform conventional motif analysis. Here, we present DNAResDualNet (DARDN), a computational method that utilizes convolutional neural networks (CNNs) for predicting cancer-specific CTCF binding sites from long DNA sequences and employs DeepLIFT, a method for interpretability of deep learning models that explains the model’s output in terms of the contributions of its input features. The method is used for identifying DNA sequence features associated with cancer-specific CTCF binding. Evaluation on DNA sequences associated with CTCF binding sites in T-cell acute lymphoblastic leukemia (T-ALL) and other cancer types demonstrates DARDN’s ability to distinguish DNA sequences surrounding cancer-specific CTCF binding from control constitutive CTCF binding and to identify sequence motifs for transcription factors potentially active in each specific cancer type. We identify potential oncogenic transcription factors in T-ALL, acute myeloid leukemia (AML), breast cancer (BRCA), colorectal cancer (CRC), lung adenocarcinoma (LUAD), and prostate cancer (PRAD). Our work demonstrates the power of advanced machine learning and feature discovery approaches in finding biologically meaningful information from complex high-throughput sequencing data.