70 research outputs found
Analysis of optimized DNase-seq reveals intrinsic bias in transcription factor footprint identification
DNase-seq is a powerful technique for identifying cis-regulatory elements across the genome. We studied the key experimental parameters to optimize the performance of DNase-seq. We found that sequencing short 50-100bp fragments that accumulate in long inter-nucleosome linker regions is more efficient for identifying transcription factor binding sites than using longer fragments. We also assessed the potential of DNase-seq to predict transcription factor occupancy through the generation of nucleotide-resolution transcription factor footprints. In modeling the sequence-specific DNaseI cutting bias we found a surprisingly strong effect that varied over more than two orders of magnitude. This confounds DNaseI footprint analysis to the extent that the nucleotide resolution cleavage patterns at most transcription factor binding sites are derived from intrinsic DNaseI cleavage bias rather than from specific protein-DNA interactions. In contrast, quantitative comparison of DNaseI hypersensitivity between states can predict transcription factor occupancy associated with particular biological perturbations
MethylPurify: tumor purity deconvolution and differential methylation detection from single tumor DNA methylomes
We propose a statistical algorithm MethylPurify that uses regions with bisulfite reads showing discordant methylation levels to infer tumor purity from tumor samples alone. MethylPurify can identify differentially methylated regions (DMRs) from individual tumor methylome samples, without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. From patient data, MethylPurify gave satisfactory DMR calls from tumor methylome samples alone, and revealed potential missed DMRs by tumor to normal comparison due to tumor heterogeneity. Electronic supplementary material: the online version of this article (doi:10.1186/s13059-014-0419-x) contains supplementary material, which is available to authorized users.
Integrative single-cell meta-analysis reveals disease-relevant vascular cell states and markers in human atherosclerosis
Coronary artery disease (CAD) is characterized by atherosclerotic plaque formation in the arterial wall. CAD progression involves complex interactions and phenotypic plasticity among vascular and immune cell lineages. Single-cell RNA-seq (scRNA-seq) studies have highlighted lineage-specific transcriptomic signatures, but human cell phenotypes remain controversial. Here, we perform an integrated meta-analysis of 22 scRNA-seq libraries to generate a comprehensive map of human atherosclerosis with 118,578 cells. Besides characterizing granular cell-type diversity and communication, we leverage this atlas to provide insights into smooth muscle cell (SMC) modulation. We integrate genome-wide association study data and uncover a critical role for modulated SMC phenotypes in CAD, myocardial infarction, and coronary calcification. Finally, we identify fibromyocyte/fibrochondrogenic SMC markers (LTBP1 and CRTAC1) as proxies of atherosclerosis progression and validate these through omics and spatial imaging analyses. Altogether, we create a unified atlas of human atherosclerosis informing cell state-specific mechanistic and translational studies of cardiovascular diseases.
Transcriptional Regulation of Rod Photoreceptor Homeostasis Revealed by In Vivo NRL Targetome Analysis
A stringent control of homeostasis is critical for functional maintenance and survival of neurons. In the mammalian retina, the basic motif leucine zipper transcription factor NRL determines rod versus cone photoreceptor cell fate and activates the expression of many rod-specific genes. Here, we report an integrated analysis of NRL-centered gene regulatory network by coupling chromatin immunoprecipitation followed by high-throughput sequencing (ChIP–Seq) data from Illumina and ABI platforms with global expression profiling and in vivo knockdown studies. We identified approximately 300 direct NRL target genes. Of these, 22 NRL targets are associated with human retinal dystrophies, whereas 95 mapped to regions of as yet uncloned retinal disease loci. In silico analysis of NRL ChIP–Seq peak sequences revealed an enrichment of distinct sets of transcription factor binding sites. Specifically, we discovered that genes involved in photoreceptor function include binding sites for both NRL and homeodomain protein CRX. Evaluation of 26 ChIP–Seq regions validated their enhancer functions in reporter assays. In vivo knockdown of 16 NRL target genes resulted in death or abnormal morphology of rod photoreceptors, suggesting their importance in maintaining retinal function. We also identified histone demethylase Kdm5b as a novel secondary node in NRL transcriptional hierarchy. Exon array analysis of flow-sorted photoreceptors in which Kdm5b was knocked down by shRNA indicated its role in regulating rod-expressed genes. Our studies identify candidate genes for retinal dystrophies, define cis-regulatory module(s) for photoreceptor-expressed genes and provide a framework for decoding transcriptional regulatory networks that dictate rod homeostasis
Partitioning Heritability of Regulatory and Cell-Type-Specific Variants across 11 Common Diseases
Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (h_g²) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h_g² from imputed SNPs (5.1× enrichment; p = 3.7 × 10⁻¹⁷) and 38% (SE = 4%) of h_g² from genotyped SNPs (1.6× enrichment; p = 1.0 × 10⁻⁴). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of h_g² despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.
Active enhancers are delineated de novo during hematopoiesis, with limited lineage fidelity among specified primary blood cells
Tissues may adopt diverse strategies to establish specific transcriptional programs in daughter lineages. In intestinal crypts, enhancers for genes expressed in both major cell types appear broadly permissive in stem and specified progenitor cells. In blood, another self-renewing tissue, it is unclear when chromatin becomes permissive for transcription of genes expressed in distinct terminal lineages. Using chromatin immunoprecipitation (ChIP) combined with deep sequencing (ChIP-seq) to profile activating histone marks, we studied enhancer dynamics in primary mouse blood stem, progenitor, and specified cells. Stem and multipotent progenitor cells show scant H3K4me2 marking at enhancers bound by specific transcription factors in their committed progeny. Rather, enhancers are modulated dynamically and serially, with substantial loss and gain of H3K4me2, at each cellular transition. Quantitative analysis of these dynamics accurately modeled hematopoiesis according to Waddington’s notion of epigenotypes. Delineation of enhancers in terminal blood lineages coincides with cell specification, and enhancers active in single lineages show well-positioned H3K4me2- and H3K27ac-marked nucleosomes and DNaseI hypersensitivity in other cell types, revealing limited lineage fidelity. These findings demonstrate that enhancer chronology in blood cells differs markedly from that in intestinal crypts. Chromatin dynamics in hematopoiesis provide a useful foundation to consider classical observations such as cellular reprogramming and multilineage locus priming
NF-E2, FLI1 and RUNX1 collaborate at areas of dynamic chromatin to activate transcription in mature mouse megakaryocytes
Mutations in mouse and human Nfe2, Fli1 and Runx1 cause thrombocytopenia. We applied genome-wide chromatin dynamics and ChIP-seq to determine these transcription factors’ (TFs) activities in terminal megakaryocyte (MK) maturation. Enhancers with H3K4me2-marked nucleosome pairs were most enriched for NF-E2, FLI and RUNX sequence motifs, suggesting that this TF triad controls much of the late MK program. ChIP-seq revealed NF-E2 occupancy near previously implicated target genes, whose expression is compromised in Nfe2-null cells, and many other genes that become active late in MK differentiation. FLI and RUNX were also the motifs most enriched near NF-E2 binding sites and ChIP-seq implicated FLI1 and RUNX1 in activation of late MK, including NF-E2-dependent, genes. Histones showed limited activation in regions of single TF binding, while enhancers that bind NF-E2 and either RUNX1, FLI1 or both TFs gave the highest signals for TF occupancy and H3K4me2; these enhancers associated best with genes activated late in MK maturation. Thus, three essential TFs co-occupy late-acting cis-elements and show evidence for additive activity at genes responsible for platelet assembly and release. These findings provide a rich dataset of TF and chromatin dynamics in primary MK and explain why individual TF losses cause thrombocytopenia.
DARDN: A Deep-Learning Approach for CTCF Binding Sequence Classification and Oncogenic Regulatory Feature Discovery
Characterization of gene regulatory mechanisms in cancer is a key task in cancer genomics. CCCTC-binding factor (CTCF), a DNA binding protein, exhibits specific binding patterns in the genome of cancer cells and has a non-canonical function to facilitate oncogenic transcription programs by cooperating with transcription factors bound at flanking distal regions. Identification of DNA sequence features from a broad genomic region that distinguish cancer-specific CTCF binding sites from regular CTCF binding sites can help find oncogenic transcription factors in a cancer type. However, the presence of long DNA sequences without localization information makes it difficult to perform conventional motif analysis. Here, we present DNAResDualNet (DARDN), a computational method that utilizes convolutional neural networks (CNNs) for predicting cancer-specific CTCF binding sites from long DNA sequences and employs DeepLIFT, a method for interpretability of deep learning models that explains the model’s output in terms of the contributions of its input features. The method is used for identifying DNA sequence features associated with cancer-specific CTCF binding. Evaluation on DNA sequences associated with CTCF binding sites in T-cell acute lymphoblastic leukemia (T-ALL) and other cancer types demonstrates DARDN’s ability in classifying DNA sequences surrounding cancer-specific CTCF binding from control constitutive CTCF binding and identifying sequence motifs for transcription factors potentially active in each specific cancer type. We identify potential oncogenic transcription factors in T-ALL, acute myeloid leukemia (AML), breast cancer (BRCA), colorectal cancer (CRC), lung adenocarcinoma (LUAD), and prostate cancer (PRAD). Our work demonstrates the power of advanced machine learning and feature discovery approach in finding biologically meaningful information from complex high-throughput sequencing data