SCALE method for single-cell ATAC-seq analysis via latent feature extraction.
Single-cell ATAC-seq (scATAC-seq) profiles the chromatin accessibility landscape at the single-cell level, thus revealing cell-to-cell variability in gene regulation. However, the high dimensionality and sparsity of scATAC-seq data often complicate the analysis. Here, we introduce a method for analyzing scATAC-seq data, called Single-Cell ATAC-seq analysis via Latent feature Extraction (SCALE). SCALE combines a deep generative framework and a probabilistic Gaussian Mixture Model to learn latent features that accurately characterize scATAC-seq data. We validate SCALE on datasets generated on different platforms with different protocols and of different overall data quality. SCALE substantially outperforms the other tools in all aspects of scATAC-seq data analysis, including visualization, clustering, and denoising and imputation. Importantly, SCALE also generates interpretable features that directly link to cell populations and can potentially reveal batch effects in scATAC-seq experiments.
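The way a Gaussian mixture links latent features to cell populations can be illustrated with a minimal sketch. It assumes the latent vectors and the mixture parameters (weights, means, diagonal variances) have already been learned; SCALE's encoder, decoder and training loop are not shown, and the helper names below are hypothetical. Each cell is assigned to the component with the highest posterior probability p(c | z).

```python
import math
from typing import List, Sequence


def log_diag_gaussian(z: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of latent vector z under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def gmm_responsibilities(z, weights, means, variances) -> List[float]:
    """Posterior p(c | z) for each mixture component c (hypothetical helper).

    Mixture weights must be strictly positive.
    """
    logs = [
        math.log(w) + log_diag_gaussian(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)  # log-sum-exp trick keeps exp() from underflowing
    unnorm = [math.exp(value - peak) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def assign_clusters(latents, weights, means, variances) -> List[int]:
    """Hard cluster label per cell: the component with the largest responsibility."""
    labels = []
    for z in latents:
        r = gmm_responsibilities(z, weights, means, variances)
        labels.append(max(range(len(r)), key=r.__getitem__))
    return labels
```

For example, with two equally weighted unit-variance components centred at (0, 0) and (5, 5), the latent vectors (0.1, -0.2), (4.9, 5.2) and (0.3, 0.0) are labelled 0, 1 and 0. Keeping the soft responsibilities rather than only the argmax is what makes each latent component interpretable as a candidate cell population.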
Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications
Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to reduced power in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance on scRNA-seq data.
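The weighting idea can be sketched as follows, assuming a zero-inflated negative binomial (ZINB) model whose NB mean mu, NB size parameter theta and zero-inflation probability pi have already been estimated for each gene and cell (the fitting step is not shown, and all names are hypothetical). Here a count's weight is taken to be the posterior probability that it came from the NB count component rather than from the excess-zero component: positive counts get weight 1, and a zero gets w = (1 - pi) * NB(0) / (pi + (1 - pi) * NB(0)), with NB(0) = (theta / (theta + mu)) ** theta.

```python
from typing import List, Sequence


def nb_zero_probability(mu: float, theta: float) -> float:
    """P(Y = 0) under a negative binomial with mean mu and size (inverse dispersion) theta."""
    return (theta / (theta + mu)) ** theta


def zinb_observation_weight(y: int, mu: float, theta: float, pi: float) -> float:
    """Posterior probability that count y was generated by the NB component."""
    if y > 0:
        return 1.0  # a positive count cannot come from the point mass at zero
    nb_zero = (1.0 - pi) * nb_zero_probability(mu, theta)
    return nb_zero / (pi + nb_zero)


def weight_matrix(
    counts: Sequence[Sequence[int]],
    mu: Sequence[Sequence[float]],
    theta: Sequence[float],
    pi: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Gene-by-cell weights; mu and pi are gene-by-cell, theta holds one value per gene."""
    return [
        [zinb_observation_weight(y, m, theta[g], p) for y, m, p in zip(row, mu_row, pi_row)]
        for g, (row, mu_row, pi_row) in enumerate(zip(counts, mu, pi))
    ]
```

A zero observed where the NB component expects many reads (large mu) receives a small weight, because it is more plausibly an excess zero; the resulting matrix would then be passed to a bulk DE method that accepts observation-level weights, which this sketch does not cover.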
Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications.
Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of CpGs and 99% of non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant, and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme sequencing (MRE-seq) for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
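The two kinds of concordance described above can be made concrete with a short sketch. It assumes methylation levels are fractions in [0, 1] paired site by site; the difference threshold is left as a parameter because the abstract does not give its value, and the binarization cutoff of 0.5 is a hypothetical choice. Function names are illustrative only.

```python
from typing import List, Sequence


def _check_paired(a: Sequence, b: Sequence) -> None:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and paired site by site")


def level_concordance(levels_a: Sequence[float], levels_b: Sequence[float], threshold: float) -> float:
    """Fraction of sites whose methylation levels differ by at most `threshold`."""
    _check_paired(levels_a, levels_b)
    agree = sum(1 for a, b in zip(levels_a, levels_b) if abs(a - b) <= threshold)
    return agree / len(levels_a)


def binary_calls(levels: Sequence[float], cutoff: float = 0.5) -> List[bool]:
    """Call a site methylated when its level reaches `cutoff` (hypothetical cutoff)."""
    return [level >= cutoff for level in levels]


def call_concordance(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Fraction of sites with identical binary methylation calls."""
    _check_paired(calls_a, calls_b)
    return sum(1 for a, b in zip(calls_a, calls_b) if a == b) / len(calls_a)
```

For instance, levels (0.9, 0.1, 0.5, 0.8) and (0.85, 0.3, 0.45, 0.2) are concordant at half of the sites for a threshold of 0.1. Binary calls suit the enrichment methods, whose read counts are not directly comparable to bisulfite levels.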
Systematic alteration of ATAC-seq for profiling open chromatin in cryopreserved nuclei preparations from livestock tissues.
The use of Assay for Transposase-Accessible Chromatin (ATAC-seq) to profile chromatin accessibility has surged in recent years, but its application to tissues has been very limited. With the intent of preserving nuclear architecture during long-term storage, cryopreserved nuclei preparations from chicken lung were used to optimize ATAC-seq. Sequencing data were compared with existing DNase-seq, ChIP-seq, and RNA-seq data to evaluate library quality, ultimately resulting in a modified ATAC-seq method capable of generating high-quality chromatin accessibility data from cryopreserved nuclei preparations. Using this method, nucleosome-free regions (NFR) identified in chicken lung overlapped half of DNase-I hypersensitive sites, coincided with active histone modifications, and specifically marked actively expressed genes. Notably, sequencing only the subnucleosomal fraction dramatically improved signal, while separation of subnucleosomal reads post-sequencing did not improve signal or peak calling. The broader applicability of this modified ATAC-seq technique was tested using cryopreserved nuclei preparations from pig tissues, resulting in NFR that were highly consistent among biological replicates. Furthermore, tissue-specific NFR were enriched for binding motifs of transcription factors related to tissue-specific functions, and marked genes functionally enriched for tissue-specific processes. Overall, these results provide insights into the optimization of ATAC-seq and a platform for profiling open chromatin in animal tissues.
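Comparisons such as the share of DNase-I hypersensitive sites (DHS) overlapped by NFR reduce to interval overlap counting. A minimal sketch, assuming BED-style half-open coordinates and counting a DHS as overlapped when it shares at least one base with any NFR (the abstract does not state the exact criterion); `fraction_overlapped` is a hypothetical helper, not part of the published method. NFR starts are sorted per chromosome alongside a running maximum of their ends, so each DHS is tested with a single binary search.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def build_index(intervals: Sequence[Interval]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Per chromosome: sorted start coordinates and the running maximum of end coordinates."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_ends = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_ends)
    return index


def fraction_overlapped(targets: Sequence[Interval], queries: Sequence[Interval]) -> float:
    """Fraction of target intervals overlapped by at least one query interval."""
    if not targets:
        raise ValueError("need at least one target interval")
    index = build_index(queries)
    hits = 0
    for chrom, start, end in targets:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        k = bisect_left(starts, end)  # number of queries starting before this target ends
        # An overlap exists iff one of those queries ends after the target starts.
        if k > 0 and max_ends[k - 1] > start:
            hits += 1
    return hits / len(targets)
```

With DHS intervals as `targets` and NFR calls as `queries`, a return value of 0.5 corresponds to half of the DHS being overlapped. Intervals that merely touch end to end, such as [100, 200) and [200, 250), do not count as overlapping under this convention.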
MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification
Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challenging due to the information loss in sequencing experiments. A recent accumulation of multiple RNA-seq data sets from the same tissue or cell type provides new opportunities to improve the accuracy of isoform quantification. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore the possible heterogeneity in the quality of different samples and could result in biased and non-robust estimates. In this article, we develop a method, which we call "joint modeling of multiple RNA-seq samples for accurate isoform quantification" (MSIQ), for more accurate and robust isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples while allowing higher weights for the consistent group. We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy and effectiveness of MSIQ compared with alternative methods through simulation studies on D. melanogaster genes. We also show MSIQ's advantages over existing approaches via application studies on real RNA-seq data from human embryonic stem cells, brain tissues, and the HepG2 immortalized cell line.
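The effect of giving some samples more weight than others can be illustrated with a much simpler stand-in for MSIQ's Bayesian model. The sketch below assumes the per-sample weights are already known (MSIQ infers which samples form the consistent group; this does not) and that all isoforms have equal effective length. It runs the standard expectation-maximization (EM) update for isoform proportions, where each read lists the isoforms it is compatible with and each sample's expected read counts are scaled by that sample's weight. `weighted_isoform_em` and its input layout are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Read = FrozenSet[int]  # indices of the isoforms a read is compatible with


def weighted_isoform_em(
    samples: Sequence[Sequence[Read]],
    weights: Sequence[float],
    n_isoforms: int,
    n_iter: int = 500,
) -> List[float]:
    """Isoform proportions from several samples, each sample's reads scaled by its weight."""
    if len(samples) != len(weights):
        raise ValueError("one weight per sample is required")
    theta = [1.0 / n_isoforms] * n_isoforms  # start from uniform proportions
    for _ in range(n_iter):
        expected = [0.0] * n_isoforms
        for w, reads in zip(weights, samples):
            if w == 0:
                continue  # a zero-weight sample contributes nothing
            for compat in reads:
                denom = sum(theta[k] for k in compat)
                if denom == 0:
                    continue  # read fits only isoforms currently estimated at zero
                # E-step: split the read among compatible isoforms by current proportions.
                for k in compat:
                    expected[k] += w * theta[k] / denom
        total = sum(expected)
        if total == 0:
            raise ValueError("no weighted reads to estimate from")
        theta = [x / total for x in expected]  # M-step: renormalize expected counts
    return theta
```

With a single sample of three reads unique to isoform 0, one read unique to isoform 1 and two ambiguous reads, the estimate converges to proportions of 0.75 and 0.25. Pooling all samples corresponds to equal weights, whereas lowering a discordant sample's weight reduces its pull on the estimate, which is the intuition behind favouring a consistent group.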