
    STATISTICAL METHODS FOR DECODING GENE REGULATION IN SINGLE CELLS

    Single-cell sequencing is rapidly transforming biomedical research. With the ability to measure omics information in individual cells, it provides unprecedented resolution to study heterogeneous biological and clinical samples, enabling scientists to discover and characterize previously unknown biological signals and processes carried by novel or rare cell subpopulations. The new data structure and high level of noise in single-cell genomic data pose significant analytical challenges. To address these challenges, we developed new statistical and computational methods for analyzing single-cell transcriptome and regulome data. First, to infer cells’ underlying developmental trajectories, we developed TSCAN, which performs “pseudotime” analysis with a cluster-based minimum spanning tree approach. TSCAN facilitates accurate construction of pseudotemporal trajectories by regularizing the complexity of spanning trees. By improving the bias-variance tradeoff of the spanning tree estimation, TSCAN substantially improved the accuracy and robustness of the pseudotime analysis. Second, we developed RAISIN to support regression and differential analysis in single-cell RNA-seq datasets with multiple samples. Compared to classical linear mixed effects models, RAISIN improves variance estimation and statistical power for datasets with a small sample size or cell number, and improves scalability for datasets with a large sample size and millions of cells. Third, we developed SCATE to extract and enhance signals from highly noisy and sparse single-cell ATAC-seq data. SCATE accurately infers genome-wide activities of each individual cis-regulatory element by adaptively integrating information from co-activated cis-regulatory elements, similar cells, and massive amounts of publicly available regulome data. The enhanced signal improves the performance of downstream analyses such as peak calling and prediction of transcription factor binding sites.
These methods have been applied in numerous collaborative projects and have helped decipher gene regulatory programs in the T cell exhaustion process and identify molecular signatures in neoadjuvant immunotherapy.
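
The cluster-based minimum spanning tree idea mentioned in the abstract can be illustrated with a small sketch. This is not TSCAN's implementation: it assumes the cells have already been clustered (the labels below are made up), uses plain Euclidean distance between cluster centers, and orders whole clusters rather than projecting individual cells onto the tree. The function names `cluster_centers`, `minimum_spanning_tree`, and `path_to_farthest` are hypothetical helpers introduced only for this illustration.

```python
import math
from collections import defaultdict


def cluster_centers(points, labels):
    """Mean coordinate of each cluster; the labels are assumed to be precomputed."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }


def minimum_spanning_tree(centers):
    """Prim's algorithm over Euclidean distances between cluster centers."""
    nodes = sorted(centers)
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        # Cheapest edge joining a cluster inside the tree to one outside it.
        _, u, v = min(
            (math.dist(centers[u], centers[v]), u, v)
            for u in in_tree
            for v in nodes
            if v not in in_tree
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


def path_to_farthest(edges, start):
    """Tree path from `start` to the cluster that is the most edges away."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    best = [start]
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for neighbor in adjacency[node]:
            if neighbor not in path:
                stack.append((neighbor, path + [neighbor]))
    return best


# Toy data: six "cells" in two dimensions, grouped into three made-up clusters.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 0), (6, 0)]
labels = ["a", "a", "b", "b", "c", "c"]

centers = cluster_centers(points, labels)
tree = minimum_spanning_tree(centers)
order = path_to_farthest(tree, start="a")
print(order)
```

Building the tree on a handful of cluster centers instead of on every cell keeps the tree small, which is the sense in which the abstract speaks of regularizing the complexity of the spanning tree. The starting cluster is a user choice in this sketch.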

    Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation

    Most genetic variations associated with human complex traits are located in non-coding genomic regions. Therefore, understanding the genotype-to-phenotype axis requires a comprehensive catalog of functional non-coding genomic elements, most of which are involved in epigenetic regulation of gene expression. Genome-wide maps of open chromatin regions can facilitate functional analysis of cis- and trans-regulatory elements via their connections with trait-associated sequence variants. Currently, Assay for Transposase Accessible Chromatin with high-throughput sequencing (ATAC-seq) is considered the most accessible and cost-effective strategy for genome-wide profiling of chromatin accessibility. Single-cell ATAC-seq (scATAC-seq) technology has also been developed to study cell type-specific chromatin accessibility in tissue samples containing a heterogeneous cellular population. However, because scATAC-seq data are intrinsically noisy and sparse, accurately extracting biological signals and devising effective biological hypotheses are difficult. To overcome such limitations in scATAC-seq data analysis, new methods and software tools have been developed over the past few years. Nevertheless, there is not yet a consensus on best practice for scATAC-seq data analysis. In this review, we discuss scATAC-seq technology and data analysis methods, ranging from preprocessing to downstream analysis, along with an up-to-date list of published studies that involved the application of this method. We expect this review will provide a guideline for successful data generation and analysis methods using appropriate software tools and databases for the study of chromatin accessibility at single-cell resolution. (C) 2020 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.