
    A new graph-based clustering method with application to single-cell RNA-seq data from human pancreatic islets.

    Traditional bulk RNA-sequencing of human pancreatic islets mainly reflects transcriptional response of major cell types. Single-cell RNA sequencing technology enables transcriptional characterization of individual cells, and thus makes it possible to detect cell types and subtypes. To tackle the heterogeneity of single-cell RNA-seq data, powerful and appropriate clustering is required to facilitate the discovery of cell types. In this paper, we propose a new clustering framework based on a graph-based model with various types of dissimilarity measures. We take the compositional nature of single-cell RNA-seq data into account and employ log-ratio transformations. The practical merit of the proposed method is demonstrated through the application to the centered log-ratio-transformed single-cell RNA-seq data for human pancreatic islets. The practical merit is also demonstrated through comparisons with existing single-cell clustering methods. The R-package for the proposed method can be found at https://github.com/Zhang-Data-Science-Research-Lab/LrSClust
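
The centered log-ratio (CLR) transformation mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the LrSClust package: the function name `clr_transform` and the pseudocount used to handle zero counts are assumptions made here for illustration only.

```python
import math


def clr_transform(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's gene counts.

    Each value becomes log(x_i) minus the mean of the logs, which is the
    log of x_i divided by the geometric mean of the cell.
    The pseudocount is an assumption to avoid log(0) on zero counts.
    """
    shifted = [c + pseudocount for c in counts]
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical counts for four genes in a single cell.
cell = [0, 3, 12, 1]
clr_cell = clr_transform(cell)
```

By construction the transformed values of a cell sum to zero, which is the property that makes CLR coordinates suitable for compositional data.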

    Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data.

    Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, standard scRNAseq experiments lose the spatial organization of the cells in the tissue of origin. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge, focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging as a silver standard genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, and were able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy and high spatial clustering, and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.
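
The abstract does not say how expression entropy was computed. As a rough illustration only, the sketch below takes the Shannon entropy of a gene's expression values across cells after equal-width binning. The function name `expression_entropy`, the binning scheme, and the number of bins are all assumptions, not the challenge's definition.

```python
import math
from collections import Counter


def expression_entropy(values, n_bins=4):
    """Shannon entropy (in bits) of a gene's binned expression across cells.

    Equal-width binning is an assumption made for this sketch; a gene with
    identical expression in every cell has zero entropy.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    # Assign each value to a bin; the maximum value falls into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


# Hypothetical expression values of two genes across four cells.
flat_gene = expression_entropy([1, 1, 1, 1])
varied_gene = expression_entropy([0, 1, 2, 3])
```

Under this definition a gene expressed uniformly across cells scores zero, while a gene whose expression spreads evenly over the bins scores highest, which matches the intuition that informative predictor genes vary across the tissue.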

    Unsupervised Pattern Recognition on Large-scale Genomics Data

    Recent advances in biotechnologies such as single-cell RNA sequencing and Hi-C assays produce huge amounts of unlabelled information and open the door to many lines of biomedical research, such as transcriptional characterization of individual cells and comprehensive investigation of chromosomal conformation. In this thesis, we study the problem of using unsupervised methods, such as clustering and spatial scan clustering, to extract patterns and learn representations from single-cell RNA-seq and Hi-C data. To tackle the heterogeneity of single-cell RNA-seq data, powerful and appropriate clustering is required to facilitate the discovery of cell types. In this dissertation research, we propose a graph-based clustering method, Linf-SClust, and a distribution-based approach, RDMM, to extract the cluster configurations from two different perspectives. Linf-SClust is a novel tuning-free graph-based model that constructs the graph using the l-infinity measure and the entropy equalizer similarity, and divides the graph via spectral clustering. Parameter tuning and determination of the number of clusters are guided by the Gap statistic, which makes Linf-SClust a fully automatic approach. Our other method, RDMM, is a regularized Dirichlet-Multinomial finite-mixture model that addresses the gene expression clustering problem in a compositional fashion. The advantages of Linf-SClust and RDMM are shown through simulations and real applications. The Hi-C experiment enables assessment of chromosomal structural information, including the detection of structural variations, especially translocations. In this dissertation research, we formulate inter-chromosomal translocation detection as a problem of scan clustering in a spatial point process. We then develop TranScan, a new translocation detection method via scan statistics with control of false discovery. The application of TranScan to real Hi-C data in breast cancer research successfully identifies previously discovered translocation events and also suggests a new putative segment translocated between nonhomologous chromosomes.

    Translocation detection from Hi-C data via scan statistics.

    Recent Hi-C technology enables more comprehensive chromosomal conformation research, including the detection of structural variations, especially translocations. In this paper, we formulate interchromosomal translocation detection as a problem of scan clustering in a spatial point process. We then develop TranScan, a new translocation detection method through scan statistics with control of false discovery. The simulation shows that TranScan is more powerful than an existing sophisticated scan clustering method, especially under strong signal situations. Evaluation of TranScan against current translocation detection methods on realistic breakpoint simulations generated from real data suggests better discriminative power under the receiver-operating characteristic curve. Power analysis also highlights TranScan's consistent outperformance when sequencing depth and heterozygosity rate are varied. Comparatively, the Type I error rate is lowest when evaluated using a karyotypically normal cell line. Both the simulation and real data analysis indicate that TranScan has great potential in interchromosomal translocation detection using Hi-C data.

    The effect of ethylene-amine ligands enhancing performance and stability of perovskite solar cells

    The inclusion of long-chain alkyl-amine organics in perovskite solar cells (PSCs) has been reported to enhance the water resistance of perovskite films, but this strategy lowers device efficiency at the same time. Herein, we develop an approach that combines molecular dimensionality control and interfacial passivation of perovskite layers using a novel post-device treatment (PDT) with the vapour of ethylene-amine salts of different carbon chain lengths to improve both the efficiency and stability of the PSCs. The effect of a series of ligand vapours including ethylenediamine (EDA), diethylenetriamine (DETA) and triethylenetetramine (TETA) was systematically investigated. A thin hydrophobic two-dimensional (2D) perovskite capping layer formed in the device after the 3D perovskite was exposed to the vapour of long-chain ethylene-amine molecules, such as DETA and TETA, which protected the underlying bulk 3D perovskite layer from moisture attack. An improved energy level alignment was obtained in the treated devices, and a reduced density of defects was present in the perovskite after treatment with DETA and TETA vapours. Consequently, enhanced efficiency from 17.07% to 18.09% (DETA) and improved moisture stability with PCE retention from 73.8% to 90.0% (TETA) under a relative humidity >65% for 1000 h were achieved by this vapour treatment, respectively.

    Dimensionality-controlled surface passivation for enhancing performance and stability of perovskite solar cells via triethylenetetramine vapor

    Perovskite solar cells (PSCs) have achieved unprecedented progress in terms of enhancement of power conversion efficiency (PCE). Nevertheless, device stability is still an obstacle to the commercialization of this emerging photovoltaic technology. Though strategies such as compositional management and ligand engineering have been reported to tackle this critical issue, these methods often have drawbacks such as compromised device performance. Herein, we propose an approach combining material dimensionality control and interfacial passivation by a post-device treatment via triethylenetetramine (TETA) vapor to enhance both efficiency and stability of CsFAMAPbIBr-based PSCs. Results of X-ray diffraction and scanning electron microscopy show the formation of low-dimensional perovskites at the interface between the perovskite film and the hole transporting layer after the TETA vapor treatment. Measurements of the energy level alignment and electrochemical properties by ultraviolet photoelectron spectroscopy and impedance spectra confirm the reduced density of trap states and improved interfacial charge transport. Consequently, TETA-based treatment significantly enhances both efficiency (from 17.07 to 18.03%) and stability (PCE retention from 73.4 to 88.9%) of the PSCs under >65% relative humidity for 1000 h compared to the control device without TETA treatment. Furthermore, through surface defect passivation, the TETA vapor also dramatically improves the performance of PSC devices that initially performed poorly (from 6.8 to 10.5%).
