9 research outputs found

    Insights on the potential of RNA-Seq on improving pomological traits of African indigenous fruit trees: a mini review

    Fruit tree improvement has made great strides by adopting improved and efficient biotechnological tools to increase fruit yield and quality to meet local and export demands. For the past decade, RNA-Seq has been used successfully in fruit tree improvement programs to identify genes, dissect complex traits, and understand different molecular pathways and differential gene expression. However, despite their growing importance in food and nutrition security, medicinal uses, and climate change mitigation strategies, very little has been done to improve the pomological traits of African indigenous fruits, especially at the molecular level. African indigenous fruit trees exhibit unexplained variation in flowering, fruit load, fruit size, fruit ripening, fruit taste, fruit nutritional composition and shelf-life. Booming local commercial companies and export markets are demanding indigenous fruits of consistent quality. This has created a need for fast and effective tools that will hasten the understanding and improvement of fruiting qualities. Improving fruiting and fruit qualities will go a long way toward accelerating the domestication and commercialization of African indigenous fruit trees. This review gives molecular biology insights into how RNA-Seq has been used successfully in the improvement of exotic fruits through gene identification, comparative transcriptome analysis under different conditions, and understanding of the molecular pathways that influence important pomological traits. The review also identifies opportunities where RNA-Seq can advance our knowledge of, and help improve, undesirable traits common in African indigenous fruit

    Single-Cell RNA-Seq Technologies and Related Computational Data Analysis

    Single-cell RNA sequencing (scRNA-seq) technologies allow the dissection of gene expression at single-cell resolution, which greatly revolutionizes transcriptomic studies. A number of scRNA-seq protocols have been developed, and these methods possess their unique features with distinct advantages and disadvantages. Due to technical limitations and biological factors, scRNA-seq data are noisier and more complex than bulk RNA-seq data. The high variability of scRNA-seq data raises computational challenges in data analysis. Although an increasing number of bioinformatics methods are proposed for analyzing and interpreting scRNA-seq data, novel algorithms are required to ensure the accuracy and reproducibility of results. In this review, we provide an overview of currently available single-cell isolation protocols and scRNA-seq technologies, and discuss the methods for diverse scRNA-seq data analyses including quality control, read mapping, gene expression quantification, batch effect correction, normalization, imputation, dimensionality reduction, feature selection, cell clustering, trajectory inference, differential expression calling, alternative splicing, allelic expression, and gene regulatory network reconstruction. Further, we outline the prospective development and applications of scRNA-seq technologies
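As a small illustration of the normalization step listed in the abstract above, the sketch below applies library-size scaling followed by a log transform, one common convention among several. The function name `log_normalize`, the scale factor of 10,000 and the toy counts are assumptions made for this example and do not come from the review.

```python
import math


def log_normalize(counts, scale=10_000):
    """Scale one cell's gene counts to a fixed library size, then apply log(1 + x).

    `counts` is a list of raw counts for a single cell (hypothetical input).
    """
    total = sum(counts)
    if total == 0:
        # An empty cell has no expression signal; return zeros instead of dividing by zero.
        return [0.0] * len(counts)
    return [math.log1p(c / total * scale) for c in counts]


# Two toy cells with the same relative expression but a tenfold depth difference.
shallow_cell = [1, 2, 7]
deep_cell = [10, 20, 70]

shallow_norm = log_normalize(shallow_cell)
deep_norm = log_normalize(deep_cell)
```

After normalization the two toy cells have matching profiles, which is the point of removing sequencing-depth differences before steps such as dimensionality reduction and clustering.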

    DSAVE: Detection of misclassified cells in single-cell RNA-Seq data

    Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets
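The down-sampling idea mentioned in the DSAVE abstract above can be sketched as follows. This is not DSAVE's implementation; it only shows, under simplifying assumptions, how counts from cells of different depth can be drawn down to a common depth so that sampling noise becomes comparable. The function name `downsample_counts`, the gene names and the toy counts are hypothetical.

```python
import random
from collections import Counter


def downsample_counts(counts, target_depth, seed=0):
    """Draw `target_depth` molecules without replacement from one cell's counts.

    `counts` maps gene name -> integer count (hypothetical toy input).
    """
    total = sum(counts.values())
    if target_depth > total:
        raise ValueError("target depth exceeds the number of molecules in the cell")
    rng = random.Random(seed)
    # Expand the counts into one entry per observed molecule, then subsample.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    sampled = rng.sample(pool, target_depth)
    return dict(Counter(sampled))


cells = {
    "cell_a": {"GeneA": 50, "GeneB": 30, "GeneC": 20},
    "cell_b": {"GeneA": 5, "GeneB": 3, "GeneC": 2},
}

# Use the shallowest cell as the common depth for every cell.
depth = min(sum(c.values()) for c in cells.values())
downsampled = {name: downsample_counts(c, depth) for name, c in cells.items()}
```

Sampling without replacement keeps every down-sampled count at or below its original value; a real dataset would typically use a vectorized sampler rather than expanding each molecule into a list.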

    Enhancing preprocessing and clustering of single-cell RNA sequencing data

    Single-cell RNA sequencing (scRNA-seq) is the leading technique for characterizing cellular heterogeneity in biological samples. Various scRNA-seq protocols have been developed that can measure the transcriptome from thousands of cells in a single experiment. With these methods readily available, the ability to transform raw data into biological understanding of complex systems is now a rate-limiting step. In this dissertation, I introduce novel computational software and tools which enhance preprocessing and clustering of scRNA-seq data and evaluate their performance compared to existing methods. First, I present scruff, an R/Bioconductor package that preprocesses data generated from scRNA-seq protocols including CEL-Seq or CEL-Seq2 and reports comprehensive data quality metrics and visualizations. scruff rapidly demultiplexes, aligns, and counts the reads mapped to genomic features with deduplication of unique molecular identifier (UMI) tags and provides novel and extensive functions to visualize both pre- and post-alignment data quality metrics for cells from multiple experiments. Second, I present Celda, a novel Bayesian hierarchical model that can perform simultaneous co-clustering of genes into transcriptional modules and cells into subpopulations for scRNA-seq data. Celda identified novel cell subpopulations in a publicly available peripheral blood mononuclear cell (PBMC) dataset and outperformed a PCA-based approach for gene clustering on simulated data. Third, I extend the application of Celda by developing a multimodal clustering method that utilizes both mRNA and protein expression information generated from single-cell sequencing datasets with multiple modalities, and demonstrate that Celda multimodal clustering captured meaningful biological patterns which are missed by transcriptome- or protein-only clustering methods. Collectively, this work addresses limitations present in the computational analyses of scRNA-seq data by providing novel methods and solutions that enhance scRNA-seq data preprocessing and clustering.
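The UMI deduplication step described in the abstract above can be illustrated with a minimal sketch. It is not scruff's code (scruff is an R/Bioconductor package); it assumes reads have already been demultiplexed and assigned to genes, and it ignores sequencing errors within UMIs. The function name `count_umis` and all barcodes, genes and UMI sequences are hypothetical.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples. Reads that share
    all three values are treated as PCR duplicates of the same molecule.
    """
    seen = defaultdict(set)
    for barcode, gene, umi in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("AACG", "GeneA", "TTAGC"),
    ("AACG", "GeneA", "TTAGC"),  # duplicate of the read above, counted once
    ("AACG", "GeneA", "GGCAT"),
    ("AACG", "GeneB", "TTAGC"),  # same UMI but a different gene, counted separately
    ("CCTA", "GeneA", "TTAGC"),  # same UMI but a different cell, counted separately
]

umi_counts = count_umis(reads)
```

Keying the sets on the (barcode, gene) pair means a UMI is only collapsed against reads from the same cell and gene, which is the usual scope for deduplication.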

    A quantitative analysis workflow for single-cell transcriptomics data in the context of induced pluripotent stem cells

    Undergraduate thesis (TCC) - Universidade Federal de Santa Catarina. Centro de Ciências Biológicas. Biologia. Induced pluripotent stem cells (iPSCs) are cells reprogrammed from somatic cells to acquire pluripotency, the ability to differentiate into any cell type of an organism. With a suitable differentiation protocol, they can be transformed into various other cells of the organism. Since their creation, several advances in laboratory protocols and techniques have allowed their use in biomedical research and cell therapies. However, the differentiation process is imperfect and not all cells turn into the intended target cells. In this context, single-cell transcriptomics sequencing proves to be a powerful tool for obtaining information. Bioinformatics tools are fundamental in this process, allowing us to analyze the gene expression of a cell and, from it, infer its cell type. Several tools are used in different steps of the analysis process. In general, these tools are reproducible. However, it is common for users to have difficulty installing a tool and using scripts outside the context in which they were written. To minimize such situations, we structured the use of these tools in an analysis pipeline. Good pipeline construction practices call for developing it in a modular, reproducible and compartmentalized way. To do so, it is necessary to use workflow management tools and package dependency containers. This work aimed to build a pipeline for analyzing single-cell transcriptomics data in the context of induced pluripotent stem cells. In addition, it aimed to create a score that assesses the importance that a given gene had in the classification of a sample. The analysis tools used in the pipeline were FUSCA, singleCellNet, Seurat and Symphony. The resources used to build the pipeline structure were the workflow manager Snakemake and the container manager Singularity. The effectiveness of the pipeline was evaluated by applying it to single-cell data from dopaminergic neurons derived from induced pluripotent stem cells, using a dataset of cells from the ventral region of the midbrain of human embryos. The pipeline was able to identify the cell types of the cells in question, and these were compatible with the types found by the authors. The generated figures are accessible and can be used to build a report or scientific work. Finally, the pipeline is available for public use at https://github.com/gacrestani/ipsc-pipeline

    Processing of scRNA-seq data from the Drop-Seq technology: application to the study of transcriptional networks in breast cancer

    Recent single-cell RNA sequencing (scRNA-seq) technologies have enabled the quantification of gene expression levels at the level of the individual cell, whereas standard RNA sequencing technologies (RNA-seq, or bulk RNA-seq) could only quantify the average gene expression in a sample of cells. This higher resolution has allowed major advances in biomedical research, but has also raised new challenges, in particular computational ones. The data derived from scRNA-seq technologies are complex and noisier than bulk RNA-seq data. In addition, the number of scRNA-seq technologies is growing rapidly, each of them requiring somewhat different preprocessing. More and more methods are thus being developed, but there is still no gold standard for the preprocessing and analysis of these data. Dr. Mader's laboratory has recently acquired the Drop-Seq technology (a high-throughput scRNA-seq technology), which requires new expertise for processing the resulting data. In this thesis, different steps in the preprocessing of Drop-Seq data are reviewed and the behavior of some of the dedicated tools is studied, making it possible to establish guidelines for future experiments in Dr. Mader's laboratory. This study is conducted on the first datasets generated with the laboratory's Drop-Seq technology, derived from breast cancer cell lines. Analytical methods, which are less specific to the technology, are not investigated in this thesis, but an exploratory analysis of the laboratory's datasets lays the foundation for further analysis.

    Quality control of single-cell RNA-seq by SinQC
