15 research outputs found

    Statistical Algorithms and Bioinformatics Tools Development for Computational Analysis of High-throughput Transcriptomic Data

    Next-Generation Sequencing technologies allow for a substantial increase in the amount of data available for various biological studies. In order to effectively and efficiently analyze this data, computational approaches combining mathematics, statistics, computer science, and biology are implemented. Even with the substantial efforts devoted to development of these approaches, numerous issues and pitfalls remain. One of these issues is mapping uncertainty, in which read alignment results are biased due to the inherent difficulties associated with accurately aligning RNA-Sequencing reads. GeneQC is an alignment quality control tool that provides insight into the severity of mapping uncertainty in each annotated gene from alignment results. GeneQC uses feature extraction to identify three levels of information for each gene and implements elastic net regularization and mixture model fitting to provide insight into the severity of mapping uncertainty and the quality of read alignment. In combination with GeneQC, the Ambiguous Reads Mapping (ARM) algorithm works to re-align ambiguous reads through the integration of motif prediction from metabolic pathways to establish coregulatory gene modules for re-alignment using a negative binomial distribution-based probabilistic approach. These two tools work in tandem to address the issue of mapping uncertainty and provide more accurate read alignments, and thus more accurate expression estimates. Also presented in this dissertation are two approaches to interpreting the expression estimates. The first is IRIS-EDA, an integrated Shiny web server that combines numerous analyses to investigate gene expression data generated from RNA-Sequencing data. The second is ViDGER, an R/Bioconductor package that quickly generates high-quality visualizations of differential gene expression results to assist users in comprehensive interpretation of those results, which is a non-trivial task.
These four presented tools cover a variety of aspects of modern RNA-Seq analyses and aim to address bottlenecks related to algorithmic and computational issues, as well as more efficient and effective implementation methods.

    IRIS-EDA: An Integrated RNA-Seq Interpretation System for Gene Expression Data Analysis

    Next-Generation Sequencing has made available substantial amounts of large-scale Omics data, providing unprecedented opportunities to understand complex biological systems. Specifically, the value of RNA-Sequencing (RNA-Seq) data has been confirmed in inferring how gene regulatory systems will respond under various conditions (bulk data) or cell types (single-cell data). RNA-Seq can generate genome-scale gene expression profiles that can be further analyzed using correlation analysis, co-expression analysis, clustering, and differential gene expression (DGE), among many other analyses. While these analyses can provide invaluable information related to gene expression, integration and interpretation of the results can prove challenging. Here we present a tool called IRIS-EDA, which is a Shiny web server for expression data analysis. It provides a straightforward and user-friendly platform for performing numerous computational analyses on user-provided RNA-Seq or Single-cell RNA-Seq (scRNA-Seq) data. Specifically, three commonly used R packages (edgeR, DESeq2, and limma) are implemented in the DGE analysis with seven unique experimental design functionalities, including a user-specified design matrix option. Seven discovery-driven methods and tools (correlation analysis, heatmap, clustering, biclustering, Principal Component Analysis (PCA), Multidimensional Scaling (MDS), and t-distributed Stochastic Neighbor Embedding (t-SNE)) are provided for gene expression exploration, which is useful for designing experimental hypotheses and determining key factors for comprehensive DGE analysis. Furthermore, this platform integrates seven visualization tools in a highly interactive manner, for improved interpretation of the analyses. It is noteworthy that, for the first time, IRIS-EDA provides a framework to expedite submission of data and results to NCBI’s Gene Expression Omnibus following the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles.
IRIS-EDA is freely available at http://bmbl.sdstate.edu/IRIS/

    Improved Draft Genome Sequence of Bacillus sp. Strain YF23, Which Has Plant Growth-Promoting Activity

    We report here the improved draft genome sequence of Bacillus sp. strain YF23, a bacterium originally isolated from switchgrass (Panicum virgatum) plants and shown to exhibit plant growth-promoting activity. The genome comprises 5.82 Mbp and contains 5,933 genes, 193 of which are RNA genes, with a GC content of 35.10%.
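
GC content, as reported in the abstract above, is the percentage of bases in a sequence that are G or C. A minimal sketch of that arithmetic follows; the function name `gc_content` and the toy sequence are assumptions made for illustration and are not taken from the paper, which does not describe how its figure was computed.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous A/C/G/T bases.

    Ambiguous symbols such as N are ignored (an assumption of this sketch;
    other tools may count them in the denominator).
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


# Made-up toy sequence, not real YF23 data: 2 of 10 bases are G or C.
example = "ATGCATTTAA"
print(f"{gc_content(example):.2f}%")  # prints 20.00%
```

For a real draft assembly the same ratio would be accumulated over every contig in the FASTA file rather than over a single string.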

    Designing a Regional System of Social Indicators to Evaluate Nonpoint Source Water Projects

    A collaborative team has developed a system to measure the social outcomes of nonpoint source water projects as indicators of progress towards environmental goals. The system involves a set of core indicators, additional supplemental indicators, and a process for collecting and using the indicators. This process is supported by methodologies and instruments for data collection, analysis, and reporting that are coordinated and supported through detailed written guidance and an on-line data management tool. Its multi-state scope and application offer a unique opportunity to target, measure, and report interim resource management accomplishments consistently at multiple levels.

    Rootstock Effects on Scion Phenotypes in a ‘Chambourcin’ Experimental Vineyard

    Understanding how root systems modulate shoot system phenotypes is a fundamental question in plant biology and will be useful in developing resilient agricultural crops. Grafting is a common horticultural practice that joins the roots (rootstock) of one plant to the shoot (scion) of another, providing an excellent method for investigating how these two organ systems affect each other. In this study, we used the French-American hybrid grapevine ‘Chambourcin’ (Vitis L.) as a model to explore the rootstock–scion relationship. We examined leaf shape, ion concentrations, and gene expression in ‘Chambourcin’ grown ungrafted as well as grafted to three different rootstocks (‘SO4’, ‘1103P’ and ‘3309C’) across 2 years and three different irrigation treatments. We found that a significant amount of the variation in leaf shape could be explained by the interaction between rootstock and irrigation. For ion concentrations, the primary source of variation identified was the position of a leaf in a shoot, although rootstock and rootstock by irrigation interaction also explained a significant amount of variation for most ions. Lastly, we found rootstock-specific patterns of gene expression in grafted plants when compared to ungrafted vines. Thus, our work reveals the subtle and complex effect of grafting on ‘Chambourcin’ leaf morphology, ionomics, and gene expression.

    IRIS3: integrated cell-type-specific regulon inference server from single-cell RNA-Seq

    A group of genes controlled as a unit, usually by the same repressor or activator gene, is known as a regulon. The ability to identify active regulons within a specific cell type, i.e., cell-type-specific regulons (CTSR), provides an extraordinary opportunity to pinpoint crucial regulators and target genes responsible for complex diseases. However, the identification of CTSRs from single-cell RNA-Seq (scRNA-Seq) data is computationally challenging. We introduce IRIS3, the first-of-its-kind web server for CTSR inference from scRNA-Seq data for human and mouse. IRIS3 is an easy-to-use server empowered by over 20 functionalities to support comprehensive interpretations and graphical visualizations of identified CTSRs. CTSR data can be used to reliably characterize and distinguish the corresponding cell type from others and can be combined with other computational or experimental analyses for biomedical studies. CTSRs can, therefore, aid in the discovery of major regulatory mechanisms and allow reliable construction of global transcriptional regulation networks encoded in a specific cell type. The broader impact of IRIS3 includes, but is not limited to, investigation of complex disease hierarchies and heterogeneity, causal gene regulatory network construction, and drug development.

    RECTA: Regulon Identification Based on Comparative Genomics and Transcriptomics Analysis

    Regulons, which serve as co-regulated gene groups contributing to the transcriptional regulation of microbial genomes, have the potential to aid in understanding of underlying regulatory mechanisms. In this study, we designed a novel computational pipeline, regulon identification based on comparative genomics and transcriptomics analysis (RECTA), for regulon prediction related to the gene regulatory network under certain conditions. To demonstrate the effectiveness of this tool, we implemented RECTA on Lactococcus lactis MG1363 data to elucidate acid-response regulons. A total of 51 regulons were identified, 14 of which have computationally verified significance. Among these 14 regulons, five were computationally predicted to be connected with acid stress response. Validated by literature, 33 genes in Lactococcus lactis MG1363 were found to have orthologous genes which were associated with six regulons. An acid response related regulatory network was constructed, involving two trans-membrane proteins, eight regulons (llrA, llrC, hllA, ccpA, NHP6A, rcfB, regulons #8 and #39), nine functional modules, and 33 genes with orthologous genes known to be associated with acid stress. The predicted response pathways could serve as promising candidates for better acid tolerance engineering in Lactococcus lactis. Our RECTA pipeline provides an effective way to construct a reliable gene regulatory network through regulon elucidation, and can be effectively applied to other bacterial genomes where the elucidation of the transcriptional regulation network is needed.
