thesis

OperomeDB: database of condition specific transcription in prokaryotic genomes and genomic insights of convergent transcription in bacterial genomes

Abstract

Indiana University-Purdue University Indianapolis (IUPUI)

My thesis comprises two individual projects: 1) a database of operons predicted for bacterial genomes from high-throughput sequencing datasets, and 2) genomic and mechanistic insights into convergent transcription in bacterial genomes. In the first project, we developed a database of operons predicted for bacterial genomes from RNA-seq datasets. For each bacterial genome, we took RNA-seq datasets from NCBI covering distinct experimental conditions and predicted operons using Rockhopper. The database currently contains predicted operons for 9 bacterial organisms. The user interface is simple and easy to use for visualizing, downloading and querying the data. Users can browse the reference genome, the genes present in that genome and the operons predicted from the different RNA-seq datasets.

In the second project, we studied the genomic and mechanistic insights of convergent transcription in bacterial genomes. Convergent gene pairs with overlapping head-to-head configuration are widely spread across both eukaryotic and prokaryotic genomes. They are believed to contribute to the regulation of genes at both transcriptional and post-transcriptional levels, although the factors contributing to their abundance across genomes and the mechanistic basis for their prevalence are poorly understood. In this study, we explore the role of various factors contributing to convergent overlapping transcription in bacterial genomes. Our analysis shows that the proportion of convergent overlapping gene pairs (COGPs) in a genome is affected by endospore formation, bacterial habitat, oxygen requirement, GC content and temperature range. In particular, we show that bacteria thriving in specialized habitats, such as thermophiles, exhibit a high proportion of COGPs. Our results also indicate that the density distribution of COGPs across the genomes is high for shorter overlaps, with increased conservation of distances for decreasing overlaps. Our study further reveals that COGPs frequently contain stop codon overlaps, with the middle base position exhibiting mismatches between complementary strands. Functional analysis using cluster of orthologous groups (COGs) annotations suggested that cell motility, cell metabolism, storage and cell signaling are enriched among COGPs, suggesting their role in processes beyond regulation. Our analysis provides genomic insights into this unappreciated regulatory phenomenon, allowing a refined understanding of its contribution to bacterial phenotypes.
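
As an illustration of how the proportion of COGPs in a genome might be computed, the following is a minimal sketch and not the pipeline used in the thesis. It assumes a working definition that the abstract does not spell out: two adjacent genes form a COGP when the upstream gene lies on the forward strand, the downstream gene lies on the reverse strand, and their coordinates overlap. The `Gene` record, the function names and the 1-based inclusive coordinates are all hypothetical choices made for this example.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def find_cogps(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (left gene, right gene, overlap length in bp) for each
    adjacent pair that is convergent (+ then -) and overlapping.

    Assumed definition for illustration only.
    """
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand == "+" and right.strand == "-":
            # Number of shared positions between the two genes.
            overlap = min(left.end, right.end) - right.start + 1
            if overlap > 0:
                pairs.append((left.name, right.name, overlap))
    return pairs


def cogp_proportion(genes: List[Gene]) -> float:
    """Fraction of adjacent gene pairs that are COGPs (assumed metric)."""
    n_adjacent = max(len(genes) - 1, 0)
    if n_adjacent == 0:
        return 0.0
    return len(find_cogps(genes)) / n_adjacent


# Toy annotation, invented for this example.
toy_genome = [
    Gene("a", 1, 100, "+"),
    Gene("b", 97, 200, "-"),   # overlaps the 3' end of "a" by 4 bp
    Gene("c", 250, 300, "+"),
    Gene("d", 310, 400, "-"),  # convergent with "c" but not overlapping
    Gene("e", 390, 500, "-"),  # overlaps "d" but on the same strand
]

print(find_cogps(toy_genome))
print(cogp_proportion(toy_genome))
```

Only adjacent genes are compared here, which keeps the sketch linear after sorting; a real annotation with nested or multiply overlapping genes would need an interval-based comparison instead.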
