
    Computational analyses to identify and study cis-regulatory regions in eukaryotes

    Transcription is a vital and complicated process that is the first step leading to gene expression in eukaryotes. The initiation of transcription is controlled by a variety of regulatory elements found in the promoter regions of genes. The computational study of these cis-regulatory regions and their mechanisms is of great interest for large-scale studies on gene regulation and expression. With the rapidly rising availability of genomes, accurate computational tools are needed to identify and annotate promoter regions from nucleotide sequences. In addition, genome-wide experimental projects to study the transcriptional landscape have produced comprehensive transcript datasets such as Cap Analysis of Gene Expression (CAGE) and full-length cDNA (fl-cDNA). These datasets can also be mined to derive actionable knowledge on promoter regions. Previous efforts towards computationally identifying promoter regions are heavily biased, both in quality and quantity, towards mammalian and insect genomes. The few tools that identify promoter regions in plants are either trained on data from a different kingdom or are overly simplistic and do not exploit the advances made in mammalian promoter region prediction. There is also an urgent need for tools that can mine pre-existing transcript datasets to derive hypotheses about the complex transcriptional landscapes of eukaryotes. In this thesis, I have designed two computational tools that can greatly aid studies on cis-regulatory regions of eukaryotes. To identify promoter regions from nucleotide sequences, I have designed the Promoter Prediction Extractor (ProPEr) tool. This machine learning-based tool is robust and powerful in identifying promoter regions from plant DNA sequences of varying sizes, and is of specific value for relatively less-studied or newly sequenced species. To analyze and utilize previously produced datasets from public and private 5' profiling studies, we have designed TSRchitect. TSRchitect is an accurate tool that utilizes transcript datasets such as CAGE and EST/fl-cDNA to identify promoter regions. TSRchitect is capable of identifying alternative or tissue-specific promoter usage, and shows great potential in comparative studies of regulatory regions across eukaryotes. ProPEr and TSRchitect can, by themselves or as part of a larger annotation framework, expand our knowledge about the promoter regions of both newly sequenced and model eukaryotic species.