
    Ribosome signatures aid bacterial translation initiation site identification

    Background: While methods for gene annotation are increasingly reliable, the exact identification of translation initiation sites remains a challenging problem. Since the N-termini of proteins often contain regulatory and targeting information, developing a robust method for start site identification is crucial. Ribosome profiling reads show distinct patterns of read length distributions around translation initiation sites. These patterns are typically lost in standard ribosome profiling analysis pipelines, when reads from footprints are adjusted to determine the specific codon being translated. Results: Utilising these signatures in combination with nucleotide sequence information, we build a model capable of predicting translation initiation sites and demonstrate its high accuracy using N-terminal proteomics. Applying this model to prokaryotic translatomes, we re-annotate translation initiation sites and provide evidence of N-terminal truncations and extensions of previously annotated coding sequences. These re-annotations are supported by structural and sequence-based features in addition to N-terminal peptide evidence. Finally, our model identifies 61 novel genes in the Salmonella enterica genome. Conclusions: Signatures within ribosome profiling read length distributions can be used in combination with nucleotide sequence information to provide accurate genome-wide identification of translation initiation sites.
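The abstract does not spell out how the read-length signatures become model features. As a minimal sketch of the underlying idea, assuming reads have already been reduced to hypothetical `(five_prime_pos, length)` tuples on a single strand, the code below keeps the full read-length distribution at each offset around a candidate start codon instead of collapsing every read to one assigned codon:

```python
from collections import Counter, defaultdict


def read_length_profile(reads, start, window=30):
    """Count read lengths for reads whose 5' end lies within +/- window nt of start.

    reads: iterable of (five_prime_pos, length) tuples on one strand (hypothetical format).
    Returns {offset: Counter(length -> count)}, keyed by the 5' end's offset from start.
    """
    profile = defaultdict(Counter)
    for pos, length in reads:
        offset = pos - start
        if -window <= offset <= window:
            profile[offset][length] += 1
    return profile


def length_fractions(profile):
    """Normalise each offset's length counts into fractions (one feature vector per offset)."""
    fractions = {}
    for offset, counts in profile.items():
        total = sum(counts.values())
        fractions[offset] = {length: n / total for length, n in counts.items()}
    return fractions
```

For example, `read_length_profile([(88, 28), (88, 30), (100, 28), (200, 29)], start=100, window=15)` keeps offsets -12 and 0 and drops the read 100 nt away. How the published model combines such per-offset distributions with sequence information is not described in the abstract.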

    Human Promoter Prediction Using DNA Numerical Representation

    With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of numerical representation affects how well biological properties are reflected in the numerical domain, and hence how well characteristic regions of interest within a DNA sequence can be detected and identified. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and drawbacks of the various methods, experimental results, and possible future developments are also included. A further focus of the research is promoter prediction in human (Homo sapiens) DNA sequences using a neural-network-based multi-classifier system built on DNA numerical representation methods. Despite the recent development of several computational methods for human promoter prediction, their performance still needs improvement. In particular, the high false positive rate of feature-based approaches decreases prediction reliability and leads to errors in gene annotation. To improve prediction accuracy and reliability, DigiPromPred, a promoter prediction system based on numerical representation, is proposed to characterize DNA alphabets in different regions of a DNA sequence. DigiPromPred predicts promoters with a sensitivity of 90.8% while reducing false predictions on non-promoter sequences, with a specificity of 90.4%. A comparative study against state-of-the-art promoter prediction systems on human chromosome 22 shows that the proposed system maintains a good balance between prediction accuracy and reliability.
To reduce architectural and computational complexity relative to the existing system, a simple feed-forward neural network classifier, SDigiPromPred, is proposed. SDigiPromPred predicts promoters with sensitivities of 87%, 87%, and 99%, and reduces false predictions on non-promoter sequences with specificities of 92%, 94%, and 99%, for human, Drosophila, and Arabidopsis sequences respectively, while offering reconfigurable capability compared to the existing system.
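The abstract does not say which numerical mapping DigiPromPred uses. One widely used example is the Voss binary indicator representation, which turns a DNA string into four 0/1 sequences, one per nucleotide. The sketch below shows that mapping together with the standard sensitivity and specificity definitions behind figures like those quoted above; the function names are hypothetical, and the dissertation may define its metrics differently:

```python
def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (Voss representation).

    u_X[n] = 1 if seq[n] == X else 0, for X in A, C, G, T.
    """
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


def sensitivity(tp, fn):
    """Fraction of true promoters predicted as promoters: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-promoters correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)
```

For example, `voss_indicators("ACGTA")["A"]` gives `[1, 0, 0, 0, 1]`. Signal-processing methods then typically operate on these four sequences, for instance through their Fourier transforms.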

    Finding Single Copy Genes Out of Sequenced Genomes for Multilocus Phylogenetics in Non-Model Fungi

    Historically, fungal multigene phylogenies have been reconstructed from a small number of commonly used genes. The availability of complete fungal genomes has given rise to a new wave of model organisms that provide a large number of genes potentially useful for building robust gene genealogies. Unfortunately, whether these resources can be used to study phylogenetic relationships in the vast majority of non-model fungi (i.e. “orphan” species) remains unexamined. To address this problem, we developed a method coupled with a program named “PHYLORPH” (PHYLogenetic markers for ORPHans). The method screens fungal genomic databases (107 fully sequenced fungal genomes) for single copy genes that might be easily transferable and well suited for studies at low taxonomic levels (for example, in species complexes) in non-model fungal species. To maximize the chance of targeting genes with informative regions, PHYLORPH provides a graphical evaluation system based on estimates of nucleotide divergence relative to substitution type. The usefulness of this approach was tested by developing markers in four non-model groups of fungal pathogens. For each pathogen considered, 7 to 40% of the 10–15 best candidate genes proposed by PHYLORPH were sequenced successfully. Levels of polymorphism of these genes were compared with those obtained for some genes traditionally used to build fungal phylogenies (e.g. nuclear rDNA, β-tubulin, γ-actin, Elongation factor EF-1α). These genes ranked among the best-performing ones and accurately resolved taxon relationships in each of the four non-model groups of fungi considered. We envision that PHYLORPH will be a useful tool for obtaining new and accurate phylogenetic markers to resolve relationships between closely related non-model fungal species.
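PHYLORPH's exact divergence estimate is not given in the abstract. To illustrate the basic quantities involved, assuming two already-aligned sequences, this sketch counts transitions (purine to purine or pyrimidine to pyrimidine) and transversions separately and reports the uncorrected p-distance. The helper names are hypothetical, and no model-based correction is applied:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_counts(seq1, seq2):
    """Compare two aligned sequences; return (sites, transitions, transversions).

    Positions with a gap or ambiguous base in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        # Same chemical class (both purines or both pyrimidines) means a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def p_distance(seq1, seq2):
    """Uncorrected divergence: proportion of compared sites that differ."""
    sites, ts, tv = substitution_counts(seq1, seq2)
    return (ts + tv) / sites if sites else 0.0
```

Reporting transitions and transversions separately matters because transitions usually accumulate faster, so a marker's ratio of the two hints at how close it is to saturation.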

    Rapidly evolving protointrons in Saccharomyces genomes revealed by a hungry spliceosome.

    Introns are a prevalent feature of eukaryotic genomes, yet their origins and contributions to genome function and evolution remain mysterious. In budding yeast, repression of the highly transcribed intron-containing ribosomal protein genes (RPGs) globally increases splicing of non-RPG transcripts through reduced competition for the spliceosome. We show that under these "hungry spliceosome" conditions, splicing occurs at more than 150 previously unannotated locations, which we call protointrons, that do not overlap known introns. Protointrons use a less constrained set of splice sites and branchpoints than standard introns, including in one case AT-AC in place of GT-AG. Protointrons are not conserved in all closely related species, suggesting that most are not under positive selection and are fated to disappear. Some are found in non-coding RNAs (e.g. CUTs and SUTs), where they may contribute to the creation of new genes. Others are found across boundaries between noncoding and coding sequences, or within coding sequences, where they offer pathways to the creation of new protein variants, or new regulatory controls for existing genes. We define protointrons as (1) nonconserved intron-like sequences that are (2) infrequently spliced, and importantly (3) are not currently understood to contribute to gene expression or regulation in the way that standard introns function. A very few protointrons in S. cerevisiae challenge this classification by their increased splicing frequency and potential function, consistent with the proposed evolutionary process of "intronization", whereby new standard introns are created. This snapshot of intron evolution highlights the important role of the spliceosome in the expansion of transcribed genomic sequence space, providing a pathway for the rare events that may lead to the birth of new eukaryotic genes and the refinement of existing gene function.
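Protointrons were detected from observed splicing events, not from sequence scanning. Still, the splice-site dinucleotides mentioned above can be made concrete with a deliberately naive sketch that lists every interval starting with a GT or AT donor and ending with the matching AG or AC acceptor. The length bounds are arbitrary illustrative defaults, and real splice-site recognition also depends on branchpoints and flanking sequence, which this ignores:

```python
import re

# Donor/acceptor dinucleotide pairs: canonical GT-AG plus the rarer AT-AC.
SITE_PAIRS = (("GT", "AG"), ("AT", "AC"))


def candidate_introns(seq, min_len=20, max_len=200):
    """Return sorted (start, end, donor, acceptor) tuples, 0-based and end-exclusive,
    for every interval that begins with a donor and ends with its paired acceptor."""
    seq = seq.upper()
    hits = []
    for donor, acceptor in SITE_PAIRS:
        # Zero-width lookaheads so that overlapping occurrences are all found.
        donors = [m.start() for m in re.finditer(f"(?={donor})", seq)]
        acceptors = [m.start() for m in re.finditer(f"(?={acceptor})", seq)]
        for d in donors:
            for a in acceptors:
                end = a + 2  # include the acceptor dinucleotide itself
                if min_len <= end - d <= max_len:
                    hits.append((d, end, donor, acceptor))
    return sorted(hits)
```

The all-pairs enumeration grows quickly on real genomic sequence. This illustrates why splicing evidence, rather than dinucleotide motifs alone, is needed to call protointrons.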

    Chromatin Dynamics Regulate Transcriptional Homeostasis

    Eukaryotic promoters are inherently bidirectional and allow RNA Polymerase II to transcribe both coding and noncoding RNAs. Dynamic disassembly and reassembly is a prominent feature of nucleosomes around eukaryotic promoters. While H3K56 acetylation (H3K56Ac) enhances the turnover of these promoter-proximal nucleosomes, the chromatin remodeler INO80C ensures their proper positioning. In my dissertation, I explore how chromatin dynamics regulate transcriptional homeostasis. In the first part, I investigate the role of H3K56Ac in the nascent transcriptome throughout the eukaryotic cell cycle. I find that H3K56Ac is a global, positive regulator of coding and noncoding transcription, promoting both initiation and elongation/termination. Conversely, I find that H3K56Ac represses promiscuous transcription following replication fork passage by ensuring efficient nucleosome assembly during S-phase. In addition, I show that there is a stepwise increase in transcription at the S-G2 transition, and that this response to gene dosage imbalance does not require H3K56Ac. This study shows that a single histone modification, H3K56Ac, can exert both positive and negative effects on transcription at different cell cycle stages. In the second part, I investigate the role of the chromatin remodeler INO80C in nascent transcription around replication origins. I show that INO80C, together with the transcription factor Mot1, prevents cryptic transcription around yeast replication origins, and that loss of these proteins leads to an increase in DNA double-strand breaks. I hypothesize that recruitment of INO80C ensures proper positioning of nucleosomes around origins and the exclusion of RNA Pol II to prevent cryptic initiation.
Together, these findings indicate that H3K56Ac regulates transcription globally by enhancing nucleosome turnover, and that it prevents cryptic transcription and reinforces transcriptional fidelity by promoting efficient nucleosome assembly in S-phase. In addition, INO80C maintains genome stability by preventing cryptic transcription around origins.

    Genome-wide analysis reveals extensive functional interaction between DNA replication initiation and transcription in the genome of Trypanosoma brucei

    Identification of replication initiation sites, termed origins, is a crucial step in understanding genome transmission in any organism. Transcription of the Trypanosoma brucei genome is highly unusual, with each chromosome comprising a few discrete transcription units. To understand how DNA replication occurs in the context of such organization, we have performed genome-wide mapping of the binding sites of the replication initiator ORC1/CDC6 and have identified replication origins, revealing that both localize to the boundaries of the transcription units. Remarkably few active origins are seen, and their spacing is greater than in any other eukaryote. We show that replication and transcription in T. brucei have a profound functional overlap, as reducing ORC1/CDC6 levels leads to genome-wide increases in mRNA levels arising from the boundaries of the transcription units. In addition, ORC1/CDC6 loss causes derepression of silent Variant Surface Glycoprotein genes, which are critical for host immune evasion.
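Two positional claims in this abstract can be made concrete: that origins sit at transcription-unit boundaries, and that origin spacing is unusually wide. A sketch like the following could summarize mapped positions on one chromosome. It assumes origin positions (for example, ChIP peak summits) and transcription-unit boundary coordinates are already available as integers; the function names are hypothetical, and this is not the authors' pipeline:

```python
from bisect import bisect_left


def nearest_boundary_distances(origins, boundaries):
    """For each origin position, return its distance (bp) to the nearest boundary.

    Both inputs are positions on the same chromosome; boundaries need not be sorted.
    """
    bounds = sorted(boundaries)
    if not bounds:
        raise ValueError("need at least one boundary")
    distances = []
    for pos in origins:
        i = bisect_left(bounds, pos)  # first boundary >= pos
        candidates = []
        if i < len(bounds):
            candidates.append(bounds[i] - pos)
        if i > 0:
            candidates.append(pos - bounds[i - 1])
        distances.append(min(candidates))
    return distances


def inter_origin_spacing(origins):
    """Distances between consecutive origins after sorting by position."""
    s = sorted(origins)
    return [b - a for a, b in zip(s, s[1:])]
```

Binary search keeps each lookup logarithmic in the number of boundaries. Comparing the resulting distance distribution against randomly placed positions would be one way to test boundary enrichment.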

    A knowledge engineering approach to the recognition of genomic coding regions

    Research grant received from Suranaree University of Technology, fiscal year B.E. 2556-255

    A comparison study on feature selection of DNA structural properties for promoter prediction

    Background: Promoter prediction is an integral step in understanding gene regulation and annotating genomes. Traditional promoter analysis is mainly based on sequence compositional features. Recently, many kinds of structural features have been employed in promoter prediction. However, given the high dimensionality and the risk of overfitting, it is infeasible to use all available features for promoter prediction, so appropriate features must be chosen for the prediction task. Results: This paper conducts an extensive comparison study on feature selection of DNA structural properties for promoter prediction. First, to examine whether promoters possess special structures, we carry out a systematic comparison of the profiles of thirteen structural features on promoter and non-promoter sequences. Second, we investigate the correlations between these structural features and promoter sequences. Third, both filter and wrapper methods are used to select appropriate feature subsets from the thirteen kinds of structural features, and the predictive power of the selected subsets is evaluated. Finally, we compare the prediction performance of the selected feature subsets with nine existing promoter prediction approaches. Conclusions: Experimental results show that the structural features are differentially correlated with promoters. Specifically, DNA-bending stiffness, DNA denaturation and energy-related features are highly correlated with promoters. Predictive power for promoter sequences differs greatly among the structural features. Selecting relevant features can significantly improve the accuracy of promoter prediction.
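The two feature-selection families named above differ in whether a classifier is involved. As a generic sketch, not the paper's procedure, the filter step below ranks features by the absolute Pearson correlation of each feature with a 0/1 promoter label. The wrapper step greedily adds whichever feature most improves a caller-supplied `evaluate` function, for example the cross-validated accuracy of some classifier. The classifier, the scoring function, and the feature names are all assumptions:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_rank(features, labels):
    """Filter method: rank feature names by |correlation| with the 0/1 label."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


def forward_select(names, evaluate, max_features=None):
    """Wrapper method: greedily add the feature that most improves evaluate(subset).

    Stops when no single addition raises the score or the size limit is reached.
    """
    selected, best = [], float("-inf")
    limit = max_features or len(names)
    while len(selected) < limit:
        trials = [(evaluate(selected + [n]), n) for n in names if n not in selected]
        if not trials:
            break
        score, name = max(trials)
        if score <= best:
            break
        selected.append(name)
        best = score
    return selected, best
```

Filter ranking is cheap but scores each feature in isolation. The wrapper accounts for interactions between features, at the cost of one model evaluation per candidate subset.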