2,400 research outputs found

    Leveraging EST Evidence to Automatically Predict Alternatively Spliced Genes, Master's Thesis, December 2006

    Current methods for high-throughput automatic annotation of newly sequenced genomes are largely limited to tools that predict only one transcript per gene locus. Evidence suggests that 20-50% of genes in higher eukaryotic organisms are alternatively spliced. This leaves the remainder of the transcripts to be annotated by hand, an expensive, time-consuming process. Genomes are being sequenced at a much higher rate than they can be annotated. We present three methods for using the alignments of inexpensive Expressed Sequence Tags (ESTs) in combination with HMM-based gene prediction with N-SCAN EST to recreate the vast majority of hand annotations in the D. melanogaster genome. In our first method, we “piece together” N-SCAN EST predictions with clustered EST alignments to increase the number of transcripts predicted per locus. This is shown to be a sensitive and accurate method, predicting the vast majority of known transcripts in the D. melanogaster genome. Second, we present an approach that uses these clusters of EST alignments to construct a Multi-Pass gene prediction phase, again piecing it together with clusters of EST alignments. While time-consuming, Multi-Pass gene prediction is very accurate and more sensitive than single-pass prediction. Finally, we present a new Hidden Markov Model instance, which augments the current N-SCAN EST HMM and predicts multiple splice forms in a single pass of prediction. This method is less time-consuming and performs nearly as well as the multi-pass approach.

    The role of linker histone globular domains in chromatosome formation

    Quantitative genome-wide studies of RNA metabolism in yeast

    Gene expression and its regulation are fundamental processes in every living cell and organism. RNA molecules play a central role by translating the genetic information into proteins, by regulating gene activity, and by forming structural components. The kinetics of RNA metabolism differ widely between genes and conditions and play an important role in cellular processes, but how this is achieved remains poorly understood. Here, we used a novel experimental protocol that allows profiling of newly transcribed RNAs in conjunction with an advanced computational modeling pipeline to explore the kinetics of RNA metabolism and the underlying genetic determinants. In the first study, we investigated cell-cycle-regulated gene expression and the contributions of synthesis and degradation to mRNA levels in S. cerevisiae. During the cell cycle, the levels of hundreds of mRNAs change in a periodic manner, but how this is carried out by alterations in the rates of mRNA synthesis and degradation has not been studied systematically. We were able to derive mRNA synthesis and degradation rates every 5 minutes during the cell cycle, and thus provide for the first time a high-resolution time series of RNA metabolism during the cell cycle. A novel statistical model identified 479 genes that show periodic changes in mRNA synthesis and generally also periodic changes in their mRNA degradation rates. Peaks of mRNA degradation follow peaks of mRNA synthesis, resulting in sharp and high peaks of mRNA levels at defined times during the cell cycle. Whereas the timing of mRNA synthesis is set by upstream DNA motifs and their associated transcription factors (TFs), the synthesis rate of a periodically expressed gene is apparently set by its core promoter. In the second study, we developed metabolic labeling with RNA-Seq (4tU-Seq) and novel computational methods to gain further insights into the kinetics of RNA metabolism and its regulation.
    To decrypt the regulatory code of the genome, sequence elements must be defined that determine RNA turnover and thus gene expression. Here we attempt such decryption in a eukaryotic model organism, the fission yeast S. pombe. We first derived an improved genome annotation that redefines borders of 36% of expressed mRNAs and adds 487 non-coding RNAs (ncRNAs). We then combined RNA labeling in vivo with mathematical modeling to obtain rates of RNA synthesis and degradation for 5,484 expressed RNAs and splicing rates for 4,958 introns. We identified functional sequence elements in DNA and RNA that control RNA metabolic rates, and quantified the contributions of individual nucleotides to RNA synthesis, splicing, and degradation. Our approach reveals distinct kinetics of mRNA and ncRNA metabolism, separates antisense regulation by transcription interference from RNA interference, and provides a general tool for studying the regulatory code of genomes.

    Organization and evolution of information within eukaryotic genomes.

    Use of wavelet-packet transforms to develop an engineering model for multifractal characterization of mutation dynamics in pathological and nonpathological gene sequences

    This study uses dynamical analysis to examine quantitatively the information coding mechanism in DNA sequences. It goes beyond the simple approach of modeling the mechanism by comparing DNA sequence walks to fractional Brownian motion (fBm) processes. The 2-D mappings of the DNA sequences for this research are obtained from Iterated Function System (IFS) mappings, also known as the Chaos Game Representation (CGR). This technique converts a 1-D sequence into a 2-D representation that preserves subsequence structure and provides a visual representation. The second step of the analysis applies Wavelet Packet Transforms, a recently developed technique from the field of signal processing. A multi-fractal model is built by using wavelet transforms to estimate the Hurst exponent, H. The Hurst exponent is a non-parametric measurement of the dynamism of a system. This procedure is used to evaluate gene-coding events in the DNA sequence of cystic fibrosis mutations. The H exponent is calculated for various mutation sites in this gene. The results of this study indicate the presence of anti-persistent, random-walk, and persistent sub-periods in the sequence. This indicates that the hypothesis of a multi-fractal model of DNA information encoding warrants further consideration. This work examines the model's behavior in both pathological (mutated) and non-pathological (healthy) base pair sequences of the cystic fibrosis gene. The mutations, both natural and synthetic, were introduced by computer manipulation of the original base pair text files. The results show that disease severity and system information dynamics correlate. These results have implications for genetic engineering as well as for mathematical biology. They suggest that there is scope for more multi-fractal models to be developed.
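
    The Chaos Game Representation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the author's code: the function name `chaos_game`, the corner assignment for A, C, G and T, and the starting point at the centre of the unit square are all assumptions chosen to follow one common convention. Each base moves the current point halfway toward that base's corner, so the final position encodes the recent subsequence.

```python
# Minimal Chaos Game Representation (CGR) sketch.
# Assumption: this corner layout is one common convention; the abstract
# does not say which layout the study used.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def chaos_game(sequence):
    """Map a DNA string to a list of (x, y) points in the unit square."""
    x, y = 0.5, 0.5  # assumed starting point: centre of the square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in chaos_game("ACGT"):
        print(point)
```

    Plotting the returned points for a long sequence gives the 2-D picture described in the abstract; sequences that share a suffix land in the same sub-square, which is how subsequence structure is preserved.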