
    Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

    Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and that most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoters. However, the length of the primary transcripts and the promoter organization are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al.

    Human Promoter Prediction Using DNA Numerical Representation

    With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of numerical representation affects how well the biological properties of a DNA sequence are reflected in the numerical domain, and therefore how well the characteristics of special regions of interest within the sequence can be detected and identified. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and demerits of the various methods, experimental results, and possible future developments are also included. Another research focus is promoter prediction in human (Homo sapiens) DNA sequences with a neural-network-based multi-classifier system using DNA numerical representation methods. In spite of the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of the feature-based approaches decreases the prediction reliability and leads to erroneous results in gene annotation. To improve the prediction accuracy and reliability, DigiPromPred, a numerical-representation-based promoter prediction system, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system is found to predict promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences, with a specificity of 90.4%. The comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that the proposed system maintains a good balance between prediction accuracy and reliability.
    To reduce the system architecture and computational complexity compared to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system is found to predict promoters with sensitivities of 87%, 87%, and 99% while reducing the false prediction rate for non-promoter sequences, with specificities of 92%, 94%, and 99% for Human, Drosophila, and Arabidopsis sequences respectively, and with reconfigurable capability compared to the existing system.
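As an illustration of what a numerical representation of the DNA alphabet can look like, the sketch below shows two common textbook encodings: per-nucleotide binary indicator sequences and a single integer sequence. The abstract does not say which representations DigiPromPred uses, so the function names and the integer assignment here are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def indicator_sequences(seq: str) -> Dict[str, List[int]]:
    """Return one binary indicator sequence per nucleotide.

    Position i of the sequence for base X is 1 if seq[i] == X, else 0.
    """
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ACGT"}


def integer_encoding(seq: str) -> List[int]:
    """Map each nucleotide to an integer.

    The assignment A=0, C=1, G=2, T=3 is an arbitrary illustrative choice,
    not one taken from the source.
    """
    mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    return [mapping[ch] for ch in seq.upper()]


if __name__ == "__main__":
    example = "ACGTA"
    print(indicator_sequences(example))
    print(integer_encoding(example))
```

The indicator form avoids imposing an artificial ordering on the four bases, at the cost of four sequences instead of one; the integer form is more compact but implies numeric relationships between bases that may have no biological meaning.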

    Promoter prediction using physico-chemical properties of DNA

    The ability to locate promoters within a section of DNA is known to be a very difficult and very important task in DNA analysis. We document an approach that incorporates the concept of DNA as a complex molecule using several models of its physico-chemical properties. A support vector machine is trained to recognise promoters by their distinctive physical and chemical properties. We demonstrate that by combining models, we can improve upon the classification accuracy obtained with a single model. We also show that by examining how the predictive accuracy of these properties varies over the promoter, we can reduce the number of attributes needed. Finally, we apply this method to a real-world problem. The results demonstrate that such an approach has significant merit in its own right. Furthermore, they suggest better results from a planned combined approach to promoter prediction using both physicochemical and sequence based techniques

    IN-AIS-MACA: Integrated Artificial Immune System based Multiple Attractor Cellular Automata For Human Protein Coding and Promoter Prediction of 252bp Length DNA Sequence

    Gene prediction involves protein coding and promoter predictions. There is a need for integrated algorithms that can predict both of these regions at a faster rate. To date, only individual algorithms address these problems. We have developed a novel classifier, IN-AIS-MACA, which can predict both of these regions in genomic DNA sequences of length 252bp with 93.5% accuracy and a total prediction time of 1031ms. This classifier should motivate the development of similar integrated classifiers.

    A novel method for prokaryotic promoter prediction based on DNA stability

    Background: In the post-genomic era, correct gene prediction has become one of the biggest challenges in genome annotation. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. This work presents a novel prokaryotic promoter prediction method based on DNA stability. Results: The promoter region is less stable and hence more prone to melting as compared to other genomic regions. Our analysis shows that a method of promoter prediction based on the differences in the stability of DNA sequences in the promoter and non-promoter regions works much better than existing prokaryotic promoter prediction programs, which are based on sequence motif searches. At present the method works optimally for genomes such as that of Escherichia coli, which have near 50% G+C composition, and also performs satisfactorily in the case of other prokaryotic promoters. Conclusions: Our analysis clearly shows that the change in stability of DNA seems to provide a much better clue than the usual sequence motifs, such as the Pribnow box and -35 sequence, for differentiating promoter regions from non-promoter regions. To a certain extent, it is more general and is likely to be applicable across organisms. Hence incorporation of such features in addition to the signature motifs can greatly improve the presently available promoter prediction programs.
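The idea of scanning a sequence for locally less stable regions can be sketched with a sliding window. The abstract does not give the stability calculation, so the sketch below swaps in G+C fraction as a crude proxy (A/T-rich DNA generally melts more easily); this substitution, the function names, and the window size are all assumptions and not the authors' method.

```python
from typing import List, Tuple


def gc_fraction(window: str) -> float:
    """Fraction of G or C bases in a non-empty window."""
    window = window.upper()
    return sum(ch in "GC" for ch in window) / len(window)


def sliding_gc(seq: str, window: int = 15, step: int = 1) -> List[Tuple[int, float]]:
    """Return (start, gc_fraction) for each window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def least_stable_window(seq: str, window: int = 15) -> Tuple[int, float]:
    """Return the first window with the lowest G+C fraction.

    Low G+C is used here only as a stand-in for low duplex stability.
    """
    return min(sliding_gc(seq, window), key=lambda pair: pair[1])


if __name__ == "__main__":
    toy = "GCGCGCGC" + "ATATATAT" + "GCGCGCGC"
    print(least_stable_window(toy, window=8))
```

A real stability profile would replace `gc_fraction` with a duplex free-energy calculation over each window; the scanning and comparison logic would stay the same.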

    Determining promoter location based on DNA structure first-principles calculations

    A new method is presented which predicts promoter regions based on atomistic molecular dynamics simulations of small oligonucleotides, without requiring information on sequence conservation or features

    Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency

    BACKGROUND: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRMs) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended, ~1 kb, regions are found far from coding sequences and cannot be extracted from the genome on the basis of their position relative to the coding regions. RESULTS: To explore the feasibility of CRM extraction from a genome, we generated an original training set containing annotated sequence data for most of the known developmental CRMs from Drosophila. Based on this set of experimental data, we developed a strategy for statistical extraction of cis-regulatory modules from the genome, using exhaustive analysis of local word frequency (LWF). To assess the performance of our analysis, we measured the correlation between predictions generated by the LWF algorithm and the distribution of conserved non-coding regions in a number of Drosophila developmental genes. CONCLUSIONS: In most of the cases tested, we observed high correlation (up to 0.6–0.8, measured on the entire gene locus) between the two independent techniques. We discuss computational strategies available for extraction of Drosophila CRMs and possible extensions of these methods.
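Counting words within a local window is the basic operation behind any local word frequency analysis. The sketch below counts overlapping k-letter words in each window of a sequence; it is only a minimal illustration of that counting step. The abstract does not describe the statistical scoring used by the LWF algorithm, and the function names, word length, and window parameters are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def word_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def local_word_frequency(
    seq: str, k: int = 3, window: int = 10, step: int = 5
) -> List[Tuple[int, Counter]]:
    """Return (window_start, word_counts) for each window along the sequence."""
    return [
        (start, word_counts(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    for start, counts in local_word_frequency("ACGTACGTACGT", k=2, window=4, step=4):
        print(start, dict(counts))
```

An exhaustive analysis in this spirit would repeat the counting over a range of word lengths and compare each window's counts against those observed in a training set of known modules.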

    In silico promoter recognition from deepCAGE data

    The accurate identification of transcription start regions corresponding to the promoters of known genes, novel coding, and noncoding transcripts, as well as enhancer elements, is a crucial step towards a complete understanding of state-specific gene regulatory networks. Recent high-throughput techniques, such as deepCAGE or single-molecule CAGE, have made it possible to identify the genome-wide location, relative expression, and differential usage of transcription start regions across hundreds of different tissues and cell lines. Here, we describe in detail the necessary computational analysis of CAGE data, with focus on two recent in silico methodologies for CAGE peak/profile definition and promoter recognition, namely the Decomposition-based Peak Identification (DPI) and the PROmiRNA software. We apply both methodologies to the challenging task of identifying primary microRNA transcript (pri-miRNA) start sites and compare the results.