
    Coexistence of different base periodicities in prokaryotic genomes as related to DNA curvature, supercoiling, and transcription

    We analyzed the periodic patterns in E. coli promoters and compared the distributions of the corresponding patterns in promoters and in the complete genome to elucidate their function. Except for the three-base periodicity, which coincides with that in the coding regions and grows stronger in the region downstream from the transcription start (TS), all other salient periodicities peak upstream of the TS. We found that helical periodicities with lengths close to the B-helix pitch (~10.2-10.5 bp) and the A-helix pitch (~10.8-11.1 bp) coexist in the genomic sequences. We mapped the distributions of stretches with A-, B-, and Z-like DNA periodicities onto the E. coli genome. All three periodicities tend to concentrate within non-coding regions as their intensity becomes stronger, and they prevail in the promoter sequences. The comparison with available experimental data indicates that promoters with the most pronounced periodicities may be related to the supercoiling-sensitive genes. Comment: 23 pages, 6 figures, 2 tables
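    As a rough illustration of how a base periodicity can be quantified, the sketch below computes the normalized Fourier power of a single base's indicator sequence at a chosen period. This is a generic spectral measure and not necessarily the one used in the paper; the function name `periodicity_power` and its parameters are assumptions made for this example.

```python
import cmath


def periodicity_power(seq, base, period):
    """Normalized Fourier power of the indicator sequence of `base` at `period`.

    Illustrative assumption: positions where `base` occurs contribute a unit
    phasor exp(-2*pi*i*position/period); the squared magnitude of their sum
    is divided by the sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = sum(
        cmath.exp(-2j * cmath.pi * i / period)
        for i, ch in enumerate(seq)
        if ch == base
    )
    return abs(total) ** 2 / n


# A toy sequence with a perfect three-base repeat; period 3 should dominate.
toy = "ATG" * 20
power_3 = periodicity_power(toy, "A", 3)
power_helical = periodicity_power(toy, "A", 10.5)
```

    On this toy input the power at period 3 is far larger than at a helical period such as 10.5 bp; real promoter sequences would give much weaker and noisier contrasts.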

    Assessing the effects of data selection and representation on the development of reliable E. coli sigma 70 promoter region predictors

    As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of the promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors, including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and use them to examine experimentally the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three criteria, a prediction model might perform very well in cross-validation experiments while its performance on independent test data is drastically poorer. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data was obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers in both cross-validation and independent test performance evaluation experiments.
    Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods. NPRP grant No. 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation).

    Recognition of prokaryotic promoters based on a novel variable-window Z-curve method

    Transcription is the first step in gene expression, and it is the step at which most of the regulation of expression occurs. Although sequenced prokaryotic genomes provide a wealth of information, transcriptional regulatory networks are still poorly understood using the available genomic information, largely because accurate prediction of promoters is difficult. To improve promoter recognition performance, a novel variable-window Z-curve method is developed to extract general features of prokaryotic promoters. The features are used for further classification by the partial least squares technique. To verify the prediction performance, the proposed method is applied to predict promoter fragments of two representative prokaryotic model organisms (Escherichia coli and Bacillus subtilis). Owing to the feature extraction and selection power of the proposed method, the promoter prediction accuracies are improved markedly over most existing approaches: for E. coli, the accuracies are 96.05% (σ70 promoters, coding negative samples), 90.44% (σ70 promoters, non-coding negative samples), 92.13% (known sigma-factor promoters, coding negative samples), and 92.50% (known sigma-factor promoters, non-coding negative samples); for B. subtilis, the accuracies are 95.83% (known sigma-factor promoters, coding negative samples) and 99.09% (known sigma-factor promoters, non-coding negative samples). Additionally, because it is a linear technique, the proposed method is computationally simple and runs in a matter of minutes on ordinary personal computers or even laptops. More importantly, there is no need to optimize parameters, so it is very practical for predicting promoters of other species without any prior knowledge of the statistical properties of the samples.
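    For readers unfamiliar with the Z-curve, the sketch below computes the standard cumulative Z-curve coordinates of a DNA sequence: x contrasts purines with pyrimidines, y contrasts amino with keto bases, and z contrasts weak with strong hydrogen bonding. It shows only the basic transform, not the variable-window variant or the partial least squares step described in the abstract; the names `Z_STEPS` and `z_curve` are assumptions made for this example.

```python
# Per-base increments for the three Z-curve components:
#   x = (A + G) - (C + T)   purine vs. pyrimidine
#   y = (A + C) - (G + T)   amino vs. keto
#   z = (A + T) - (G + C)   weak vs. strong hydrogen bonding
Z_STEPS = {
    "A": (1, 1, 1),
    "G": (1, -1, -1),
    "C": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for n = 1..len(seq).

    Symbols other than A, C, G, T (for example N) leave the coordinates
    unchanged; that handling is an assumption of this sketch.
    """
    x = y = z = 0
    points = []
    for ch in seq.upper():
        dx, dy, dz = Z_STEPS.get(ch, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


points = z_curve("AGCT")
```

    A windowed feature extractor would typically apply such a transform to sub-sequences and use the resulting coordinates, or quantities derived from them, as classifier inputs.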

    Rational design and rapid screening of antisense oligonucleotides for prokaryotic gene modulation

    Antisense oligodeoxynucleotides (oligos) are widely used for functional studies of both prokaryotic and eukaryotic genes. However, the identification of effective target sites is a major issue in antisense applications. Here, we study a number of thermodynamic and structural parameters that may affect the potency of antisense inhibition. We develop a cell-free assay for rapid oligo screening. This assay is used for measuring the expression of Escherichia coli lacZ, the antisense target for experimental testing and validation. Based on a training set of 18 oligos, we found that structural accessibility predicted by local folding of the target mRNA is the most important predictor for antisense activity. This finding was further confirmed by a direct validation study. In this study, a set of 10 oligos was designed to target accessible sites, and another set of 10 oligos was selected to target inaccessible sites. Seven of the 10 oligos for accessible sites were found to be effective (>50% inhibition), but none of the oligos for inaccessible sites was effective. The difference in the antisense activity between the two sets of oligos was statistically significant. We also found that the predictability of antisense activity by target accessibility was greatly improved for oligos targeted to the regions upstream of the end of the active domain for β-galactosidase, the protein encoded by lacZ. The combination of the structure-based antisense design and extension of the lacZ assay to include gene fusions will be applicable to high-throughput gene functional screening, and to the identification of new drug targets in pathogenic microbes. Design tools are available through the Sfold Web server at

    Learning the Regulatory Code of Gene Expression

    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states, and mRNA and protein levels. Deep neural networks automatically learn informative sequence representations, and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.

    Human Promoter Prediction Using DNA Numerical Representation

    With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of the numerical representation of a DNA sequence affects how well the biological properties can be reflected in the numerical domain for the detection and identification of the characteristics of special regions of interest within the DNA sequence. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and demerits of the various methods, experimental results, and possible future developments are also included. Another focus of the research is promoter prediction in human (Homo sapiens) DNA sequences with a neural-network-based multi-classifier system using DNA numerical representation methods. In spite of the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of the feature-based approaches decreases the prediction reliability and leads to erroneous results in gene annotation. To improve the prediction accuracy and reliability, DigiPromPred, a promoter prediction system based on numerical representation, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system is found to predict promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences, with a specificity of 90.4%. The comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that the proposed system maintains a good balance between prediction accuracy and reliability.
    To reduce the system architecture and computational complexity compared to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system is found to predict promoters with sensitivities of 87%, 87%, and 99% while reducing the false prediction rate for non-promoter sequences, with specificities of 92%, 94%, and 99% for human, Drosophila, and Arabidopsis sequences respectively, and it offers reconfigurable capability compared to the existing system.
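    The sensitivity and specificity figures quoted in this abstract follow the usual confusion-matrix definitions, sketched below. The function name and the counts in the usage lines are hypothetical values chosen for illustration; they are not data from the dissertation.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true promoters that are detected.
    specificity = TN / (TN + FP): fraction of non-promoters correctly rejected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, for illustration only (not taken from the source).
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=8, fp=2)
```

    Reporting both numbers matters for promoter prediction because a classifier can reach high sensitivity simply by labeling most sequences as promoters, which shows up as low specificity.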

    Bioinformatics in New Generation Flavivirus Vaccines

    Flavivirus infections are the most prevalent arthropod-borne infections worldwide, often causing severe disease, especially among children, the elderly, and the immunocompromised. In the absence of effective antiviral treatment, prevention through vaccination would greatly reduce the morbidity and mortality associated with flavivirus infections. Despite the success of the empirically developed vaccines against yellow fever virus, Japanese encephalitis virus, and tick-borne encephalitis virus, there is an increasing need for a more rational design and development of safe and effective vaccines. Several bioinformatic tools are available to support such rational vaccine design. In doing so, several parameters have to be taken into account, such as safety for the target population, overall immunogenicity of the candidate vaccine, and the efficacy and longevity of the immune responses triggered. Examples of how bioinformatics is applied to assist in the rational design and improvement of vaccines, particularly flavivirus vaccines, are presented and discussed.

    Molecular Mechanisms of Transcription through Single-Molecule Experiments

    Transcription represents the first step in gene expression. It is therefore not surprising that transcription is a highly regulated process, and understanding its control is essential to understanding the flow and processing of information required by the cell to maintain its homeostasis. During transcription, a DNA molecule is copied into RNA molecules that are then used to translate the genetic information into proteins; this logical pattern has been conserved throughout all three kingdoms of life, from Archaea to Eukarya, making it an essential and fundamental cellular process. Even though some viruses that encode their genome in an RNA molecule use it as a template to make mRNA, others synthesize an intermediate DNA molecule from the RNA, a process known as reverse transcription, from which regular transcription of viral genes can then proceed in the host cells.

    Rules extraction from neural networks applied to the prediction and recognition of prokaryotic promoters

    Promoters are DNA sequences located upstream of the gene region and play a central role in gene expression. Computational techniques show good accuracy in gene prediction but are less successful in predicting promoters, primarily because of the high number of false positives that reflect characteristics of the promoter sequences. Many machine learning methods have been used to address this issue. Neural networks (NNs) have been successfully used in this field because of their ability to recognize the imprecise and incomplete patterns characteristic of promoter sequences. In this paper, an NN was used to predict and recognize promoter sequences in two data sets: (i) one based on nucleotide sequence information and (ii) another based on stability sequence information. The accuracy was approximately 80% for simulation (i) and 68% for simulation (ii). In the rules extracted, biological consensus motifs were important parts of the NN learning process in both simulations.