
    Identification and analysis of patterns in DNA sequences, the genetic code and transcriptional gene regulation

    The present cumulative work consists of six articles linked by the topic "Identification and Analysis of Patterns in DNA Sequences, the Genetic Code and Transcriptional Gene Regulation". We applied a binary coding to efficiently find patterns within nucleotide sequences. In the first and second parts of my work, a single bit is used to encode all four nucleotides. The three possible one-bit codings are keto (G, U) vs. amino (A, C) bases, strong (G, C) vs. weak (A, U) bases, and purines (G, A) vs. pyrimidines (C, U). We found that the most pronounced pattern could be observed with the purine-pyrimidine coding. Applying this coding, we succeeded in finding a new representation of the genetic code, published as "A New Classification Scheme of the Genetic Code" in the Journal of Molecular Biology and as "A Purine-Pyrimidine Classification Scheme of the Genetic Code" in BIOForum Europe. This new representation reduces the common table of the genetic code from 64 to 32 fields while maintaining the same information content. It turned out that all known patterns of the genetic code, and even new ones, can easily be recognized in this scheme. Furthermore, the new representation allows us to speculate about the origin and evolution of the translation machinery and the genetic code. In this way we found a possible explanation for the contemporary codon-amino acid assignment and broad support for an early doublet code. These explanations were published in the Journal of Bioinformatics and Computational Biology under the title "The New Classification Scheme of the Genetic Code, its Early Evolution, and tRNA Usage". Expecting to find these purine-pyrimidine patterns at the DNA level as well, we examined DNA binding sites for the occurrence of binary patterns. A comprehensive statistical analysis of the largest class of restriction enzymes (type II) revealed a very distinctive purine-pyrimidine pattern. Moreover, we observed a higher G+C content in the protein binding sequences. For both observations we provided and discussed several explanations, published under the title "Common Patterns in Type II Restriction Enzyme Binding Sites" in Nucleic Acids Research. The identified patterns may help to explain how a protein finds its binding site. The last part of my work presents two submitted articles on the analysis of Boolean functions. Boolean functions are used to describe and analyze complex dynamic processes and make it easier to find binary patterns within biochemical interaction networks. It is well known that not all Boolean functions are needed to describe biologically relevant gene interaction networks. In the article "Boolean Networks with Biologically Relevant Rules Show Ordered Behavior", submitted to BioSystems, we showed that the class of required Boolean functions can be strongly restricted. Furthermore, we calculated the exact number of hierarchically canalizing functions, which are known to be biologically relevant. In "The Decomposition Tree for Analysis of Boolean Functions", submitted to the Journal of Complexity, we introduced an efficient data structure for the classification and analysis of Boolean functions. This permits biologically relevant Boolean functions to be recognized in polynomial time.
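    The three one-bit codings described above can be illustrated with a short Python sketch. It assumes RNA input (T is mapped to U, so DNA strings also work) and maps each class pair to 1/0; which class receives the 1 is an arbitrary choice, not taken from the articles. The helper `codon_classes` is a hypothetical name that only groups the 64 codons by their three-bit pattern; it does not reproduce the 32-field classification scheme itself.

```python
# One-bit encodings of nucleotides; T is mapped to U so DNA strings also work.
# Which class of each pair receives the bit 1 is an arbitrary choice here.
CODINGS = {
    "purine_pyrimidine": {"G": 1, "A": 1, "C": 0, "U": 0},  # R = 1, Y = 0
    "strong_weak": {"G": 1, "C": 1, "A": 0, "U": 0},        # S = 1, W = 0
    "keto_amino": {"G": 1, "U": 1, "A": 0, "C": 0},         # K = 1, M = 0
}


def encode(seq: str, scheme: str = "purine_pyrimidine") -> str:
    """Return the one-bit-per-nucleotide encoding of seq as a 0/1 string."""
    table = CODINGS[scheme]
    return "".join(str(table[base]) for base in seq.upper().replace("T", "U"))


def codon_classes(scheme: str = "purine_pyrimidine") -> dict:
    """Hypothetical helper: group all 64 codons by their 3-bit pattern."""
    groups: dict = {}
    for first in "UCAG":
        for second in "UCAG":
            for third in "UCAG":
                codon = first + second + third
                groups.setdefault(encode(codon, scheme), []).append(codon)
    return groups


if __name__ == "__main__":
    print(encode("GAAUUC"))       # 111000
    print(len(codon_classes()))   # 8 patterns, each shared by 8 codons
```

    Under any one-bit coding a codon collapses to one of 8 three-bit patterns, which is why such codings expose regularities that are hard to see in the standard 64-field table.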

    Robustness of Transcriptional Regulation in Yeast-like Model Boolean Networks

    We investigate the dynamical properties of the transcriptional regulation of gene expression in the yeast Saccharomyces cerevisiae within the framework of a synchronously and deterministically updated Boolean network model. By means of a dynamically determinant subnetwork, we explore the robustness of transcriptional regulation as a function of the type of Boolean functions used in the model to mimic the influence of regulating agents on the transcription level of a gene. We compare the results obtained for the actual yeast network with those from two different model networks: one with an in-degree distribution similar to that of yeast but random otherwise, and another due to Balcan et al., which faithfully reproduces the global topology of the yeast network. Surprisingly, we find that the first set of model networks better reproduces the results found with the actual yeast network, even though the Balcan et al. model networks are structurally more similar to yeast. Comment: 7 pages, 4 figures. To appear in Int. J. Bifurcation and Chaos; typos were corrected and 2 references were added.
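    A synchronously and deterministically updated Boolean network, as used above, can be sketched in a few lines of Python. As a simple robustness probe, the sketch flips one node of a random state and measures how far the two successor states diverge after one update (a Derrida-style, one-step perturbation test). This is a generic measure chosen for illustration, not the dynamically determinant subnetwork analysis used in the paper, and the three-gene `EXAMPLE_RULES` network is hypothetical.

```python
import random
from typing import Callable, List, Tuple

State = Tuple[int, ...]
# Each rule computes the next value of one node from the whole current state.
Rules = List[Callable[[State], int]]


def step(state: State, rules: Rules) -> State:
    """Synchronous, deterministic update: all nodes read the same old state."""
    return tuple(rule(state) for rule in rules)


def hamming(a: State, b: State) -> int:
    """Number of nodes whose values differ between two states."""
    return sum(x != y for x, y in zip(a, b))


def perturbation_spread(rules: Rules, trials: int = 1000, seed: int = 0) -> float:
    """Mean Hamming distance after one update when one random node is flipped.

    In the usual one-step (Derrida-style) picture, values below 1 indicate that
    small perturbations tend to die out (ordered regime) and values above 1
    that they tend to spread (chaotic regime).
    """
    rng = random.Random(seed)
    n = len(rules)
    total = 0
    for _ in range(trials):
        state = tuple(rng.randint(0, 1) for _ in range(n))
        i = rng.randrange(n)
        flipped = state[:i] + (1 - state[i],) + state[i + 1:]
        total += hamming(step(state, rules), step(flipped, rules))
    return total / trials


# Hypothetical three-gene example network.
EXAMPLE_RULES: Rules = [
    lambda s: s[1] & s[2],  # gene 0 needs both gene 1 AND gene 2
    lambda s: s[0],         # gene 1 is activated by gene 0
    lambda s: 1 - s[0],     # gene 2 is repressed by gene 0
]

if __name__ == "__main__":
    print(step((1, 1, 1), EXAMPLE_RULES))  # (1, 1, 0)
    print(perturbation_spread(EXAMPLE_RULES))
```

    Comparing this number across networks built with different classes of Boolean functions is one simple way to see how the choice of update rules shifts a network toward ordered or chaotic behavior.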

    Common patterns in type II restriction enzyme binding sites

    Restriction enzymes are among the best-studied examples of DNA binding proteins. In order to find general patterns in DNA recognition sites, which may reflect important properties of protein–DNA interaction, we analyse the binding sites of all known type II restriction endonucleases. We find a significantly enhanced GC content and discuss three explanations for this phenomenon. Moreover, we study patterns of nucleotide order in recognition sites. Our analysis reveals a striking accumulation of adjacent purines (R) or pyrimidines (Y). We discuss three possible reasons: RR/YY dinucleotides are characterized by (i) stronger H-bond donor and acceptor clusters, (ii) specific geometrical properties and (iii) a low stacking energy. These features make RR/YY steps particularly accessible for specific protein–DNA interactions. Finally, we show that the recognition sites of type II restriction enzymes are underrepresented in host genomes and in phage genomes.
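    The two sequence statistics discussed above, GC content and the share of RR/YY dinucleotide steps, are straightforward to compute per recognition site. The Python sketch below uses the EcoRI site GAATTC purely as an example. Ambiguity codes such as N are simply treated as non-matching, so degenerate recognition sites would need extra handling in a real analysis. For a sequence with independent, equiprobable purines and pyrimidines the expected RR/YY fraction is 0.5, which gives a simple baseline for comparison.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def gc_content(site: str) -> float:
    """Fraction of positions in the site that are G or C."""
    site = site.upper()
    return sum(base in "GC" for base in site) / len(site)


def rr_yy_fraction(site: str) -> float:
    """Fraction of adjacent-base steps that are purine-purine or pyrimidine-pyrimidine."""
    site = site.upper()
    steps = list(zip(site, site[1:]))
    same_class = sum(
        (a in PURINES and b in PURINES) or (a in PYRIMIDINES and b in PYRIMIDINES)
        for a, b in steps
    )
    return same_class / len(steps)


if __name__ == "__main__":
    print(gc_content("GAATTC"))      # 0.333...
    print(rr_yy_fraction("GAATTC"))  # 0.8: GA, AA, TT, TC out of five steps
```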

    DiProDB: a database for dinucleotide properties

    DiProDB (http://diprodb.fli-leibniz.de) is a database of conformational and thermodynamic dinucleotide properties. It includes datasets for both DNA and RNA, as well as for single and double strands. The data have been shown to be important for understanding different aspects of nucleic acid structure and function, and they can also be used for encoding nucleic acid sequences. The database is intended to facilitate further applications of dinucleotide properties. A number of property datasets are highly correlated; therefore, the database comes with a correlation analysis facility. Authors who have determined new sets of dinucleotide property values are invited to submit these data to DiProDB.
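    The kind of correlation analysis mentioned above can be sketched as a pairwise Pearson correlation between property tables, each assigning one value to each of the 16 dinucleotides. The Python code assumes the tables have already been loaded into dictionaries; it does not use any DiProDB interface or file format, and the toy tables in the tests are made-up numbers, not DiProDB data. The function `highly_correlated` and its 0.9 threshold are hypothetical choices.

```python
from itertools import product
from math import sqrt
from typing import Dict, List, Sequence, Tuple

DINUCLEOTIDES: List[str] = ["".join(p) for p in product("ACGT", repeat=2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; undefined for constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(
    properties: Dict[str, Dict[str, float]]
) -> Dict[Tuple[str, str], float]:
    """Pairwise correlation of property tables over the 16 dinucleotides."""
    names = sorted(properties)
    vectors = {name: [properties[name][d] for d in DINUCLEOTIDES] for name in names}
    return {(a, b): pearson(vectors[a], vectors[b]) for a in names for b in names}


def highly_correlated(properties, threshold: float = 0.9):
    """Pairs of distinct properties whose |r| reaches the threshold."""
    matrix = correlation_matrix(properties)
    return [(a, b, r) for (a, b), r in matrix.items() if a < b and abs(r) >= threshold]
```

    Flagging such pairs helps avoid feeding redundant property datasets into a downstream sequence encoding.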

    BioBayesNet: a web server for feature extraction and Bayesian network modeling of biological sequence data

    BioBayesNet is a new web application that allows easy modeling and classification of biological data using Bayesian networks. To learn Bayesian networks, the user can upload either a set of annotated FASTA sequences or a set of pre-computed feature vectors. For FASTA sequences, the server can generate a wide range of sequence and structural features, which are then used to learn Bayesian networks. An automatic feature selection procedure assists in selecting discriminative features, providing a (locally) optimal feature set. The output includes several quality measures for the overall network and for individual features, as well as a graphical representation of the network structure, which allows users to explore dependencies between features. Finally, the learned Bayesian network, or another uploaded network, can be used to classify new data. BioBayesNet facilitates the use of Bayesian networks in biological sequence analysis and is flexible enough to support modeling and classification applications in various scientific fields. The BioBayesNet server is available at http://biwww3.informatik.uni-freiburg.de:8080/BioBayesNet/
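    As an illustration of the kind of sequence features such a server can derive from FASTA input, the Python sketch below parses FASTA text and turns each sequence into relative k-mer frequencies. It is not BioBayesNet's feature set or code. How class annotations are encoded in the headers is not specified here, so the parser simply keeps the full header line.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; sequences are upper-cased."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def kmer_features(seq: str, k: int = 2) -> Dict[str, float]:
    """Relative frequency of every ACGT k-mer over overlapping windows."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}
```

    Each sequence then becomes a fixed-length vector (16 values for k = 2), which is the kind of input a feature selection step and a Bayesian network learner can work with.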

    TassDB: a database of alternative tandem splice sites

    Subtle alternative splice events at tandem splice sites are frequent in eukaryotes and substantially increase the complexity of transcriptomes and proteomes. We have developed a relational database, TassDB (TAndem Splice Site DataBase), which stores extensive data about alternative splice events at GYNGYN donors and NAGNAG acceptors. These splice events are subtle in nature, since they mostly result in the insertion or deletion of a single amino acid or the substitution of one amino acid by two others. Currently, TassDB contains 114 554 tandem splice sites from eight species, 5209 of which have EST/mRNA evidence for alternative splicing. In addition, human SNPs that affect NAGNAG acceptors are annotated. The database provides a user-friendly interface for searching for specific genes, or for genes containing tandem splice sites with specific features, as well as the possibility to download large datasets. This database should facilitate further experimental studies and large-scale bioinformatics analyses of tandem splice sites. The database is available at
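    The two tandem motifs can be located with regular expressions that spell out the IUPAC codes (N = any base, Y = C or T). The Python sketch below uses a lookahead so that overlapping candidates are all reported. It only finds motif occurrences in a raw sequence; deciding whether an occurrence is actually used as a tandem splice site requires annotated exon-intron boundaries and EST/mRNA evidence, as in TassDB, and is not attempted here.

```python
import re
from typing import List, Tuple

# IUPAC codes spelled out: N -> [ACGT], Y -> [CT].
# The zero-width lookahead lets overlapping occurrences all be reported.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")
GYNGYN = re.compile(r"(?=(G[CT][ACGT]G[CT][ACGT]))")


def find_motifs(seq: str, pattern: "re.Pattern[str]") -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for every, possibly overlapping, match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(find_motifs("ttcagcagg", NAGNAG))  # [(2, 'CAGCAG')]
    print(find_motifs("AGTAGTAG", GYNGYN))   # [(1, 'GTAGTA')]
```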

    Accurate prediction of NAGNAG alternative splicing

    Alternative splicing (AS) involving NAGNAG tandem acceptors is an evolutionarily widespread class of AS. Recent predictions of alternative acceptor usage reported better results for acceptors separated by larger distances than for NAGNAGs. To improve the latter, we used Bayesian networks (BNs) together with extensive experimental validation of the predictions. Using carefully constructed training and test datasets, we achieved a balanced sensitivity and specificity of ≥92%. A BN trained on the combined dataset was then used to make predictions, and 81% (38/47) of the experimentally tested predictions were verified. Applying a BN learned on human data to six other genomes, we show that performance on the vertebrate genomes matches that achieved on human data, whereas there is a slight drop for Drosophila and worm. Lastly, using the prediction accuracy observed in the experimental validation, we estimate the number of as yet undiscovered alternative NAGNAGs. State-of-the-art classifiers can produce highly accurate predictions of AS at NAGNAGs, indicating that we have identified the major features of the ‘NAGNAG-splicing code’ within the splice site and its immediate neighborhood. Our results suggest that the mechanism behind NAGNAG AS is simple, stochastic, and conserved among vertebrates and beyond.
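    The evaluation figures quoted above rest on simple ratios. The Python sketch below computes sensitivity and specificity at a score threshold and picks the threshold at which the two are closest, one plausible reading of a "balanced" operating point; the paper's exact thresholding procedure is not described here, and the scores in the tests are made up. The experimental verification rate is the plain ratio 38/47, about 0.81.

```python
from typing import Sequence, Tuple


def confusion(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int, int, int]:
    """(tp, fn, tn, fp) when score >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def balanced_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float]:
    """(threshold, sensitivity, specificity) where the two rates are closest.

    Requires at least one positive and one negative label.
    """
    best = None
    for t in sorted(set(scores)):
        tp, fn, tn, fp = confusion(scores, labels, t)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        if best is None or abs(sens - spec) < abs(best[1] - best[2]):
            best = (t, sens, spec)
    return best


if __name__ == "__main__":
    print(round(38 / 47, 2))  # 0.81, the experimental verification rate
```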

    Regulatory patterns in molecular interaction networks

    Understanding the design principles of molecular interaction networks is an important goal of molecular systems biology. Some insights into their network topology have been gained through the discovery of graph-theoretic patterns that constrain network dynamics. This paper contributes to the identification of patterns in the mechanisms that govern network dynamics. The control of nodes in gene regulatory, signaling, and metabolic networks is governed by a variety of biochemical mechanisms, with inputs from other network nodes that act additively or synergistically. This paper focuses on a certain type of logical rule that appears frequently as a regulatory pattern. Within the context of the multistate discrete model paradigm, a rule type is introduced that reduces to the concept of a nested canalyzing function in the Boolean network case. It is shown that networks employing this type of multivalued logic exhibit more robust dynamics than random networks, with few attractors and short limit cycles. It is also shown that the majority of regulatory functions in many published models of gene regulatory and signaling networks are nested canalyzing. Comment: gene regulation; signaling; mathematical model; nested canalyzing function; robustness
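    For the Boolean case, the nested canalyzing property can be checked recursively: a function of n variables is nested canalyzing if some input x_i, when set to a value a, fixes the output, and the function obtained by setting x_i to the opposite value is nested canalyzing in the remaining n - 1 variables; a function of one variable qualifies only if it is x or NOT x. The Python sketch below applies this rule to truth tables and counts nested canalyzing functions by brute force, which is feasible only for small n. It covers only the Boolean special case (such functions are also called hierarchically canalizing), not the multistate rule type introduced in the paper.

```python
from itertools import product
from typing import Tuple

# Truth table: entry k is f(x), where bit j of k holds the value of x_j.
Table = Tuple[int, ...]


def restrict(table: Table, n: int, i: int, a: int) -> Table:
    """Truth table of f with x_i fixed to a, over the remaining n - 1 variables."""
    out = []
    for k in range(2 ** (n - 1)):
        low = k & ((1 << i) - 1)  # values of x_0 .. x_{i-1}
        high = k >> i             # values of x_{i+1} .. x_{n-1}
        out.append(table[low | (a << i) | (high << (i + 1))])
    return tuple(out)


def is_nested_canalyzing(table: Table, n: int) -> bool:
    """Recursive test of the Boolean nested canalyzing property."""
    if n == 1:
        return table[0] != table[1]  # only x or NOT x qualify
    for i in range(n):
        for a in (0, 1):
            canalyzed = restrict(table, n, i, a)
            if len(set(canalyzed)) == 1 and is_nested_canalyzing(
                restrict(table, n, i, 1 - a), n - 1
            ):
                return True
    return False


def count_nested_canalyzing(n: int) -> int:
    """Brute-force count over all 2**(2**n) Boolean functions of n inputs."""
    return sum(is_nested_canalyzing(t, n) for t in product((0, 1), repeat=2 ** n))


if __name__ == "__main__":
    print(is_nested_canalyzing((0, 0, 0, 1), 2))  # True: x0 AND x1
    print(is_nested_canalyzing((0, 1, 1, 0), 2))  # False: x0 XOR x1
    print(count_nested_canalyzing(2))             # 8
```

    The recursion also rejects functions that ignore one of their inputs, since at the last level the remaining one-variable function must be non-constant.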

    Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli

    Recognition of genomic binding sites by transcription factors can occur through base-specific recognition or through recognition of variations within the structure of the DNA macromolecule. In this article, we investigate what information relevant to transcription factor binding can be retrieved from local DNA structural properties but cannot be captured by the nucleotide sequence alone. More specifically, we explore the benefit of employing the structural characteristics of DNA to create binding-site models that encompass indirect recognition for the model organism Escherichia coli. We developed a novel methodology, Conditional Random fields of Smoothed Structural Data (CRoSSeD), based on structural scales and conditional random fields, to model and predict regulator binding sites. The value of relying on local structural DNA properties is demonstrated by improved classifier performance on a large number of biological datasets and by the detection of novel binding sites that could be validated by independent data sources but could not be identified using sequence data alone. We further show that the CRoSSeD binding-site models can be related to the actual molecular mechanisms of transcription factor DNA binding, and thus can not only be used for the prediction of novel sites but might also give valuable insights into unknown binding mechanisms of transcription factors.
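    The first ingredient of such a model, turning a sequence into a smoothed profile of a structural scale, can be sketched as follows. The Python code assumes a dinucleotide scale supplied as a dictionary (real values would come from published structural scales; the toy values in the tests are invented) and uses a centered moving average as the smoothing step, which is an assumption and not necessarily the smoothing used in CRoSSeD. The conditional random field that consumes these profiles is not sketched.

```python
from typing import Dict, List


def structural_profile(seq: str, scale: Dict[str, float]) -> List[float]:
    """Value of the structural scale at each overlapping dinucleotide step."""
    seq = seq.upper()
    return [scale[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at the sequence ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

    Smoothing trades positional resolution for robustness: a single outlying dinucleotide value influences its neighbors less, which suits properties that describe a stretch of DNA rather than one base step.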

    Integrative inference of gene-regulatory networks in Escherichia coli using information theoretic concepts and sequence analysis

    Background: Although Escherichia coli is one of the best-studied model organisms, a comprehensive understanding of its gene regulation has not yet been achieved. Many approaches exist for reconstructing regulatory interaction networks from gene expression experiments; approaches based on mutual information are most useful for large-scale network inference.

    Results: We used a three-step approach combining gene regulatory network inference based on directed information (DTI) with sequence analysis. DTI values were calculated on a set of gene expression profiles from 19 time-course experiments extracted from the Many Microbes Microarray Database. Focusing on influences between pairs of genes in which one partner encodes a transcription factor (TF), we derived a network containing 878 TF-gene interactions, 166 of which are known according to RegulonDB. Next, we selected a subset of 109 interactions that could be confirmed by the presence of a phylogenetically conserved binding site of the respective regulator. This second step increased the fraction of known interactions from 19% to 60%. In the last step, we checked the 44 of these 109 interactions not yet included in RegulonDB for functional relationships between regulator and target and thus obtained ten TF-target gene interactions. Five of them concern the regulator LexA and have already been reported in the literature. The remaining five describe regulation by Fis (with two novel targets), PdhR, PhoP, and KdgR. To validate our approach, one of them, the regulation of lipoate synthase (LipA) by the pyruvate-sensing pyruvate dehydrogenase repressor (PdhR), was checked experimentally and confirmed.

    Conclusions: We predicted a set of five novel TF-target gene interactions in E. coli. One of them, the regulation of lipA by the transcriptional regulator PdhR, was validated experimentally. Furthermore, we developed DTInfer, a new R package for the inference of gene-regulatory networks from microarray data using directed information.
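    Directed information is beyond a short example, but the simpler quantities it builds on can be sketched. The Python code below gives a plug-in estimate of mutual information between two discretized expression profiles and a time-lagged variant, I(X_{t-1}; Y_t), as a crude proxy for a directed influence of a regulator X on a target Y. This is not the DTI estimator implemented in DTInfer, and discretizing expression values (for example into up/down states) is assumed to have been done beforehand.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) * log2(p(a,b) / (p(a) p(b))), with counts substituted for probabilities
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def lagged_mi(
    regulator: Sequence[Hashable], target: Sequence[Hashable], lag: int = 1
) -> float:
    """I(X_{t-lag}; Y_t): regulator's past against the target's present."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return mutual_information(regulator[:-lag], target[lag:])
```

    Unlike plain mutual information, the lagged version is asymmetric in its two arguments, which is the minimal property needed to suggest a direction of influence; directed information refines this by accounting for the full history of both series.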