3,637 research outputs found

    A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

    Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding
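For readers unfamiliar with the position weight matrix (PWM) baseline mentioned in the abstract above, the following is a minimal sketch of log-odds PWM scoring. It is not the authors' implementation: the toy binding sites, the pseudocount, the uniform background frequency, and the function names `build_pwm` and `best_hit` are all assumptions made for illustration.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        # log2 ratio of smoothed base frequency to background frequency
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pwm)
    scores = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)

# Made-up toy sites, not data from the study.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pwm = build_pwm(sites)
score, start = best_hit(pwm, "GGCGTATAATGCC")
```

A PWM scores each position independently, which is the limitation that the nucleotide positional dependency and DNA structure features described in the abstract are meant to address.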

    Design and Functional Assembly of Synthetic Biological Parts and Devices

    Programming living cells with synthetic gene circuits to perform desired tasks has been a major theme in the emerging field of synthetic biology. However, gene circuit engineering currently lacks the predictability and reliability seen in mature engineering disciplines. This thesis focuses on the design and engineering of novel modular and orthogonal biological devices, and the predictable functional assembly of modular biological elements (BioParts) into customisable larger biological devices. The thesis introduces the design methodology for engineering modular and orthogonal biological devices. A set of modular biological devices with digital logic functions, including the AND, NOT and combinatorial NAND gates, were constructed and quantitatively characterised. In particular, a novel genetic AND gate was engineered in Escherichia coli by redesigning the natural HrpR/HrpS heteroregulation motif in the hrp system of Pseudomonas syringae. The AND gate is orthogonal to the E. coli chassis, and employs the alternative σ54-dependent gene transcription to achieve tight transcriptional control. Systematic characterisation of a series of biological parts and devices in various biophysical and genetic contexts shows that context has a large impact on part and device behaviour. A new, effective strategy is presented for the assembly of BioParts into functional customised systems using engineered ‘in-context’ characterised modules aided by modelling, which can significantly increase the predictability of circuit construction by characterising the component parts and modules in the same biophysical and genetic contexts as anticipated in their final systems. Finally, the thesis presents the design and construction of an application-oriented integrated system – the cell density-dependent microbe-based biosensor. The in vivo biosensor was programmed to integrate its own cell density signal through an engineered cell-cell communication module and a second environmental signal through an environment-responsive promoter in a logical AND manner, with GFP as the output readout

    A new twist on PIFE: photoisomerisation-related fluorescence enhancement

    PIFE was first used as an acronym for protein-induced fluorescence enhancement, which refers to the increase in fluorescence observed upon the interaction of a fluorophore, such as a cyanine, with a protein. This fluorescence enhancement is due to changes in the rate of cis/trans photoisomerisation. It is clear now that this mechanism is generally applicable to interactions with any biomolecule and, in this review, we propose that PIFE be renamed according to its fundamental working principle as photoisomerisation-related fluorescence enhancement, keeping the PIFE acronym intact. We discuss the photochemistry of cyanine fluorophores, the mechanism of PIFE, its advantages and limitations, and recent approaches to turn PIFE into a quantitative assay. We provide an overview of its current applications to different biomolecules and discuss potential future uses, including the study of protein-protein interactions, protein-ligand interactions and conformational changes in biomolecules.

    Modelling gene expression in terms of DNA sequence

    Understanding the gene regulatory networks that control gene expression remains one of the most important questions in molecular biology. Much of gene expression is controlled through transcription initiation, whose regulation is ultimately encoded in the constellations of small sequence motifs in the DNA that are bound by transcription factors (TFs) in a sequence-specific manner. In this thesis, we addressed the task of understanding gene regulation on two levels. Firstly, we present a computational pipeline for inferring a set of gene regulatory elements in a given organism, which includes identifying genes that encode DNA-binding domains (DBDs), mapping them to known binding motifs by leveraging similarity in DBDs between species, annotating promoter regions genome-wide, aligning promoters with orthologous regions from related genomes, and predicting genome-wide transcription factor binding sites (TFBSs). We demonstrated the use of our pipeline by applying it to zebrafish. Furthermore, we integrated these results into our previously developed Integrated System for Motif Activity Response Analysis (ISMARA), which models gene expression data in terms of predicted regulatory sites. Using ISMARA, we predicted known and novel key regulatory TFs in zebrafish using a number of RNA-seq datasets. Secondly, we zoomed in to the scale of a single TF regulating a set of constitutive promoters in Escherichia coli. We analyzed an artificially evolved set of synthetic promoter sequences that were selected for expression as constitutive promoters regulated by the σ70 transcription factor. We looked closely into promoter sequences and TF binding dynamics and investigated the predictive power of TF binding affinity on gene expression

    A Systematic Review of the Application of Machine Learning in CpG island (CGI) Detection and Methylation Prediction

    Background: CpG island (CGI) detection and methylation prediction play important roles in studying the complex mechanisms of CGIs involved in genome regulation. In recent years, machine learning (ML) has been gradually applied to CGI detection and CGI methylation prediction algorithms in order to improve the accuracy of traditional methods. However, there are few systematic reviews on the application of ML in CGI detection and CGI methylation prediction. Therefore, this systematic review aims to provide an overview of the application of ML in CGI detection and methylation prediction. Method: The review was carried out using the PRISMA guideline. The search strategy was applied to articles published on PubMed from 2000 to July 10, 2022. Two independent researchers screened the articles based on the retrieval strategies and identified a total of 54 articles. After that, we developed quality assessment questions to assess study quality and obtained 46 articles that met the eligibility criteria. Based on these articles, we first summarized the applications of ML methods in CGI detection and methylation prediction, and then identified the strengths and limitations of these studies. Result and Discussion: Finally, we discussed the challenges and future research directions. Conclusion: This systematic review will contribute to the selection of algorithms and the future development of more efficient algorithms for CGI detection and methylation prediction

    Tuning Promoter Strength through RNA Polymerase Binding Site Design in Escherichia coli

    One of the paramount goals of synthetic biology is to have the ability to tune transcriptional networks to targeted levels of expression at will. As a step in that direction, we have constructed a set of 18 unique binding sites for E. coli RNA Polymerase (RNAP) σ70 holoenzyme, designed using a model of sequence-dependent binding energy combined with a thermodynamic model of transcription to produce a targeted level of gene expression. This promoter set allows us to determine the correspondence between the absolute numbers of mRNA molecules or protein products and the predicted promoter binding energies measured in k_BT energy units. These binding sites adhere on average to the predicted level of gene expression over orders of magnitude in constitutive gene expression, to within a factor of in both protein and mRNA copy number. With these promoters in hand, we then place them under the regulatory control of a bacterial repressor and show that again there is a strict correspondence between the measured and predicted levels of expression, demonstrating the transferability of the promoters to an alternate regulatory context. In particular, our thermodynamic model predicts the expression from our promoters under a range of repressor concentrations between several per cell up to over 100 per cell. After correcting the predicted polymerase binding strength using the data from the unregulated promoter, the thermodynamic model accurately predicts the expression for the simple repression strains to within 30%. Demonstration of modular promoter design, where parts of the circuit (such as RNAP/TF binding strength and transcription factor copy number) can be independently chosen from a stock list and combined to give a predictable result, has important implications as an engineering tool for use in synthetic biology
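The abstract above does not state its equations, so the following is only a sketch of the textbook equilibrium forms commonly used for this kind of thermodynamic model: promoter occupancy by RNAP and, in the weak-promoter limit, the fold-change in expression under simple repression. The number of non-specific sites `N_NS`, the copy numbers, and the binding energies are illustrative assumptions, not values from the study, and the function names are hypothetical.

```python
import math

# Assumed number of non-specific genomic binding sites
# (roughly the E. coli genome length in base pairs).
N_NS = 4.6e6

def p_bound(polymerases, binding_energy_kt, n_ns=N_NS):
    """Equilibrium probability that RNAP occupies the promoter.

    binding_energy_kt is the binding energy in units of k_B T
    (more negative means stronger binding).
    """
    weight = (polymerases / n_ns) * math.exp(-binding_energy_kt)
    return weight / (1.0 + weight)

def fold_change(repressors, repressor_energy_kt, n_ns=N_NS):
    """Fold-change in expression under simple repression (weak-promoter limit)."""
    return 1.0 / (1.0 + (repressors / n_ns) * math.exp(-repressor_energy_kt))

# Illustrative numbers only: a stronger promoter is occupied more often,
# and more repressors per cell lower the fold-change.
weak = p_bound(1000, -6.0)
strong = p_bound(1000, -8.0)
few = fold_change(10, -14.0)
many = fold_change(100, -14.0)
```

In this form the promoter binding energy and the repressor copy number enter as separate factors, which is the sense in which such parts can be chosen independently and combined.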

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Over the last decades a revolution in novel measurement techniques has permeated the biological sciences, filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. In order to extract insights from the vast quantity of data, computational and statistical methods are nowadays crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions in two data-rich fields in biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov Models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape-readout and alternative binding modes. In addition to giving access to our methods in an easy-to-use, intuitive web-interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs which cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct for the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein-structure prediction field more than 10 years ago, and, until very recently, have retained their importance as crucial input features to deep neural networks. As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected in a principled way using a variation of Felsenstein's tree-pruning algorithm applied in combination with an independent-pair assumption to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, the contact prediction derived from our corrected pairwise amino acid counts did not yield a competitive performance.

    Computational and Experimental Approaches to Reveal the Effects of Single Nucleotide Polymorphisms with Respect to Disease Diagnostics

    DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules