6,155 research outputs found

    Computational prediction of type III secreted proteins from gram-negative bacteria

    Background: The type III secretion system (T3SS) is a specialized protein delivery system in gram-negative bacteria that injects proteins (called effectors) directly into the eukaryotic host cytosol and facilitates bacterial infection. For many plant and animal pathogens, the T3SS is indispensable for disease development. Recently, the T3SS has also been found in rhizobia, where it plays a crucial role in the nodulation process. Although a great deal of effort has gone into understanding type III secretion, the precise mechanism underlying the secretion and translocation process is not fully understood. In particular, no defined secretion and translocation signals have been identified in the type III secreted effectors (T3SEs), which makes the identification of these important virulence factors notoriously challenging. The availability of a large number of sequenced genomes for plant- and animal-associated bacteria demands efficient and effective bioinformatics methods for identifying T3SEs.
Results: We have developed a machine learning method based on N-terminal amino acid sequences to predict novel type III effectors in the plant pathogen Pseudomonas syringae and in the microsymbiont rhizobia. The features used in the learning model (or classifier) include amino acid composition, secondary structure and solvent accessibility information. The method achieved a precision of over 90% on P. syringae in a cross-validation study. In combination with a promoter screen for type III-specific promoters, the classifier trained on the P. syringae data was applied to predict novel T3SEs from the genomic sequences of four rhizobial strains. This application resulted in 57 candidate type III secreted proteins, 17 of which are confirmed effectors.
Conclusion: Our experimental results demonstrate that the machine learning method based on N-terminal amino acid sequences, combined with a promoter screen, could prove to be a very effective computational approach for predicting novel type III effectors in gram-negative bacteria. Our method and data are available to the public upon request.
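
As an illustration of the simplest feature type named in the Results above, amino acid composition of the N-terminal region, the following minimal sketch computes residue fractions over a leading window. The function name `nterm_composition` and the 50-residue window are assumptions made for this example; the abstract does not state the window length, and the secondary structure and solvent accessibility features are not covered here.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nterm_composition(sequence, window=50):
    """Fraction of each standard amino acid in the first `window` residues.

    `window=50` is an illustrative default, not a value from the abstract.
    """
    nterm = sequence.upper()[:window]
    counts = Counter(nterm)
    # Only standard residues contribute to the denominator.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    features = nterm_composition("MSSSPAQLT", window=4)
    print(features["S"], features["M"])
```

A vector of these 20 fractions per protein could then be passed to any standard classifier; which classifier the authors used is not stated in the abstract.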

    Development of novel orthogonal genetic circuits, based on extracytoplasmic function (ECF) σ factors

    The synthetic biology field aims to apply the engineering 'design-build-test-learn' cycle to implement synthetic genetic circuits that modify the behavior of biological systems. To reach this goal, synthetic biology projects use a set of fully characterized biological parts that are subsequently assembled into complex synthetic circuits following a rational, model-driven design. However, even though the bottom-up design approach is an optimal starting point for assaying the behavior of synthetic circuits under defined conditions, the rational design of such circuits is often restricted by the limited number of available DNA building blocks. These usually consist of only a handful of transcriptional regulators, which are often borrowed from natural biological systems. This, in turn, can lead to cross-reactions between the synthetic circuit and the host cell and eventually to loss of the original circuit function. Thus, one of the challenges in synthetic biology is to design synthetic circuits that perform their designated functions with minimal cross-reactions (orthogonality). To overcome the restrictions of the widely used transcriptional regulators, this project aims to apply extracytoplasmic function (ECF) σ factors in the design of novel orthogonal synthetic circuits. ECFs are the smallest and simplest alternative σ factors, and they recognize highly specific promoters. ECFs represent one of the most important mechanisms of signal transduction in bacteria; indeed, their activity is often controlled by anti-σ factors. Even though the overexpression of heterologous anti-σ factors has been shown to impair cell growth, they represent an attractive solution for controlling ECF activity. Finally, thousands of ECF σ factors are known to date, widespread among different bacterial phyla, and they can be identified together with their cognate promoters and anti-σ factors using bioinformatic approaches.
All the above-mentioned features make ECF σ factors optimal candidates as core orthogonal regulators for the design of novel synthetic circuits. In this project, to establish ECF σ factors as standard building blocks in the synthetic biology field, we first established a high-throughput experimental setup. It relies on microplate reader experiments performed with a highly sensitive luminescent reporter system. Luminescent reporters have a superior signal-to-noise ratio compared to fluorescent reporters, since they do not suffer from the high auto-fluorescence background of the bacterial cell. However, their constant light emission is a drawback, because it can generate undesired cross-talk between neighboring wells on a microplate. To overcome this limitation, we developed a computational algorithm that corrects for luminescence bleed-through and estimates the “true” luminescence activity for each well of a microplate. We show that the correction algorithm preserves low-level signals close to the background and that it is universally applicable to different experimental conditions. To simplify the assembly of large ECF-based synthetic circuits, we designed an ECF toolbox in E. coli. The toolbox allows for the combinatorial assembly of circuits into expression vectors, using a library of reusable genetic parts. It also offers the possibility of integrating the newly generated synthetic circuits into four different phage attachment (att) sites present in the genome of E. coli. This allows for a seamless transition between plasmid-encoded and chromosomally integrated genetic circuits, expanding the possible genetic configurations of a given synthetic construct. Moreover, our results demonstrate that the four att sites are orthogonal in terms of the gene expression levels of the synthetic circuits.
To rationally design ECF-based synthetic circuits, and taking advantage of the ECF toolbox, we characterized the dynamic behavior of a set of 15 ECF σ factors, their cognate promoters, and their respective anti-σ factors. Overall, we found that ECFs are non-toxic and functional and that they display different binding affinities for their cognate target promoters. Moreover, our results show that the output dynamic range of the ECF-based switches can be optimized by changing the copy number of the ECFs and target promoters, thus tuning the input/output signal ratio. Next, by combining up to three ECF switches, we generated a set of “genetic-timer circuits”, the first synthetic circuits harboring more than one ECF. ECF-based timer circuits sequentially activate a series of target genes with increasing time delays, and the behavior of the circuits can be predicted by a set of mathematical models. To improve the dynamic response of the ECF-based constructs, we introduced anti-σ factors into our synthetic circuits. In doing so, we first confirmed that anti-σ factors can exert an adverse effect on the growth of E. coli, and we therefore explored possible solutions. Our results demonstrate that anti-σ factor toxicity can be partially alleviated by generating truncated, soluble variants of the anti-σ factors and completely abolished by chromosomal integration of the anti-σ factor-based circuits. Finally, after demonstrating that anti-σ factors can be used to generate a tunable time delay between ECF expression and target promoter activation, we designed ECF/AS-suicide circuits. Such circuits allow for the time-delayed cell death of E. coli and will serve as a prototype for the further development of ECF/AS-based lysis circuits.
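
The abstract above mentions an algorithm that corrects luminescence bleed-through between neighboring wells but does not describe how it works. The sketch below is therefore not the authors' method; it is one simple way such a correction could be set up. It assumes that each well's reading equals its own signal plus a fixed fraction of the signal of its four directly adjacent wells, and it solves for the underlying signals by fixed-point iteration. The function name `correct_bleed_through`, the 4-neighbor model, and the `fraction` value are all hypothetical.

```python
def correct_bleed_through(observed, fraction=0.01, iterations=50):
    """Estimate per-well signals from readings contaminated by neighbors.

    Assumed model (hypothetical, for illustration only):
        observed[r][c] = true[r][c] + fraction * sum(true of 4 adjacent wells)
    The iteration true <- observed - fraction * neighbors(true) converges
    when 4 * fraction < 1.
    """
    rows, cols = len(observed), len(observed[0])
    estimate = [row[:] for row in observed]  # start from the raw readings
    for _ in range(iterations):
        updated = []
        for r in range(rows):
            new_row = []
            for c in range(cols):
                spill = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        spill += fraction * estimate[nr][nc]
                new_row.append(observed[r][c] - spill)
            updated.append(new_row)
        estimate = updated
    return estimate


if __name__ == "__main__":
    # One bright well at (0, 0); its neighbors read 1% of its signal.
    plate = [[1000.0, 10.0], [10.0, 0.0]]
    for row in correct_bleed_through(plate):
        print([round(value, 3) for value in row])
```

In practice the bleed-through fraction would have to be calibrated on the plate reader in use, and a realistic model would likely also account for more distant wells.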

    An iterative strategy combining biophysical criteria and duration hidden Markov models for structural predictions of Chlamydia trachomatis σ66 promoters

    Background: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase σ-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors. In addition, most promoter models have been trained on data from Escherichia coli. Although it has been shown that transcriptional mechanisms are similar among various bacteria, it is quite possible that the differences between Escherichia coli and Chlamydia trachomatis are large enough to recommend an organism-specific modeling effort. Results: Here we present an iterative stochastic model building procedure that combines such biophysical metrics as DNA stability, curvature, twist and stress-induced DNA duplex destabilization along with duration hidden Markov model parameters to model Chlamydia trachomatis σ66 promoters from 29 experimentally verified sequences. Initially, iterative duration hidden Markov modeling of the training set sequences provides a scoring algorithm for Chlamydia trachomatis RNA polymerase σ66/DNA binding. Subsequently, an iterative application of Stepwise Binary Logistic Regression selects multiple promoter predictors and deletes/replaces training set sequences to determine an optimal training set. The resulting model predicts the final training set with a high degree of accuracy and provides insights into the structure of the promoter region. Model based genome-wide predictions are provided so that optimal promoter candidates can be experimentally evaluated, and refined models developed. Co-predictions with three other algorithms are also supplied to enhance reliability. 
Conclusion: This strategy and resulting model support the conjecture that DNA biophysical properties, along with RNA polymerase σ-factor/DNA binding collaboratively, contribute to a sequence's ability to promote transcription. This work provides a baseline model that can evolve as new Chlamydia trachomatis σ66 promoters are identified with assistance from the provided genome-wide predictions. The proposed methodology is ideal for organisms with few identified promoters and relatively small genomes.

    Detection of prokaryotic promoters from the genomic distribution of hexanucleotide pairs

    BACKGROUND: In bacteria, sigma factors and other transcriptional regulatory proteins recognize DNA patterns upstream of their target genes and interact with RNA polymerase to control transcription. As a consequence of evolution, DNA sequences recognized by transcription factors are thought to be enriched in intergenic regions (IRs) and depleted from coding regions of prokaryotic genomes. RESULTS: In this work, we report that the genomic distribution of transcription factor binding sites is biased towards IRs, and that this bias is conserved among bacterial species. We further take advantage of this observation to develop an algorithm that can efficiently identify promoter boxes by a distribution-dependent approach rather than a direct sequence comparison approach. This strategy, which can easily be combined with other methodologies, allowed the identification of promoter sequences in ten species and can be used with any annotated bacterial genome, with results that rival those of current methodologies. Experimental validation of predicted promoters also supports our approach. CONCLUSION: Considering that complete genomic sequences of over 1000 bacteria will soon be available and that little transcriptional information is available for most of them, our algorithm constitutes a promising tool for the prediction of promoter sequences. Importantly, our methodology could also be adapted to identify DNA sequences recognized by other regulatory proteins.
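
The distribution-dependent idea in the abstract above can be illustrated with a deliberately simplified sketch: count each hexanucleotide in intergenic and in coding sequences, then rank hexanucleotides by how strongly they are enriched in intergenic regions. This is not the published algorithm, which works on hexanucleotide pairs and whose details are not given here. The function names, the pseudocount, and the toy sequences are assumptions for illustration.

```python
from collections import Counter


def kmer_counts(sequences, k=6):
    """Count all overlapping k-mers across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def intergenic_enrichment(intergenic, coding, k=6, pseudocount=1.0):
    """Ratio of intergenic to coding frequency for every observed k-mer.

    Values above 1 mean the k-mer is relatively more frequent in
    intergenic regions. The pseudocount (an assumption of this sketch)
    avoids division by zero for k-mers absent from one set.
    """
    ir = kmer_counts(intergenic, k)
    cds = kmer_counts(coding, k)
    ir_total = sum(ir.values()) or 1
    cds_total = sum(cds.values()) or 1
    scores = {}
    for kmer in set(ir) | set(cds):
        ir_freq = (ir[kmer] + pseudocount) / ir_total
        cds_freq = (cds[kmer] + pseudocount) / cds_total
        scores[kmer] = ir_freq / cds_freq
    return scores


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    scores = intergenic_enrichment(["TTGACATTGACA"], ["ATGGCCGCCGCC"])
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(top)
```

With an annotated genome, the two input lists would be the intergenic and coding sequences extracted from the annotation; highly enriched hexanucleotides would then be candidates for further inspection rather than confirmed promoter boxes.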