Sequence-Based Prediction of Type III Secreted Proteins

By Roland Arnold, Stefan Brandmaier, Frederick Kleine, Patrick Tischler, Eva Heinz, Sebastian Behrens, Antti Niinikoski, Hans-Werner Mewes, Matthias Horn and Thomas Rattei


The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals, including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins have until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperone-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with sensitivity of ∼71% and selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal. The newly developed program EffectiveT3 is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will facilitate further studies on and improve our understanding of type III secretion and its role in pathogen–host interactions.
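
As a rough illustration of the kind of N-terminal sequence features the abstract mentions (amino acid frequencies and frequencies of residues sharing a physico-chemical property), the following Python sketch computes such features for one protein sequence. It is not the EffectiveT3 implementation: the function name `n_terminal_features`, the 25-residue window, and the three property groups are assumptions chosen only for this example.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical property groups, for illustration only; the abstract does not
# state which physico-chemical classes the authors used.
PROPERTY_GROUPS = {
    "hydrophobic": set("AILMFVW"),
    "polar": set("STNQ"),
    "charged": set("DEKRH"),
}


def n_terminal_features(sequence, window=25):
    """Return relative frequencies computed over the first `window` residues.

    `window=25` is an assumed default, not a value taken from the abstract.
    """
    n_term = sequence.upper()[:window]
    if not n_term:
        raise ValueError("sequence must not be empty")

    counts = Counter(n_term)
    length = len(n_term)

    # One feature per amino acid: its share of the N-terminal window.
    features = {f"freq_{aa}": counts[aa] / length for aa in AMINO_ACIDS}

    # One feature per property group: share of residues in that group.
    for name, members in PROPERTY_GROUPS.items():
        features[f"frac_{name}"] = sum(counts[aa] for aa in members) / length

    return features


# Toy sequence, invented for the example (not a real effector).
example = n_terminal_features("MSSSSKKDEA", window=10)
```

Feature vectors of this form could then be passed to any standard classifier; which classifier and which features perform best is a question the paper itself addresses and this sketch does not.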

Topics: Research Article
Publisher: Public Library of Science
Provided by: PubMed Central