Modelling prokaryote gene content

By Matthew Spencer, Edward Susko and Andrew J. Roger


The patchy distribution of genes across the prokaryotes may be caused by multiple gene losses or by lateral transfer. Probabilistic models of gene gain and loss are needed to distinguish between these possibilities. Existing models allow only single genes to be gained and lost, despite the empirical evidence for multi-gene events. We compare birth-death models (currently the only widely used models, in which only one gene can be gained or lost at a time) to blocks models (which allow gain and loss of multiple genes within a family). We analyze two pairs of genomes: two E. coli strains, and the distantly related Archaeoglobus fulgidus (archaea) and Bacillus subtilis (Gram-positive bacteria). Blocks models describe the data much better than birth-death models. Our models suggest that lateral transfers of multiple genes from the same family are rare (although transfers of single genes are probably common). For both pairs, the estimated median time that a gene will remain in the genome is not much greater than the time separating the common ancestors of the archaea and bacteria. Deep phylogenetic reconstruction from sequence data will therefore depend on choosing genes likely to remain in the genome for a long time. Phylogenies based on the blocks model are more biologically plausible than phylogenies based on the birth-death model.
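
The contrast between the two model classes can be illustrated with a toy simulation. The Python sketch below is not the authors' parameterization: the function names, the uniform block-size distribution, and the per-gene event rates are all assumptions made for illustration. Setting `max_block=1` gives a simple linear birth-death process (one gene gained or lost per event), while `max_block > 1` lets a single event add or remove several genes in a family. The helper `median_residence_time` assumes a constant per-gene loss rate, in which case the waiting time to loss is exponential and its median is ln 2 divided by the rate.

```python
import math
import random


def median_residence_time(loss_rate):
    """Median waiting time to loss, assuming a constant per-gene loss rate
    (exponential waiting time; an illustrative assumption)."""
    return math.log(2) / loss_rate


def simulate_family(n0, gain_rate, loss_rate, t_end, max_block=1, rng=None):
    """Simulate the size of one gene family up to time t_end.

    Events occur at rate n * (gain_rate + loss_rate), where n is the current
    family size. Each event changes the size by a block of 1..max_block genes,
    drawn uniformly (a hypothetical choice, not taken from the paper).
    max_block=1 corresponds to a one-gene-at-a-time birth-death process.
    """
    rng = rng or random.Random()
    per_gene_rate = gain_rate + loss_rate
    if per_gene_rate <= 0:
        return n0  # no events can occur
    n, t = n0, 0.0
    while n > 0:
        # Time to the next event for a family of n genes.
        t += rng.expovariate(n * per_gene_rate)
        if t > t_end:
            break
        block = rng.randint(1, max_block)
        if rng.random() < gain_rate / per_gene_rate:
            n += block
        else:
            n = max(0, n - block)
    return n


if __name__ == "__main__":
    rng = random.Random(1)
    single = [simulate_family(3, 1.0, 1.0, 2.0, max_block=1, rng=rng) for _ in range(1000)]
    blocks = [simulate_family(3, 1.0, 1.0, 2.0, max_block=3, rng=rng) for _ in range(1000)]
    print("fraction of families lost (single-gene events):", single.count(0) / 1000)
    print("fraction of families lost (block events):", blocks.count(0) / 1000)
```

In this toy version a family that reaches size zero stays lost, so the sketch ignores regain by lateral transfer; it is meant only to show how allowing multi-gene events changes the spread of family sizes under otherwise identical rates.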

Topics: Original Research
Publisher: Libertas Academica
Provided by: PubMed Central
