A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis

By Abdoulaye Baniré Diallo, François-Joseph Lapointe and Vladimir Makarenkov


In this article we address the problem of phylogenetic inference from nucleic acid data containing missing bases. We introduce a new, effective approach, called “Probabilistic estimation of missing values” (PEMV), which estimates unknown nucleotides before the evolutionary distances between sequences are computed. We show that the new method improves the accuracy of phylogenetic inference compared to two existing methods, “Ignoring Missing Sites” (IMS) and “Proportional Distribution of Missing and Ambiguous Bases” (PDMAB), both included in the PAUP software [26]. The proposed strategy for estimating missing nucleotides is based on probabilistic formulae developed in the framework of the Jukes-Cantor [10] and Kimura 2-parameter [11] models. The relative performance of the new method was assessed through simulations carried out with the SeqGen program [20] for data generation and the BioNJ method [7] for inferring phylogenies. We also compared the new method to the DNAML program [5] and to “Matrix Representation using Parsimony” (MRP) [13], [19], using an example of 66 eutherian mammals originally analyzed in [17].
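For orientation, the baseline that PEMV is compared against can be sketched in a few lines. The Python below is a minimal illustration, not the authors' code and not the PEMV estimator: it computes the standard Jukes-Cantor and Kimura 2-parameter distances while skipping every site at which either sequence has a missing base, which is the idea behind "Ignoring Missing Sites". The function names, and the choice to treat any character other than A, C, G or T as missing, are assumptions made for this sketch.

```python
import math

# Assumption for this sketch: any character other than A, C, G, T counts as missing.
VALID = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_ignoring_missing(seq_a, seq_b):
    """Return (sites_compared, transitions, transversions).

    Sites where either sequence has a non-ACGT character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID or b not in VALID:
            continue  # ignore the site entirely
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return compared, transitions, transversions


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) over the usable sites."""
    n, ts, tv = compare_ignoring_missing(seq_a, seq_b)
    if n == 0:
        raise ValueError("no sites without missing data")
    p = (ts + tv) / n
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q))."""
    n, ts, tv = compare_ignoring_missing(seq_a, seq_b)
    if n == 0:
        raise ValueError("no sites without missing data")
    P = ts / n  # proportion of transitions
    Q = tv / n  # proportion of transversions
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        return math.inf
    return -0.5 * math.log(a * math.sqrt(b))
```

Skipping sites shortens the effective alignment for each pair of sequences, so the distance estimates become noisier as the proportion of missing bases grows. This is the weakness that an approach which estimates the unknown nucleotides first, as the abstract describes, is meant to address.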

Topics: Original Research
Publisher: Libertas Academica
Provided by: PubMed Central

Suggested articles


  1. (1980). A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide seq.
  2. (1994). A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates.
  3. (1992). A.: Phylogenetic inference based on matrix representation of trees.
  4. (1994). A.: Recovering a tree from the leaf colorations it generates under a Markov model.
  5. (1999). An algorithm for the fitting of a phylogenetic tree according to a weighted least-squares criterion.
  6. (1997). An alternating least squares approach to inferring phylogenies from pairwise distances.
  7. (1997). An improved version of NJ algorithm based on a simple model of sequence data.
  8. (1998). C.: Phylogenetic supertrees: Assembling the tree of life.
  9. (2004). Clann: Investigating phylogenetic information through supertree analyses.
  10. (1992). Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees.
  11. (1990). D.W.: Molecular phylogeny of Dictyostelium discoideum by protein sequence comparison.
  12. (2003). Does adding characters with missing data increase or decrease phylogenetic accuracy.
  13. (2002). Efficient biased estimation of evolutionary distances when substitution rates vary across sites.
  14. (1984). Estimation of evolutionary distance between nucleotide sequences.
  15. (2004). F-J.: A weighted least-squares approach for inferring phylogenies from incomplete distance matrices.
  16. (1984). for inferring phyl.: a justification.
  17. (1981). Foulds L.: Comparison of phylogenetic trees.
  18. (1996). G.A.: A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol.
  19. (2005). Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree.
  20. (1969). Mammalian Protein Metabolism, chapter Evolution of protein molecules,
  21. (1998). Missing data, incomplete taxa, and phylogenetic accuracy.
  22. (2001). Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates,
  23. (2004). Phylogenomics of Eukaryotes: Impact of missing data on large alignments.
  24. (2001). S.J.: Molecular phylogenetics and the origins of placental mammals.
  25. (1997). SeqGen: An application for the Monte Carlo simulation of DNA sequences evolution along phylogenetic trees.
  26. (2001). T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks.
  27. (1997). The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.
  28. (2004). The evolution of supertrees. Trends in Ecol. and Evol.
  29. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees.
  30. (1991). When are fossils better than existent taxa in phylogenetic analysis?
