
Pattern-Based Phylogenetic Distance Estimation and Tree Reconstruction

By Michael Höhl, Isidore Rigoutsos and Mark A. Ragan


We have developed an alignment-free method that calculates phylogenetic distances using a maximum-likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as the Python package decaf + py), we have created a data set of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we inferred trees whose topologies we compared to the reference trees.
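As a rough illustration of what an alignment-free distance looks like in practice, the sketch below computes a simple word-count distance between unaligned amino acid sequences and assembles a pairwise distance matrix of the kind a tree-building method would take as input. It is not the pattern-based maximum-likelihood method described above, and it does not use the decaf + py package; the function names, the word length k=2, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def word_counts(seq, k=2):
    """Count overlapping words (k-mers) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_distance(a, b, k=2):
    """Squared Euclidean distance between the k-mer count vectors of a and b."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    # Counter returns 0 for words absent from one sequence.
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def distance_matrix(seqs, k=2):
    """Symmetric pairwise distance matrix as a dict of dicts keyed by name."""
    names = list(seqs)
    d = {n: {m: 0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d[x][y] = d[y][x] = word_distance(seqs[x], seqs[y], k)
    return d


# Hypothetical toy sequences: A and B differ at one position, C is unrelated.
seqs = {"A": "MKVLAAGIVG", "B": "MKVLSAGIVG", "C": "MRTLPWQHYE"}
dm = distance_matrix(seqs)
```

No alignment is computed at any point: each sequence is reduced to a vector of word counts, so the close pair A and B ends up with a smaller distance than either has to C. A matrix like dm could then be passed to a distance-based tree method such as neighbor-joining.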

Topics: Original Research
Publisher: Libertas Academica
Provided by: PubMed Central