46 research outputs found

    Fast neighbor joining

    Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving this problem are gauged on two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces, in Θ(n^3) time, a phylogenetic tree, i.e., a tree that aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius. The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n^2), and (2) we present a greatly simplified proof of the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius is of great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas.
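For orientation, the Θ(n^3) procedure that the abstract refers to can be sketched as follows. This is a minimal illustration of the textbook Neighbor Joining algorithm, not the FNJ algorithm from the paper; the function name `neighbor_joining`, the `internalK` node labels, and the edge-list output format are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook NJ sketch: returns (node, neighbor, branch_length) edges of an unrooted tree.

    names: list of taxon labels; dist: symmetric matrix (list of lists) in the same order.
    """
    # Distances stored as a dict of dicts so that new internal nodes can be added.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimizing Q(u, v) = (n - 2) * d(u, v) - r(u) - r(v).
        u, v = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        w = f"internal{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from u and v to the new node w.
        len_u = 0.5 * d[u][v] + (r[u] - r[v]) / (2 * (n - 2))
        len_v = d[u][v] - len_u
        edges.extend([(u, w, len_u), (v, w, len_v)])
        # Distances from w to every remaining node.
        d[w] = {w: 0.0}
        for x in nodes:
            if x not in (u, v):
                d[w][x] = d[x][w] = 0.5 * (d[u][x] + d[v][x] - d[u][v])
        nodes = [x for x in nodes if x not in (u, v)] + [w]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges
```

Each of the n - 2 iterations scans all O(n^2) pairs, which is where the cubic running time comes from. Calling `neighbor_joining(["a", "b", "c"], [[0, 3, 4], [3, 0, 5], [4, 5, 0]])` returns three edges whose lengths add up pairwise to the input distances.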

    Why neighbor-joining works

    We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson's optimal radius bound as a special case and explains many cases where neighbor-joining is successful even when Atteson's criterion is not satisfied. We also provide a proof for Atteson's conjecture on the optimal edge radius of the neighbor-joining algorithm. The strong performance guarantees we provide also hold for the quadratic time fast neighbor-joining algorithm, thus providing a theoretical basis for inferring very large phylogenies with neighbor-joining

    Live neighbor-joining

    Background: In phylogenetic reconstruction the result is a tree where all taxa are leaves and internal nodes are hypothetical ancestors. In a live phylogeny, both ancestral and living taxa may coexist, leading to a tree where internal nodes may be living taxa. The well-known Neighbor-Joining heuristic is widely used for phylogenetic reconstruction. Results: We present Live Neighbor-Joining, a heuristic for building a live phylogeny. We have investigated Live Neighbor-Joining on datasets of viral genomes, a plausible scenario for its application, which allowed the construction of alternative hypotheses for the relationships among viruses that embrace both ancestral and descendant taxa. We also applied Live Neighbor-Joining to a set of bacterial genomes and to sets of images and texts. Non-biological data may be better explored visually when their relationship in terms of content similarity is represented by means of a phylogeny. Conclusion: Our experiments have shown interesting alternative phylogenetic hypotheses for RNA virus genomes and bacterial genomes, and alternative relationships among images and texts, illustrating a wide range of scenarios where Live Neighbor-Joining may be used.

    XplorSeq: A software environment for integrated management and phylogenetic analysis of metagenomic sequence data

    Background: Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, a need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. Results: XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment Search Tool [1, 2, 3]) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which can then be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. Conclusion: XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to almost any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

    Neighbor Joining And Leaf Status

    The Neighbor Joining Algorithm is among the most fundamental algorithmic results in computational biology. However, its definition and correctness proof are not straightforward. In particular, "the question 'what does the NJ method seek to do?' has until recently proved somewhat elusive" [Gascuel & Steel, 2006]. While a rigorous mathematical analysis is now available, it is still considered somewhat hard to follow and its proof tedious at best. In this work, we present an alternative interpretation of the goal of the Neighbor Joining algorithm by proving that it chooses to merge the two taxa u and v that maximize the "leaf-status", that is, the sum of distances of all leaves to the unique u-v-path.

    Phylogenomic identification of five new human homologs of the DNA repair enzyme AlkB

    BACKGROUND: A combination of biochemical and bioinformatic analyses led to the discovery of oxidative demethylation, a novel DNA repair mechanism catalyzed by the Escherichia coli AlkB protein and its two human homologs, hABH2 and hABH3. This discovery was based on the prediction made by Aravind and Koonin that AlkB is a member of the 2OG-Fe(2+) oxygenase superfamily. RESULTS: In this article, we report the identification and sequence analysis of five human members of the 2OG-Fe(2+) oxygenase superfamily, designated here as hABH4 through hABH8. These experimentally uncharacterized and poorly annotated genes were not associated with the AlkB family in any database, but are predicted here to be phylogenetically and functionally related to the AlkB family (and specifically to the lineage that groups together hABH2 and hABH3) rather than to any other oxygenase family. Our analysis reveals the history of ABH gene duplications in the evolution of vertebrate genomes. CONCLUSIONS: We hypothesize that hABH4–8 could either be back-up enzymes for hABH1–3 or may code for novel DNA or RNA repair activities. For example, enzymes that can dealkylate N3-methylpurines or N7-methylpurines in DNA have not been described. Our analysis will guide experimental confirmation of these novel putative human DNA repair enzymes.

    Isolation of Yeasts from Sugarcane Stems and Their Identification Based on Internal Transcribed Spacer Sequences

    Fermentative yeasts used in the food, health, and energy industries need to be explored to discover their potential. The purpose of this study was to obtain fermentative yeast isolates from sugarcane stems and subsequently to undertake morphological, biochemical, and molecular identification. The isolation of epiphytic and endophytic yeasts was carried out by the spread plate method using sugarcane soak water and sugarcane juice on potato dextrose agar (PDA) and yeast-glucose-peptone (YGP) agar media. Morphological identification was based on macroscopic and microscopic observations. Biochemical identification was performed using carbohydrate fermentation and 50%-glucose media tests. Selected isolates were identified molecularly using the Internal Transcribed Spacer (ITS). Seven yeast isolates were obtained, of which isolate Ed 1B was selected. Isolate Ed 1B had round, creamy white, shiny, raised, wavy colonies and ovoid cells with a cell diameter of 4.74 µm. It had budding cells, was able to ferment glucose and sucrose (but not lactose), and grew on 50%-glucose media. BLAST results showed that isolate Ed 1B had 99% homology with Kodamaea ohmeri. Keywords: isolation, ITS, molecular identification, Saccharum officinarum L., yeast

    Fast computation of distance estimators

    BACKGROUND: Distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n^3). Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n^2. Since the sequence length typically is much larger than the number of taxa, the distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in the reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications. RESULTS: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods. CONCLUSION: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real-world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
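To make the l·n^2 bottleneck concrete, here is a minimal sketch of the naive baseline: every pair of aligned sequences is compared site by site, and the observed proportion of differing sites is corrected with the standard Jukes-Cantor formula. This is not the fast algorithm of the paper; the function names are assumptions for this example, and ambiguity symbols are not handled.

```python
import math


def p_distance(s, t):
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def jukes_cantor(p):
    """Jukes-Cantor correction: d = -(3/4) * ln(1 - 4p/3)."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Naive estimator: n(n-1)/2 pairwise comparisons, each costing O(l)."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = jukes_cantor(p_distance(seqs[i], seqs[j]))
    return dist
```

The double loop over pairs, with a linear scan inside, gives the running time proportional to l·n^2 mentioned in the abstract. The resulting matrix can be passed directly to any distance method such as NJ.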

    Exploring Hierarchical Visualization Designs Using Phylogenetic Trees

    Ongoing research on information visualization has produced an ever-increasing number of visualization designs. Despite this activity, limited progress has been made in categorizing this large number of information visualizations. This makes understanding their common design features challenging, and obscures the as yet unexplored areas of novel designs. With this work, we provide categorization from an evolutionary perspective, leveraging a computational model for representing evolutionary processes: the phylogenetic tree. The result, a phylogenetic tree of a design corpus of hierarchical visualizations, enables better understanding of the various design features of hierarchical information visualizations, and further illuminates the space in which the visualizations lie, through support for interactive clustering and novel design suggestions. We demonstrate these benefits with our software system, in which a corpus of two-dimensional hierarchical visualization designs is organized into a phylogenetic tree. This software system supports visual interactive clustering and suggestions for novel designs; the latter capacity is also demonstrated via collaboration with an artist who sketched new designs using our system.