
    Using Jackknife to Assess the Quality of Gene Order Phylogenies

    Background: In recent years, gene order data have attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. If gene orders are viewed as one character with a large number of states, traditional bootstrap procedures cannot be applied, so researchers began to use a jackknife resampling method to assess the quality of gene order phylogenies. Results: In this paper, we design and conduct a set of experiments to validate the performance of this jackknife procedure and discuss how to conduct it properly. Our results show that the jackknife is very useful for determining the confidence level of a phylogeny obtained from gene orders, and that a jackknife rate of 40% should be used. However, although a branch with a support value of 85% can be trusted, branches with low support require careful investigation before being discarded. Conclusions: Our experiments show that the jackknife is indeed necessary and useful for gene order data, but some caution should be taken when the results are interpreted.
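
A minimal sketch of how a gene-order jackknife replicate could be generated and scored. It assumes the jackknife rate is the fraction of genes deleted from every genome in a replicate (so a 40% rate keeps 60% of the genes in their original relative order). Genomes are represented as lists of signed gene ids. The function names are hypothetical, and rebuilding a tree from each replicate is left out. The support for a branch is taken to be the fraction of replicate trees that contain its split.

```python
import random
from collections import Counter

def jackknife_replicate(genomes, rate=0.4, rng=None):
    """One gene-order jackknife replicate (hypothetical helper).

    Deletes the same randomly chosen `rate` fraction of gene families from
    every genome and keeps the relative order and sign of the remaining genes."""
    if rng is None:
        rng = random.Random()
    families = sorted({abs(g) for order in genomes.values() for g in order})
    dropped = set(rng.sample(families, round(rate * len(families))))
    return {name: [g for g in order if abs(g) not in dropped]
            for name, order in genomes.items()}

def split_support(replicate_splits, reference_splits):
    """Fraction of replicate trees that contain each split of the reference tree.

    Splits must be in a canonical form (e.g. the frozenset of taxa on the side
    that excludes a fixed reference taxon) so that equal splits compare equal."""
    counts = Counter(s for splits in replicate_splits for s in set(splits))
    return {s: counts[s] / len(replicate_splits) for s in reference_splits}
```

Deleting the same genes from all genomes keeps the replicates comparable, which is why the sample is drawn once per replicate rather than once per genome.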

    TIBA: a tool for phylogeny inference from rearrangement data with bootstrap analysis

    TIBA is a tool to reconstruct phylogenetic trees from rearrangement data that consist of ordered lists of synteny blocks (or genes), where each synteny block is shared with all of its homologues in the input genomes. The evolution of these synteny blocks through rearrangement operations is modelled by the uniform Double-Cut-and-Join (DCJ) model. Using an estimate of the true evolutionary distance under this model together with simple distance-based methods, TIBA reconstructs a phylogeny of the input genomes. Unlike any previous tool for inferring phylogenies from rearrangement data, TIBA uses novel methods of robustness estimation to provide support values for the edges in the inferred tree.
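
As background for the DCJ model named above, here is a minimal sketch of the minimum (edit) DCJ distance. It uses the formula d = N - (C + I/2) of Bergeron, Mixtacki and Stoye, where N is the number of genes, C the number of cycles and I the number of odd paths in the adjacency graph. The sketch assumes that both genomes have identical gene content and are given as lists of (signed gene list, is_circular) chromosomes. Estimating the true evolutionary distance, as TIBA does, requires a further statistical correction that is not shown, and TIBA is not claimed to use this exact code.

```python
from collections import defaultdict

def extremities(gene):
    """(left, right) extremities of a signed gene as read along its chromosome."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))

def adjacency_vertices(genome):
    """Map every gene extremity to the id of the adjacency or telomere holding it.
    Returns (vertex_of, telomere_ids)."""
    vertex_of, telomeres, vid = {}, set(), 0
    for genes, circular in genome:
        ends = [extremities(g) for g in genes]
        vertices = [(ends[i][1], ends[i + 1][0]) for i in range(len(ends) - 1)]
        if circular:
            vertices.append((ends[-1][1], ends[0][0]))
        else:
            vertices += [(ends[0][0],), (ends[-1][1],)]  # the two telomeres
        for vertex in vertices:
            if len(vertex) == 1:
                telomeres.add(vid)
            for x in vertex:
                vertex_of[x] = vid
            vid += 1
    return vertex_of, telomeres

def dcj_distance(genome_a, genome_b):
    """Minimum DCJ distance N - (C + I/2) for genomes with identical gene content."""
    va, ta = adjacency_vertices(genome_a)
    vb, tb = adjacency_vertices(genome_b)
    assert set(va) == set(vb), "genomes must share the same genes"
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Each extremity is one edge of the adjacency graph, joining its A- and B-vertex.
    for x in va:
        ra, rb = find(("A", va[x])), find(("B", vb[x]))
        if ra != rb:
            parent[ra] = rb
    edges, has_telomere = defaultdict(int), defaultdict(bool)
    for x in va:
        edges[find(("A", va[x]))] += 1
    for side, tel in (("A", ta), ("B", tb)):
        for v in tel:
            has_telomere[find((side, v))] = True
    cycles = sum(1 for r in edges if not has_telomere[r])
    odd_paths = sum(1 for r, e in edges.items() if has_telomere[r] and e % 2 == 1)
    return len(va) // 2 - cycles - odd_paths // 2
```

For example, a single inversion of gene 2 in the linear chromosome 1 2 3 gives a distance of 1. Circularizing that chromosome also gives 1.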

    Mitochondrial genome rearrangements in the Scleractinia/Corallimorpharia complex: implications for coral phylogeny

    Corallimorpharia is a small Order of skeleton-less animals that is closely related to the reef-building corals (Scleractinia) and of fundamental interest for understanding the potential future impacts of climate change on coral reefs. The relationship between the nominal Orders Corallimorpharia and Scleractinia is controversial: the former is either the closest outgroup to the Scleractinia or, alternatively, derived from corals via skeleton loss. This latter scenario, the "naked coral" hypothesis, is strongly supported by analyses based on mitochondrial (mt) protein sequences, whereas the former is equally strongly supported by analyses of mt nucleotide sequences. The "naked coral" hypothesis seeks to link skeleton loss in the putative ancestor of corallimorpharians with a period of elevated oceanic CO2 during the Cretaceous, leading to the idea that these skeleton-less animals may be harbingers for the fate of coral reefs under global climate change. To better understand their evolutionary relationships, we examined mt genome organization in a representative range of corallimorpharians (12 species, representing 3 of the 4 extant families) and compared these patterns with those of other Hexacorallia. The most surprising finding was that mt genome organization in Corallimorphus profundus, a deep-water species that is morphologically the most scleractinian-like of all corallimorpharians, was much more similar to the common scleractinian pattern than to those of other corallimorpharians. This finding is consistent with the idea that C. profundus occupies a key position in the transition between corals and corallimorpharians.

    Robustness Evaluation for Phylogenetic Reconstruction Methods and Evolutionary Models Reconstruction of Tumor Progression

    During evolution, genomes change through DNA mutation, genome rearrangement, duplication and gene loss events. Considerable effort has been devoted to inferring phylogenies and ancestral genomes. Rapid technological progress has made genomic information grow exponentially, which makes it possible to address these problems. The problems have proven so interesting that a great number of algorithms, based on different principles, have been developed over the past decades to tackle them. However, difficulties and limits in performance and capacity, as well as low consistency among methods, prevent us from confidently stating that the problem is solved. To recover a detailed evolutionary history, we need to infer the phylogeny itself (the Big Phylogeny Problem) and also the genomes at its internal nodes (the Small Phylogeny Problem). The work presented in this thesis focuses on assessing methods designed for the Small Phylogeny Problem, and on designing models and algorithms to infer the history of genome evolution in cancer from FISH data. In recent decades, a number of evolutionary models and related algorithms have been designed to infer ancestral genome sequences or gene orders. Because the true ancestral genomes are difficult to know, tools are needed to test the robustness of the adjacencies found by the various methods. For the Big Phylogeny Problem, previous work tested bootstrapping, jackknifing and isolating as ways to assess the confidence of inferred branches, and found them to be good resampling tools for the corresponding phylogenetic inference methods. However, no systematic work has yet tackled this problem for the Small Phylogeny Problem.
We tested the earlier resampling schemes and a new method, inversion, on different ancestral genome reconstruction methods, and showed that different resampling methods are appropriate for different reconstruction methods. Cancer is known for its heterogeneity, which develops through an evolutionary process driven by mutations in tumor cells. Earlier research has observed and analyzed rapid, simultaneous linear and branching evolution, and such a process can be modeled by a phylogenetic tree using different methods. Previous phylogenetic studies used various kinds of data, such as FISH data, genome sequences and gene orders. FISH data are relatively clean because they come from single cells, and they have been shown to be sufficient to infer the evolutionary process of cancer development. The rectilinear Steiner minimum tree (RSMT) was shown to be a good model for phylogenetic analysis of FISH cell count pattern data, but computing it is NP-hard, so efficient heuristics are needed. To attack this problem, we proposed an iterative approach to approximate the Steiner tree in the small phylogeny setting. It gives better results than an earlier method on both real and simulated data. In this thesis, we continued investigating new methods to better approximate the evolutionary process of tumors, and applied our methods to other kinds of data, such as information obtained with high-throughput technology. Our thesis work can be divided into two parts. First, we designed new algorithms that give the same parsimony tree as an exact method in most situations, and modified them into a general phylogeny-building tool. Second, we applied our methods to different kinds of data, such as copy number variation inferred from next-generation sequencing, and predicted key changes during evolution.
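
The RSMT model treats each cell's FISH count pattern as a point with integer coordinates, with one copy count per probe. It connects these points with a tree of minimum total rectilinear (L1) length, possibly through extra Steiner nodes that can be read as unobserved ancestral cells. The sketch below is a hypothetical illustration of the simplest baseline: a minimum spanning tree under L1 distance. Because this tree never uses Steiner nodes, it only gives an upper bound on the RSMT cost. It is not the iterative heuristic described above, and the count patterns in the usage note are made up.

```python
def rectilinear(p, q):
    """L1 (Manhattan) distance between two FISH cell count patterns."""
    return sum(abs(a - b) for a, b in zip(p, q))

def mst_cost(patterns):
    """Total L1 length of a minimum spanning tree over the patterns (Prim's algorithm).

    Any spanning tree is a Steiner tree without extra nodes, so this value is an
    upper bound on the rectilinear Steiner minimum tree (RSMT) cost."""
    n = len(patterns)
    if n == 0:
        return 0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the growing tree to each node
    best[0] = 0
    total = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], rectilinear(patterns[u], patterns[v]))
    return total
```

For example, `mst_cost([(2, 2), (3, 3), (3, 1)])` is 4. Adding the Steiner node (3, 2) gives `mst_cost([(2, 2), (3, 3), (3, 1), (3, 2)])`, which equals 3. Finding such extra nodes is the part that a Steiner heuristic has to search for.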

    Phylogenetic, Genomic and Morphological Investigations of Three Lance Nematode Species (Hoplolaimus spp.)

    Lance nematodes (Hoplolaimus spp.) are migratory ecto-endoparasites of plants. They occur in many parts of the world, feed on the roots of a wide diversity of monocotyledonous and dicotyledonous plants, and cause great agricultural damage. Because more taxonomic knowledge and molecular references are needed for lance nematode phylogeny and population studies, four chapters of research on three lance nematode species are presented here. (1) A new species, Hoplolaimus smokyensis n. sp., was discovered in a sample from a mixed forest of maple (Acer sp.), hemlock (Tsuga sp.) and silverbell (Halesia carolina) in the Great Smoky Mountains National Park. It is characterized by a lateral field with four incisures, an excretory pore posterior to the hemizonid, esophageal glands with three nuclei, phasmids anterior and posterior to the vulva, and an absent epiptygma. Phylogenetic analyses based on ribosomal and mitochondrial gene sequences also suggest that H. smokyensis n. sp. is an independent lineage distinct from all other reported Hoplolaimus species. (2) Additional morphological characteristics of Hoplolaimus columbus are described, with photographs of its esophageal gland cell nuclei, a male of H. columbus, and abnormal female tails. (3) The first complete de novo assembly of the mitochondrial genome of Hoplolaimus columbus, obtained with whole genome amplification and Illumina MiSeq sequencing, is reported as a circular DNA molecule of 25,228 bp. Annotations produced with two genetic codes were examined and compared. Phylogenetic relationships, gene content and gene order of 92 nematode taxa, including H. columbus, were analyzed. (4) The phylogenetic informativeness of mitochondrial genes in the phylum Nematoda was analyzed with two quantitative methods, using the mitochondrial genomes of 93 nematode species, including H. columbus and H. galeatus.
Results from both methods agree and indicate that nad5 and nad4 are more informative than the other candidates. Traditional markers such as cox1 and cytb have medium informativeness, and nad4l and nad3 have the lowest informativeness among the protein-coding genes. The results also indicate that phylogenetic informativeness is independent of the sequence length of a marker. A concatenated-gene marker could offer better phylogenetic informativeness if the selected genes are highly informative.

    Statistic evaluation of phylogeny of biological sequences

    The topic of this diploma thesis is the statistical evaluation of the phylogeny of biological sequences using phylogenetic trees. The theoretical part presents a literature review of methods for estimating the course of phylogeny from the similarity of biological sequences (DNA and proteins). It focuses on inaccuracies in the estimate, their causes and the possibilities for eliminating them. It then compares methods for statistically evaluating the correctness of an inferred phylogeny.
In the practical part, we design algorithms for testing the reliability of phylogenetic tree construction based on bootstrapping, jackknifing, OTU jackknifing and the PTP test. The algorithms draw a phylogenetic tree from biological sequences in FASTA format using the neighbor-joining method, and the distance model and substitution matrix can be changed. Before these algorithms can be used for the statistical support of phylogenetic trees, their correct function must be verified. This verification is carried out on theoretical amino acid sequences. After verifying the algorithms, we demonstrate the individual statistical tests on 10 real mammalian ubiquitin sequences, and the results are analyzed and discussed.
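
The practical part builds trees with neighbor joining. The function below is a textbook sketch of that step, not the thesis's own code. It runs Saitou and Nei's neighbor joining on a precomputed distance matrix and returns an unrooted Newick string. Computing distances from FASTA input, choosing a substitution model and running the resampling tests are not shown. Taxon names are assumed to be Newick-safe.

```python
def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree (Saitou & Nei) as a Newick string.

    `dist` is a symmetric matrix (list of lists) ordered like `names`.
    Ties in the Q-criterion are broken by the first pair in enumeration order."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    dm = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dm[a][b] for b in nodes) for a in nodes}  # row sums
        pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
        # Q-criterion: join the pair minimizing (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(pairs, key=lambda p: (n - 2) * dm[p[0]][p[1]] - r[p[0]] - r[p[1]])
        la = dm[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))  # branch length to a
        lb = dm[a][b] - la                                  # branch length to b
        u = f"({a}:{la:g},{b}:{lb:g})"                      # new internal node
        dm[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                dm[u][c] = dm[c][u] = (dm[a][c] + dm[b][c] - dm[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes to a central node.
    a, b, c = nodes
    la = (dm[a][b] + dm[a][c] - dm[b][c]) / 2
    lb = (dm[a][b] + dm[b][c] - dm[a][c]) / 2
    lc = (dm[a][c] + dm[b][c] - dm[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"
```

Consider the additive matrix for the tree with cherries (A, B) and (C, D), tip branches A=1, B=2, C=1, D=2 and an internal branch of 3. On this matrix, the function returns `(C:1,D:2,(A:1,B:2):3);`, recovering the generating tree.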

    Models and Algorithms for Whole-Genome Evolution and their Use in Phylogenetic Inference

    The rapid accumulation of sequenced genomes offers the chance to resolve longstanding questions about the evolutionary histories, or phylogenies, of groups of organisms. Large-scale evolutionary events such as genome rearrangements, duplications and losses occur relatively rarely across a whole genome, which enables us to extract a strong and robust phylogenetic signal from whole-genome data. The work presented in this dissertation focuses on models and algorithms for whole-genome evolution and their use in phylogenetic inference. We designed algorithms to estimate pairwise genomic distances from large-scale genomic changes, refined the evolutionary models of whole-genome evolution, and used these results to provide fast and accurate methods for phylogenetic inference that scale up, in both speed and accuracy, to modern high-resolution whole-genome data. Specifically, we designed algorithms to estimate the true evolutionary distance between two genomes under genome rearrangements, and also under rearrangements plus gains and losses. We refined the evolutionary model into the first mathematical model to preserve the structural dichotomy in genomic organization between most prokaryotes and most eukaryotes. These models and the associated distance estimators provide a basis for studying possible mechanisms of evolution, both through simulation and through application to real genomes. Phylogenetic analyses from whole-genome data have been limited to small collections of genomes and low-resolution data, and they have also lacked an effective assessment of robustness. We developed an approach that combines our distance estimator, any standard distance-based reconstruction algorithm, and a novel bootstrapping method based on resampling genomic adjacencies.
The resulting tool overcomes a serious and long-standing impediment to the use of whole-genome data in phylogenetic inference, and it provides results comparable in accuracy and robustness to those of distance-based methods for sequence data. Maximum-likelihood approaches have been successfully applied to phylogenetic inference from aligned sequences, but such applications remain primitive for whole-genome data. We developed a maximum-likelihood approach to phylogenetic analysis from whole-genome data. In combination with our bootstrap scheme, this new approach yields the first reliable phylogenetic tool for the analysis of whole-genome data at the level of syntenic blocks.
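
The bootstrap described above resamples genomic adjacencies. One way to make adjacencies resamplable, assumed here for illustration, is to encode each adjacency or telomere as a binary presence/absence character and then draw characters with replacement, as in a sequence bootstrap. This is a hypothetical sketch, not necessarily the dissertation's exact scheme, and the tree-building step is omitted. Genomes are given as lists of (signed gene list, is_circular) chromosomes.

```python
import random

def adjacency_set(genome):
    """Adjacencies (2-element frozensets of gene extremities) and telomeres
    (1-element frozensets) of a genome given as (signed genes, is_circular) chromosomes."""
    def ends(g):
        # (left, right) extremity of a signed gene as read along the chromosome
        return ((abs(g), "t"), (abs(g), "h")) if g > 0 else ((abs(g), "h"), (abs(g), "t"))
    features = set()
    for genes, circular in genome:
        e = [ends(g) for g in genes]
        features.update(frozenset((e[i][1], e[i + 1][0])) for i in range(len(e) - 1))
        if circular:
            features.add(frozenset((e[-1][1], e[0][0])))
        else:
            features.update((frozenset([e[0][0]]), frozenset([e[-1][1]])))
    return features

def binary_matrix(genomes):
    """0/1 presence vector per genome over every adjacency/telomere seen in any genome."""
    sets = {name: adjacency_set(g) for name, g in genomes.items()}
    columns = sorted(set().union(*sets.values()), key=sorted)
    return columns, {name: [int(f in s) for f in columns] for name, s in sets.items()}

def bootstrap_columns(matrix, rng):
    """One bootstrap replicate: draw as many characters (columns) as the original, with replacement."""
    n_cols = len(next(iter(matrix.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: [row[j] for j in picks] for name, row in matrix.items()}
```

Resampling whole columns keeps each adjacency's presence pattern across genomes intact, so each replicate remains a valid input for the same likelihood or distance computation.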

    Phylogeny and chloroplast evolution in Brassicaceae

    Brassicaceae is a large family of flowering plants, characterized by a cruciform corolla, tetradynamous stamens and capsular fruits. Because of the family's economic and scientific importance, many phylogenetic and systematic studies have been carried out. One recent and important phylogenetic analysis revealed three major lineages (I, II and III). However, classification at different taxonomic levels (tribe, genus and species) remained problematic, and evolutionary relationships among and within these lineages were still largely unclear. This is partly because past studies had limited information, relying mainly on morphological data, nuclear DNA and partial chloroplast (cp) genes. Next-generation sequencing (NGS) technology now makes it possible to use large datasets in phylogenetic and evolutionary studies. We therefore sequenced the chloroplast genomes of 80 representative species, added 15 reference chloroplast genomes from the NCBI database, and used different methods to carry out both phylogenetic reconstruction and a study of protein-coding gene evolution on this novel dataset. Several novel results were obtained. (1) NGS technology was successfully applied to chloroplast genome sequencing. During the final assembly, I could reconstruct full chloroplast genomes and structure maps for 14 of the 80 sampled species, while the remaining genomes were assembled nearly completely, with only a few gaps remaining. (2) Chloroplast genome structure was characterized, including gene number and order, simple sequence repeats (SSRs), and the variety and distribution of large repeat sequences. (3) Differences in codon usage frequency were calculated between Cardamine resedifolia and Cardamine impatiens, and twelve genes with signatures of positive selection were identified at the family-wide level. (4) The three major lineages (I–III) were confirmed with high support values.
In addition, the positions of various tribes were reclassified. Relationships among and within these lineages were highly resolved and supported in the final tree. Most of the tribes in the analyses were inferred to be monophyletic; only Thlaspideae was paraphyletic. Anastaticeae was placed for the first time in the expanded lineage II, and the position of the tribe Lepidieae was inferred with relatively low support values in the final phylogenetic tree. This study was a new and successful application of NGS to large-scale Brassicaceae phylogeny and evolution, and it offered the chance to examine in detail the structural and functional features of the chloroplast genome. These results provide a paradigm for how to proceed towards fully elucidating the evolutionary relationships among species in the tree of life.
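
The abstract does not say which codon usage statistic was compared. As an assumption, this sketch uses plain relative codon frequencies per coding sequence (not RSCU) and reports the per-codon difference between two species' sequences. The function names are hypothetical.

```python
from collections import Counter

def codon_frequencies(cds):
    """Relative frequency of each in-frame codon in a coding sequence.

    Trailing bases that do not form a full codon, and codons containing
    characters other than A, C, G or T, are ignored."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    counts = Counter(c for c in codons if set(c) <= set("ACGT"))
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

def usage_difference(cds_a, cds_b):
    """Per-codon frequency difference (a minus b) over codons seen in either sequence."""
    fa, fb = codon_frequencies(cds_a), codon_frequencies(cds_b)
    return {c: fa.get(c, 0.0) - fb.get(c, 0.0) for c in sorted(set(fa) | set(fb))}
```

In practice, the coding sequences of all protein-coding genes of each species would be concatenated (or their counts summed) before comparing. A single short gene gives noisy frequencies.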

    Mathematical Problems in Molecular Evolution and Next Generation Sequencing

    The focus of this work is the development of new mathematical methods for problems in phylogenetic tree inference. In the first part, we solve several problems related to so-called partitioned alignments. In the second part, we demonstrate how to calculate all identical subtrees of a given labeled tree, and we use this to implement an efficient method for avoiding redundant likelihood computations during phylogenetic tree inference.
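
One standard way to find all identical subtrees of a labeled tree is to assign every subtree a canonical integer id bottom-up, so that two subtrees receive the same id exactly when they have the same labels and shape. The following is a hypothetical sketch. It assumes trees are nested tuples with string leaf labels and that child order does not matter, which is why children's ids are sorted. How such repeats are then used to skip likelihood computations is not shown, and this is not claimed to be the thesis's algorithm.

```python
from collections import defaultdict

def identical_subtrees(tree):
    """Group identical internal subtrees of a labeled tree.

    Returns {canonical_id: [subtree, ...]} for every internal subtree shape
    that occurs more than once."""
    ids = {}                    # canonical key -> integer id
    groups = defaultdict(list)  # integer id -> subtrees with that id

    def visit(node):
        if isinstance(node, tuple):
            # Unordered children: sort their ids so (A, C) and (C, A) coincide.
            key = ("inner", tuple(sorted(visit(child) for child in node)))
        else:
            key = ("leaf", node)
        node_id = ids.setdefault(key, len(ids))
        groups[node_id].append(node)
        return node_id

    visit(tree)
    return {i: subs for i, subs in groups.items()
            if len(subs) > 1 and isinstance(subs[0], tuple)}
```

For the tree `(("A", "C"), (("C", "A"), "G"))`, the two cherries `("A", "C")` and `("C", "A")` receive the same id and are reported as one group.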