365 research outputs found

    The Interrelationships of Placental Mammals and the Limits of Phylogenetic Inference

    Placental mammals comprise three principal clades: Afrotheria (e.g., elephants and tenrecs), Xenarthra (e.g., armadillos and sloths), and Boreoeutheria (all other placental mammals), the relationships among which are the subject of controversy and a touchstone for debate on the limits of phylogenetic inference. Previous analyses have found support for all three hypotheses, leading some to conclude that this phylogenetic problem might be impossible to resolve due to the compounded effects of incomplete lineage sorting (ILS) and a rapid radiation. Here we show, using a genome-scale nucleotide data set, microRNAs, and the reanalysis of the three largest previously published amino acid data sets, that the root of Placentalia lies between Atlantogenata and Boreoeutheria. Although we found evidence for ILS in early placental evolution, we are able to reject previous conclusions that the placental root is a hard polytomy that cannot be resolved. Reanalyses of previous data sets recover Atlantogenata + Boreoeutheria and show that contradictory results are a consequence of poorly fitting evolutionary models; when the evolutionary process is better modeled, all data sets converge on Atlantogenata. Our Bayesian molecular clock analysis estimates that marsupials diverged from placentals 157-170 Ma, crown Placentalia diverged 86-100 Ma, and crown Atlantogenata diverged 84-97 Ma. Our results are compatible with placental diversification being driven by dispersal rather than vicariance mechanisms, postdating early phases in the protracted opening of the Atlantic Ocean.

    Estimating phylogenetic trees from genome-scale data

    As researchers collect increasingly large molecular data sets to reconstruct the Tree of Life, the heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. A class of phylogenetic methods known as "species tree methods" has been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting or deep coalescence that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Although such methods are gaining in popularity, they are being adopted with caution in some quarters, in part because of an increasing number of examples of strong phylogenetic conflict between concatenation or supermatrix methods and species tree methods. Here we review theory and empirical examples that help clarify these conflicts. Thinking of concatenation as a special case of the more general model provided by the multispecies coalescent can help explain a number of differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences, base compositional heterogeneity and long branch attraction. We show that approaches such as binning, designed to augment the signal in species tree analyses, can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods that incorporate biological realism are a key to phylogenetic analysis of whole genome data. Comment: 39 pages, 3 figures

    The molecular phylogeny of placental mammals and its application to uncovering signatures of molecular adaptation.

    Considerable conflict remains in the literature as to the position of the root of placental mammals and the placement of several intra-ordinal groups. Debate continues over the use of DNA or amino acid datasets and over the use of Supertree or Supermatrix approaches. Known phenomena within mammal data complicate the reconstruction of phylogeny. These include (but are not limited to) variation in longevity, body size, metabolic rates, and germ-line generation time that result in variation in mutation rates and composition biases. Previous attempts to resolve the placental mammal phylogeny have used homogeneous evolutionary models that cannot capture and adequately describe these features across the species sampled. In this thesis I explore the properties of different datasets and data types and their suitability for resolving the mammal phylogeny at different depths: (i) the position of the root of the placental mammals, and (ii) the intra-ordinal placements within the Laurasiatheria. The datasets tested were (i) mitochondrial and nuclear data types, (ii) previously published datasets for mammals, and (iii) datasets I assembled specifically for analyses at different phylogenetic depths. I propose and apply heterogeneous models to these datasets to resolve the position of the root of the placental mammal phylogeny. Reconstruction of a robust mammal phylogeny provides an essential framework for understanding the molecular underpinnings of adaptation to environment. The placental mammals have displayed huge variation in life traits such as longevity, body size and DNA repair efficiency since they emerged ~100 million years ago. With this robust phylogeny, I set out to determine the level of adaptive and non-adaptive processes acting on a set of mammal genes that are linked with longevity and cancer. The results of these analyses yield important insights into data and model suitability, and provide strong evidence for a single hypothesis for the rooting of placental mammals. These results also show that Laurasiatheria intra-ordinal placements are not fully resolved and that additional sampling from this diverse clade is required. Using this resolved phylogeny, specific molecular adaptations and non-adaptive mechanisms were identified in the Mammalia for a set of telomere-associated genes.

    Rare coral under the genomic microscope: timing and relationships among Hawaiian Montipora

    Background: Evolutionary patterns of scleractinian (stony) corals are difficult to infer given the existence of few diagnostic characters and pervasive phenotypic plasticity. A previous study of Hawaiian Montipora (Scleractinia: Acroporidae) based on five partial mitochondrial and two nuclear genes revealed the existence of a species complex, grouping one of the rarest known species (M. dilatata, which is listed as Endangered by the International Union for Conservation of Nature - IUCN) with widespread corals of very different colony growth forms (M. flabellata and M. cf. turgescens). These previous results could result from a lack of resolution due to a limited number of markers, compositional heterogeneity or reflect biological processes such as incomplete lineage sorting (ILS) or introgression. Results: All 13 mitochondrial protein-coding genes from 55 scleractinians (14 lineages from this study) were used to evaluate if a recent origin of the M. dilatata species complex or rate heterogeneity could be compromising phylogenetic inference. Rate heterogeneity detected in the mitochondrial data set seems to have no significant impacts on the phylogenies but clearly affects age estimates. Dating analyses show different estimations for the speciation of M. dilatata species complex depending on whether taking compositional heterogeneity into account (0.8 [0.05–2.6] Myr) or assuming rate homogeneity (0.4 [0.14–0.75] Myr). Genomic data also provided evidence of introgression among all analysed samples of the complex. RADseq data indicated that M. capitata colour morphs may have a genetic basis. Conclusions: Despite the volume of data (over 60,000 SNPs), phylogenetic relationships within the M. dilatata species complex remain unresolved most likely due to a recent origin and ongoing introgression. Species delimitation with genomic data is not concordant with the current taxonomy, which does not reflect the true diversity of this group. Nominal species within the complex are either undergoing a speciation process or represent ecomorphs exhibiting phenotypic polymorphisms.

    Developing and applying supertree methods in Phylogenomics and Macroevolution

    Supertrees can be used to combine partially overlapping trees and generate more inclusive phylogenies. It has been proposed that a Maximum Likelihood (ML) supertree method (SM) could be developed using an exponential probability distribution to model errors in the input trees (given a proposed supertree). When the tree-to-tree distances used in the ML computation are symmetric differences, the ML SM has been shown to be equivalent to a Majority-Rule consensus SM, and hence, exactly as the latter, it has the desirable property of being a median tree (with reference to the set of input trees). The ability to estimate the likelihood of supertrees allows implementing Bayesian (MCMC) approaches, which have the advantage of allowing the support for the clades in a supertree to be properly estimated. I present here the L.U.St software package; it contains the first implementation of an ML SM and allows for the first time statistical tests on supertrees. I also characterized the first implementation of the Bayesian (MCMC) SM. Both the ML and the Bayesian (MCMC) SMs have been tested for and found to be immune to biases. The Bayesian (MCMC) SM is applied to the reanalyses of a variety of datasets (i.e. the datasets for the Metazoa and the Carnivora), and I have also recovered the first Bayesian supertree-based phylogeny of the Eubacteria and the Archaebacteria. These new SMs are discussed with reference to other, well-known SMs like Matrix Representation with Parsimony. Both the ML and Bayesian SMs offer multiple attractive advantages over current alternatives.
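
    The exponential error model described in this abstract can be illustrated with a minimal sketch. It assumes rooted trees written as nested tuples of taxon names, symmetric differences computed on clusters after restricting the supertree to each input tree's taxa, and a log-likelihood taken only up to an additive constant, i.e. -beta times the summed distances. The function names, the tree encoding, and the parameter beta are illustrative assumptions and are not taken from the L.U.St package.

```python
# Sketch only: names, tree encoding and beta are assumptions, not L.U.St's API.

def leaves(tree):
    """Return the set of taxon names under a node (a str leaf or a tuple of children)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Non-trivial clusters: leaf sets of internal nodes, excluding the root."""
    all_leaves = leaves(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        if 2 <= len(below) < len(all_leaves):
            found.add(below)
        return below

    walk(tree)
    return found


def restrict(cluster_set, taxa):
    """Clusters of the subtree induced by `taxa` (handles partial taxon overlap)."""
    restricted = set()
    for cluster in cluster_set:
        kept = cluster & taxa
        if 2 <= len(kept) < len(taxa):
            restricted.add(kept)
    return restricted


def symmetric_difference(supertree, input_tree):
    """Number of clusters found in exactly one of the two trees."""
    taxa = leaves(input_tree)
    return len(restrict(clusters(supertree), taxa) ^ clusters(input_tree))


def log_likelihood(supertree, input_trees, beta=1.0):
    """Log-likelihood up to a constant under an exponential error model."""
    return -beta * sum(symmetric_difference(supertree, t) for t in input_trees)


supertree = ((("A", "B"), "C"), ("D", "E"))
agreeing = (("A", "B"), "C")
conflicting = (("A", "D"), ("C", "E"))
print(log_likelihood(supertree, [agreeing, conflicting], beta=0.5))
```

    Under these assumptions, a candidate supertree that minimises the summed symmetric differences also maximises the log-likelihood, which is the sense in which the abstract relates the ML supertree to a median tree.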


    ๋ฐ”์ด์˜ค์ธํฌ๋งคํ‹ฑ์Šค ํ”„๋กœ๊ทธ๋žจ์„ ์ด์šฉํ•œ ์œ ์ „์ž ๋งˆ์ปค ์„ ๋ณ„ ๋ฐ ๊ณ„ํ†ต์ˆ˜ ์˜ค๋ฅ˜ ํ‰๊ฐ€ ์—ฐ๊ตฌ

    Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2021.8. Son Hyeon-seok.

    Korean abstract (translated): The enormous volume of biological sequence data being continuously produced offers an opportunity to infer the evolutionary history and phylogenetic relationships among organisms. Building phylogenetic trees has become a routine step in almost all biological research. Phyloinformatics has developed mainly around technical and methodological work, such as tree-building algorithms and evolutionary models. Current phylogenetic analysis aims to infer a tree close to the true one by building trees from sequence data, that is, genetic markers. However, as data sets including genetic markers grow exponentially and questions about the accuracy of the resulting analyses are treated as increasingly important, many studies now evaluate the accuracy and reliability of phylogenetic trees. From the standpoint of molecular systematics, accuracy assessment can be approached along two lines: one addresses how well a phylogenetic algorithm performs under specific circumstances, such as evolutionary conditions and the amount of molecular data; the other focuses on how far a particular tree can be trusted. In terms of data-set quality, it is also important, in order to obtain a reliable tree, to assess the suitability of the data set used once the phylogenetic analysis has been performed. This is because, in recent analyses that routinely handle large-scale data, the likelihood of stochastic error has fallen while the likelihood of systematic error has risen, so evaluating the sources of systematic error in a data set after the analysis has become a very important step in assessing and improving tree accuracy. This study therefore developed a program called APSE (Assessment Program for Systematic Error, tentative) to improve tree reliability from a data-quality standpoint. APSE computes taxon-specific relative composition frequency variation (RCFV) and symmetric skew values to provide information on compositional bias in nucleotide sequences, from which the genetic heterogeneity and mutational bias of the data under study can be estimated. It can also compute bias measures that may induce systematic error from the frequencies of various nucleotide groups, from saturation (multiple substitutions caused by mutation), and from shared missing data. To evaluate the program, specific examples in which different gene markers yield conflicting trees (Terebelliformia, Daphniid, Glires) were run through APSE, confirming that systematic-error assessment of marker data sets, and the resulting inference of the accuracy of trees from the selected markers, can be carried out properly. APSE is therefore expected to assess the fit between a user's data and the resulting tree so that more accurate trees are produced and, when a misleading tree is produced for a given genetic marker, to support a thorough analysis of the sources of systematic error and of the effect of the affected data on the tree.

    English abstract: The steadily increasing volume of biological data bearing on phylogenetic relationships provides unparalleled opportunities in bioinformatics. Phylogenetics, drawing on large datasets to trace evolutionary history and place taxa in a phylogeny, establishes the tree of life. Phylogeny construction is carried out in most branches of biology, and the evolutionary history it provides is a prerequisite for interpreting a specific biological system. With advances in computing and sequencing, phyloinformatics has progressed rapidly at the technical and methodological levels, along with phylogenetic reconstruction algorithms and evolutionary models. Unlike the classic approach based on morphological data, modern phylogenetic analysis infers trees from molecular data. Phylogeneticists have therefore had to address questions about the accuracy of phylogenetic estimation and have studied the reliability of phylogenies. In molecular systematics, the assessment of phylogenetic accuracy under specific evolutionary conditions and amounts of molecular data divides into two questions: how well a phylogenetic method works, and how reliable it is under given circumstances. In terms of data quality, the suitability of a nuclear marker must also be assessed before phylogenetic inference if the phylogeny is to be trusted. With large-scale datasets, the probability of stochastic error in phylogenetic estimation has decreased, while the probability of systematic error has increased. Assessing the sources of systematic error before phylogenetic reconstruction is therefore indispensable for estimating and improving phylogenetic accuracy. The Assessment Program for Systematic Error (APSE) developed in this study is intended to assess the fit between user datasets and phylogenies, improving the results of phylogenetic reconstruction, and to analyse the effect of data bearing systematic errors on a phylogeny after misleading results have been produced. The study uses APSE to infer phylogenetic accuracy and assess systematic errors in unresolved examples in which different gene markers give contradictory topologies for the same group. Furthermore, by selectively grouping the systematic-bias properties reported by APSE, it works toward a new protocol for identifying the best gene marker among candidates for a specific taxon.

    Contents: I. Introduction (1.1 Background of research; 1.2 Necessity of research; 1.3 Research objectives). II. Materials and Methods (2.1 Datasets definition and data collection; 2.2 Data processing and bioinformatics software used; 2.3 Phylogenetic reconstruction and accuracy assessment; 2.4 Software development environment and allowable data; 2.5 Assessment of the systematic errors). III. Results (3.1 Phylogenetic analysis results for incongruence between gene markers; 3.2 Data-quality analysis using systematic errors). IV. Discussion (4.1 Significance and implications of study; 4.2 Application to bioinformatics research; 4.3 Improvement and achievement). V. Conclusion and Summary (5.1 Conclusion; 5.2 Summary). Bibliography. Abstract (Korean).
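
    The compositional-bias measures named in this entry (RCFV and skew) can be sketched as follows. This is a minimal illustration assuming one commonly used definition of taxon-specific RCFV (the summed absolute deviation of a taxon's base frequencies from the across-taxon mean, divided by the number of taxa) and the usual strand-skew formula (x - y) / (x + y). Whether APSE computes them in exactly this way is an assumption, and the function names and toy alignment are hypothetical.

```python
# Sketch only: definitions and names are assumptions, not APSE's actual code.
from collections import Counter

BASES = "ACGT"


def base_frequencies(seq):
    """Relative frequency of A, C, G, T, ignoring gaps and ambiguity codes."""
    counts = Counter(base for base in seq.upper() if base in BASES)
    total = sum(counts.values())
    return {base: counts[base] / total for base in BASES}


def skew(seq, x, y):
    """Skew between two bases, e.g. GC skew = (G - C) / (G + C)."""
    counts = Counter(seq.upper())
    return (counts[x] - counts[y]) / (counts[x] + counts[y])


def taxon_rcfv(alignment):
    """Taxon-specific RCFV for a dict mapping taxon name to aligned sequence."""
    freqs = {name: base_frequencies(seq) for name, seq in alignment.items()}
    n_taxa = len(freqs)
    mean = {b: sum(f[b] for f in freqs.values()) / n_taxa for b in BASES}
    return {
        name: sum(abs(f[b] - mean[b]) for b in BASES) / n_taxa
        for name, f in freqs.items()
    }


# Hypothetical toy alignment: taxon t2 is A-rich and lacks C.
alignment = {"t1": "AACCGGTT", "t2": "AAAAGGTT"}
per_taxon = taxon_rcfv(alignment)
total_rcfv = sum(per_taxon.values())
gc_skew_t2 = skew(alignment["t2"], "G", "C")
print(per_taxon, total_rcfv, gc_skew_t2)
```

    Under these assumptions, taxa with higher RCFV values deviate more from the average composition of the alignment, which is the kind of signal the abstract associates with compositional heterogeneity. The sketch does not guard against sequences that lack the bases being compared.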

    Suprafamilial relationships among Rodentia and the phylogenetic effect of removing fast-evolving nucleotides in mitochondrial, exon and intron fragments

    The number of rodent clades identified above the family level is contentious, and to date, no consensus has been reached on the basal evolutionary relationships among all rodent families. Rodent suprafamilial phylogenetic relationships are investigated in the present study using approximately 7600 nucleotide characters derived from two mitochondrial genes (Cytochrome b and 12S rRNA), two nuclear exons (IRBP and vWF) and four nuclear introns (MGF, PRKC, SPTBN, THY). Because increasing the number of nucleotides does not necessarily increase phylogenetic signal (especially if the data are saturated), we assess the potential impact of saturation for each dataset by removing the fastest-evolving positions that have been recognized as sources of inconsistencies in phylogenetics.

    Investigating Evolutionary History Using Phylogenomics

    Reconstructing the Tree of Life is one of the principal aims of evolutionary biology. The development of molecular phylogenetics has complemented palaeontology, biogeography, and archaeology in elucidating biological history. The development of molecular-clock analyses allowed evolutionary timescales to be estimated using nucleotide sequences and other products of the evolutionary process. Until recently, the twin challenges of molecular dating were obtaining sufficient data and developing robust methods. The former concern is now less important, as high-throughput sequencing technology allows entire genomes to be sampled. Genome-scale data enhance statistical power, but accompanying this wealth of data is a new suite of analytical challenges. One of these key challenges is analysing these data in synthesis with the palaeontological record without statistical overparameterisation. There are also aspects of the evolutionary process, such as among-lineage rate variation, that can affect the precision and accuracy of current methods. In this thesis, I first use the richest nucleotide sequence data set of insects available to estimate an authoritative insect evolutionary timescale that dates the origins and diversification of every major insect order. I then focus on molecular-clock methods by testing their performance in inferring evolutionary rates from time-structured data, common in the study of ancient DNA. I find that among-lineage rate variation and phylo-temporal clustering affect rate estimates. I also study data partitioning, a common technique used to optimise the analysis of multilocus data, in which independent parameters are applied across different subsets of the data. New data from the genomic revolution give biologists new opportunities to re-examine enduring questions about the evolutionary process. Here, I use phylogenetic tools to show that evolution leaves figurative fingerprints on genomes over millions of years.