415 research outputs found

    Assessing Combinability of Phylogenomic Data Using Bayes Factors

    With the rapid reduction in sequencing costs of high-throughput genomic data, it has become commonplace to use hundreds of genes to infer the phylogeny of any study system. While sampling a large number of genes has given us a tremendous opportunity to uncover previously unknown relationships and improve phylogenetic resolution, it also presents new challenges when the phylogenetic signal is confounded by differences in the evolutionary histories of the sampled genes. Given the incorporation of accurate marginal likelihood estimation methods into popular Bayesian software programs, it is natural to consider using the Bayes factor (BF) to compare different partition models in which genes within any given partition subset share both tree topology and edge lengths. We explore using marginal likelihood to assess data subset combinability when data subsets have varying levels of phylogenetic discordance due to deep coalescence events among genes (simulated within a species tree), and compare the results with our recently described phylogenetic informational dissonance index (D) estimated for each data set. BF effectively detects phylogenetic incongruence and provides a way to assess the statistical significance of D values. We use BFs to assess data combinability using an empirical data set comprising 56 plastid genes from the green algal order Volvocales. We also discuss the potential need for calibrating BFs and demonstrate that the BFs used in this study are correctly calibrated.
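    The abstract compares partition models by their marginal likelihoods. As a minimal sketch, assuming the log marginal likelihoods have already been estimated by a Bayesian program (the numbers below are hypothetical), the Bayes factor reduces to a difference of log marginal likelihoods. The helper names (`partition_log_ml`, `log_bayes_factor`, `kass_raftery_label`) are illustrative only, and the verbal labels follow the commonly cited Kass and Raftery (1995) scale on 2 ln BF; the abstract does not say which scale or thresholds the authors use.

```python
import math


def partition_log_ml(subset_log_mls: list[float]) -> float:
    """Log marginal likelihood of a partition model.

    Assumption: each subset has its own, a priori independent, tree and model
    parameters, so subset marginal likelihoods multiply (their logs add).
    """
    return math.fsum(subset_log_mls)


def log_bayes_factor(log_ml_a: float, log_ml_b: float) -> float:
    """ln BF of model A over model B; positive values favor A."""
    return log_ml_a - log_ml_b


def kass_raftery_label(ln_bf: float) -> str:
    """Verbal strength of |2 ln BF| on the Kass and Raftery (1995) scale."""
    strength = abs(2.0 * ln_bf)
    if strength < 2:
        return "not worth more than a bare mention"
    if strength < 6:
        return "positive"
    if strength < 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods (e.g. from stepping-stone sampling).
separate = partition_log_ml([-1000.0, -1200.0])  # each gene has its own tree
combined = -2210.0                                # both genes share one tree
ln_bf = log_bayes_factor(separate, combined)
print(ln_bf, kass_raftery_label(ln_bf))
```

    Here a positive ln BF favors giving each gene its own tree, that is, evidence against combining the two subsets. Summing subset log marginal likelihoods is only valid when the subsets share no parameters a priori; a model that links, say, substitution parameters across subsets needs a joint estimate instead.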

    Molecular phylogenies map to biogeography better than morphological ones

    Phylogenetic relationships are inferred principally from two classes of data: morphological and molecular. Most current phylogenies of extant taxa are inferred from molecules, and when morphological and molecular trees conflict the latter are often preferred. Although supported by simulations, the superiority of molecular trees has never been assessed empirically. Here we test phylogenetic accuracy using two independent data sources: biogeographical distributions and fossil first occurrences. For 48 pairs of morphological and molecular trees, we show that, on average, molecular trees provide a better fit to biogeographical data than their morphological counterparts, and that biogeographical congruence increases over research time. We find no significant differences in stratigraphical congruence between morphological and molecular trees. These findings have implications for understanding homoplasy in morphological data sets, for the utility of morphology as a test of molecular hypotheses, and for the analysis of fossil groups for which molecular data are unavailable.

    The hymenopteran tree of life: evidence from protein-coding genes and objectively aligned ribosomal data

    Previous molecular analyses of higher hymenopteran relationships have largely been based on subjectively aligned ribosomal sequences (18S and 28S). Here, we reanalyze the 18S and 28S data (about 4.4 kb unaligned) using an objective and a semi-objective alignment approach, based on MAFFT and BAli-Phy, respectively. Furthermore, we present the first analyses of a substantial protein-coding data set (4.6 kb from one mitochondrial and four nuclear genes). Our results indicate that previous studies may have suffered from inflated support values due to subjective alignment of the ribosomal sequences, but apparently not from significant biases. The protein data provide independent confirmation of several earlier results, including the monophyly of non-xyelid hymenopterans, Pamphilioidea + Unicalcarida, Unicalcarida, Vespina, Apocrita, Proctotrupomorpha and core Proctotrupomorpha. The protein data confirm that Aculeata are nested within a paraphyletic Evaniomorpha, but cast doubt on the monophyly of Evanioidea. Combining the available morphological, ribosomal and protein-coding data, we examine the total-evidence signal as well as congruence and conflict among the three data sources. Despite an emerging consensus on many higher-level hymenopteran relationships, several problems remain unresolved or contentious, including rooting of the hymenopteran tree, relationships of the woodwasps, placement of Stephanoidea and Ceraphronoidea, and the sister group of Aculeata.

    Investigating tricky nodes in the Tree of Life


    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high-throughput methodologies offer researchers the ability to understand complex systems via high-dimensional and multi-relational data. One example is molecular biology, where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high-dimensional and multi-relational data allows for unprecedentedly detailed analysis, but also presents challenges in accounting for all the variability. High-dimensional data often have a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high-dimensional and multi-relational data, we developed three feature selection and cross-clustering methods: 1) an infinite relational model with feature selection (FIRM), which incorporates the rich information of multi-relational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to the Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) a randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM to categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where the latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences, where incorporating data related to varying features is often regarded as a daunting task.

    Using Phylogenomic Patterns and Gene Ontology to Identify Proteins of Importance in Plant Evolution

    We use measures of congruence on a combined expressed sequence tag (EST) genome phylogeny to identify proteins that have potential significance in the evolution of seed plants. Relevant proteins are identified based on the direction of partitioned branch support and hidden support for the hypothesis obtained from a 16-species tree, constructed from 2,557 concatenated orthologous genes. We provide a general method for detecting genes or groups of genes that may be under selection in directions that are in agreement with the phylogenetic pattern. Gene partitioning methods and estimates of the degree and direction of support of individual gene partitions to the overall data set are used. Using this approach, we correlate positive branch support of specific genes with key branches in the seed plant phylogeny. In addition to basic metabolic functions, such as photosynthesis or hormones, genes involved in posttranscriptional regulation by small RNAs were significantly overrepresented at key nodes of the seed plant phylogeny. Two genes in our matrix are of critical importance because they are involved in RNA-dependent regulation, essential during embryo and leaf development: Argonaute and RNA-dependent RNA polymerase 6, both found to be overrepresented in the angiosperm clade. We use these genes as examples of our phylogenomics approach and show that identifying partitions or genes in this way provides a platform to explain some of the more interesting organismal differences among species, and in particular in the evolution of plants.

    Random Addition Concatenation Analysis: A Novel Approach to the Exploration of Phylogenomic Signal Reveals Strong Agreement between Core and Shell Genomic Partitions in the Cyanobacteria

    Recent whole-genome approaches to microbial phylogeny have emphasized partitioning genes into functional classes, often focusing on differences between a stable core of genes and a variable shell. To rigorously address the effects of partitioning and combining genes in genome-level analyses, we developed a novel technique called Random Addition Concatenation Analysis (RADICAL). RADICAL operates by sequentially concatenating randomly chosen gene partitions, starting with a single-gene partition and ending with the entire genomic data set. A phylogenetic tree is built for every successive addition, and the entire process is repeated, creating multiple random concatenation paths. The result is a library of trees representing a large variety of differently sized random gene partitions. This library can then be mined to identify unique topologies, assess overall agreement, and measure support for different trees. To evaluate RADICAL, we used 682 orthologous genes across 13 cyanobacterial genomes. Despite previous assertions of substantial differences between a core and a shell set of genes for this data set, RADICAL reveals that the two partitions contain congruent phylogenetic signal. Substantial disagreement within the data set is limited to a few nodes and to genes involved in metabolism, a functional group that is distributed evenly between the core and the shell partitions. We highlight numerous examples where RADICAL reveals aspects of phylogenetic behavior not evident from examining individual gene trees or a "total evidence" tree. Our method also demonstrates that most emergent phylogenetic signal appears early in the concatenation process. The software is freely available at http://desalle.amnh.org
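    The RADICAL procedure described above can be sketched as follows. This is a minimal illustration, not the authors' software: the abstract does not say which tree-building method RADICAL uses, so a UPGMA tree on p-distances (`upgma_clades`) stands in purely for brevity, and all function names and the toy data are hypothetical. Each rooted tree is represented as its set of clades so that identical topologies compare equal and can be counted. The sketch assumes every gene alignment contains every taxon.

```python
import itertools
import random
from collections import Counter

Alignment = dict[str, str]        # taxon name -> aligned sequence
Tree = frozenset[frozenset[str]]  # rooted topology as its set of clades


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, skipping sites with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def upgma_clades(aln: Alignment) -> Tree:
    """Stand-in tree builder (hypothetical choice): UPGMA on p-distances."""
    taxa = sorted(aln)
    d = {(a, b): p_distance(aln[a], aln[b]) for a, b in itertools.combinations(taxa, 2)}

    def dist(x: frozenset[str], y: frozenset[str]) -> float:
        # Average linkage: mean of all taxon-to-taxon distances between clusters.
        return sum(d[tuple(sorted((a, b)))] for a in x for b in y) / (len(x) * len(y))

    clusters = [frozenset([t]) for t in taxa]
    clades: set[frozenset[str]] = set()
    while len(clusters) > 2:  # the final merge would be the root, so stop at two
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clades.add(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return frozenset(clades)


def radical(genes: dict[str, Alignment], n_paths: int,
            build_tree=upgma_clades, seed: int = 0) -> list[list[Tree]]:
    """Random addition concatenation: one tree per added gene, n_paths random orders."""
    rng = random.Random(seed)
    names = list(genes)
    taxa = sorted(next(iter(genes.values())))
    library = []
    for _ in range(n_paths):
        order = names[:]
        rng.shuffle(order)  # a new random concatenation path
        concat = {t: "" for t in taxa}
        path = []
        for gene in order:
            for t in taxa:
                concat[t] += genes[gene][t]  # append the next randomly chosen gene
            path.append(build_tree(concat))  # tree for the current concatenation
        library.append(path)
    return library


# Hypothetical toy data: three genes that all group A+B and C+D.
genes = {
    "g1": {"A": "AAAA", "B": "AAAT", "C": "TTTT", "D": "TTTA"},
    "g2": {"A": "CCGG", "B": "CCGA", "C": "GGCC", "D": "GGCA"},
    "g3": {"A": "ACGT", "B": "ACGA", "C": "TGCA", "D": "TGCT"},
}
library = radical(genes, n_paths=5)
topologies = Counter(tree for path in library for tree in path)
print(len(topologies), "unique topologies in", sum(topologies.values()), "trees")
```

    Mining the library then amounts to counting or comparing these topologies, here with `Counter`; passing a `build_tree` that calls a maximum-likelihood or parsimony program would bring the sketch closer to a real analysis.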

    ๋ฐ”์ด์˜ค์ธํฌ๋งคํ‹ฑ์Šค ํ”„๋กœ๊ทธ๋žจ์„ ์ด์šฉํ•œ ์œ ์ „์ž ๋งˆ์ปค ์„ ๋ณ„ ๋ฐ ๊ณ„ํ†ต์ˆ˜ ์˜ค๋ฅ˜ ํ‰๊ฐ€ ์—ฐ๊ตฌ

    Master's thesis, Seoul National University Graduate School, College of Natural Sciences, Interdisciplinary Program in Bioinformatics, August 2021. Author: 손현석.

    The enormous amount of biological sequence data being continuously produced provides opportunities to infer the evolutionary history and phylogenetic relationships among organisms. Building a phylogenetic tree has now become a step in almost all biological research. Phyloinformatics has developed mainly through technical and methodological research, such as the development of tree-building algorithms and evolutionary models. Current phylogenetic analysis aims to infer a tree close to the true one by building trees from sequence data, that is, from genetic markers. However, as the size of data, including genetic markers, grows exponentially and questions about the accuracy of the resulting analyses have gradually gained importance, many studies are being conducted to evaluate the accuracy and reliability of phylogenetic trees. From the perspective of molecular systematics, assessing the accuracy of a phylogenetic tree can be approached in two ways: one addresses how well phylogenetic algorithms perform under specific conditions, such as evolutionary conditions and the amount of molecular data; the other focuses on how far a particular tree can be trusted. From the perspective of dataset quality, it is also important, after performing a phylogenetic analysis, to assess the suitability of the dataset used in order to obtain a reliable tree.

    This is because, in recent phylogenetic analyses that routinely handle large-scale data, the likelihood of stochastic error has decreased while the likelihood of systematic error has actually increased, so evaluating the sources of systematic error in a dataset after phylogenetic analysis has become a crucial step in assessing and improving phylogenetic accuracy. In this study, we therefore developed a program called APSE (Assessment Program for Systematic Error, tentative) to improve the reliability of phylogenetic trees from the perspective of data quality. APSE computes taxon-specific relative composition frequency variability (RCFV) and symmetric skew values to provide information on compositional bias in nucleotide sequences, from which the genetic heterogeneity and mutational bias of the data under study can be estimated. It can also compute bias information that may cause systematic error from variables such as the frequencies of various nucleotide groups, saturation (multiple substitutions at the same site), and shared missing data. In addition, to evaluate the program's performance, we applied APSE to specific examples (Terebelliformia, Daphniid, Glires) in which different genetic markers yield conflicting trees, and confirmed that it can properly assess systematic error in marker datasets and infer the accuracy of trees built from the markers selected on that basis.

    We therefore expect that APSE will serve to evaluate the fit between users' data and phylogenetic trees, focusing on data quality from a systematics perspective so that the resulting trees yield more accurate results, and that, when a misleading tree is produced for a given genetic marker, it will enable a thorough analysis of the sources of systematic error and of the effect that data affected by such error have on the tree.

    The steadily increasing volume of biological data carrying decisive phylogenetic information provides unparalleled opportunities in bioinformatics. Phylogenetics, which uses large datasets to reconstruct evolutionary history and to place taxa in a phylogeny, establishes the tree of life. Phylogenetic reconstruction is now performed in most branches of biology, because the evolutionary background it reveals is a prerequisite for interpreting a specific biological system. With advances in computing and sequencing techniques, phyloinformatics has developed rapidly at the technical and methodological levels, including tree-reconstruction algorithms and evolutionary models. Unlike the classic approach based on morphological data, modern phylogenetic analysis infers trees from molecular data. Phylogeneticists have therefore naturally dealt with questions concerning the accuracy of phylogenetic estimation and carried out studies on the reliability of phylogenies.
    From the perspective of molecular systematics, assessing phylogenetic accuracy can now be divided into two questions: how well a phylogenetic method works under specific conditions, such as a given evolutionary scenario or amount of molecular data, and how reliable a particular result is. Moreover, from the perspective of data quality, the suitability of a nuclear marker must be assessed before phylogenetic inference is performed in order to obtain a confident phylogeny. With the recent shift to large-scale datasets, the probability of stochastic error in phylogenetic estimation has decreased, while the probability of systematic error has increased. Thus, before phylogenetic reconstruction is carried out, assessing the sources of systematic error is indispensable for estimating and improving phylogenetic accuracy. The Assessment Program for Systematic Error (APSE) developed in this study is expected to play a key role in assessing the fit between user datasets and phylogenies, improving the results of phylogenetic reconstruction in systematics, and analyzing how data bearing systematic errors affect a phylogeny once misleading results have been produced. This study applies APSE to infer phylogenetic accuracy and assess systematic errors in unresolved examples in which different gene markers yield contradictory topologies for the same taxonomic group. Furthermore, by selectively grouping the systematic-bias properties reported by APSE, the study works toward a new protocol for selecting the best gene marker among candidate markers for a specific taxon.

    Table of contents:
    I. INTRODUCTION
        1.1 Background of research
        1.2 Necessity of research
        1.3 Research objectives
    II. MATERIALS AND METHODS
        2.1 Datasets definition and data collection
        2.2 Data processing and bioinformatics software used
        2.3 Phylogenetic reconstruction and accuracy assessment
        2.4 Software development environment and allowable data
        2.5 Assessment of the systematic errors
    III. RESULTS
        3.1 Phylogenetic analysis results for incongruence between gene markers
        3.2 Data-quality analysis using systematic errors
    IV. DISCUSSION
        4.1 Significance and implications of study
        4.2 Application to bioinformatics research
        4.3 Improvement and achievement
    V. CONCLUSION AND SUMMARY
        5.1 Conclusion
        5.2 Summary
    BIBLIOGRAPHY
    ABSTRACT (KOREAN)
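    The compositional-bias measures mentioned in this abstract can be illustrated with a short sketch. The abstract does not give APSE's formulas, so this assumes commonly used definitions: with μ_ij the frequency of nucleotide i in taxon j and μ̄_i its mean over all n taxa, the taxon-specific RCFV of taxon j is Σ_i |μ_ij - μ̄_i| / n, and summing these over all taxa gives the overall RCFV; AT skew is (A - T)/(A + T) and GC skew is (G - C)/(G + C). Gaps and ambiguity codes are simply ignored, and the function names are hypothetical rather than APSE's own.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seq: str) -> dict[str, float]:
    """Relative A/C/G/T frequencies of one sequence; gaps and ambiguity codes are ignored."""
    counts = Counter(ch for ch in seq.upper() if ch in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no unambiguous nucleotides")
    return {b: counts[b] / total for b in BASES}


def rcfv(alignment: dict[str, str]) -> tuple[float, dict[str, float]]:
    """Return (overall RCFV, taxon-specific RCFV for each taxon).

    Assumed definitions (not taken from APSE): with mu[j][i] the frequency of
    base i in taxon j and mean[i] its average over the n taxa,
    taxon-specific RCFV_j = sum_i |mu[j][i] - mean[i]| / n, and
    overall RCFV = sum_j RCFV_j.
    """
    mu = {taxon: base_frequencies(seq) for taxon, seq in alignment.items()}
    n = len(mu)
    mean = {b: sum(freqs[b] for freqs in mu.values()) / n for b in BASES}
    per_taxon = {
        taxon: sum(abs(freqs[b] - mean[b]) for b in BASES) / n
        for taxon, freqs in mu.items()
    }
    return sum(per_taxon.values()), per_taxon


def skews(seq: str) -> tuple[float, float]:
    """AT skew (A-T)/(A+T) and GC skew (G-C)/(G+C); 0.0 when a denominator is zero."""
    counts = Counter(seq.upper())
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


# Toy two-taxon alignment with opposite composition (hypothetical data).
overall, per_taxon = rcfv({"t1": "AAAA", "t2": "CCCC"})
print(overall, per_taxon)
print(skews("AAATGC"))
```

    In this kind of sketch, taxa whose taxon-specific RCFV stands out from the rest of the alignment would be the first ones to inspect for compositional heterogeneity.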

    Eumalacostracan phylogeny and total evidence: limitations of the usual suspects

    Background: The phylogeny of Eumalacostraca (Crustacea) remains elusive, despite over a century of interest. Recent morphological and molecular phylogenies appear highly incongruent, but this has not been assessed quantitatively. Moreover, 18S rRNA trees show striking branch length differences between species, accompanied by a conspicuous clustering of taxa with similar branch lengths. Surprisingly, previous research found no rate heterogeneity. Hitherto, no phylogenetic analysis of all major eumalacostracan taxa (orders) has either combined evidence from multiple loci, or combined molecular and morphological evidence.

    Results: We combined evidence from four nuclear ribosomal and mitochondrial loci (18S rRNA, 28S rRNA, 16S rRNA, and cytochrome c oxidase subunit I) with a newly synthesized morphological dataset. We tested the homogeneity of data partitions, both in terms of character congruence and the topological congruence of inferred trees. We also performed Bayesian and parsimony analyses on separate and combined partitions, and tested the contribution of each partition. We tested for potential long-branch attraction (LBA) using taxon deletion experiments and relative rate tests. Additionally, we searched for molecular polytomies (spurious clades). Lastly, we investigated the phylogenetic stability of taxa, and assessed their impact on inferred relationships over the whole tree. We detected significant conflict between data partitions, especially between morphology and molecules. We found significant rate heterogeneity between species for both the 18S rRNA and combined datasets, introducing the possibility of LBA. As a test case, we showed that LBA probably affected the position of Spelaeogriphacea in the combined molecular evidence analysis. We also demonstrated that several clades, including the previously reported and surprising clade of Amphipoda plus Spelaeogriphacea, are 'supported' by zero-length branches. Furthermore, we showed that different sets of taxa have the greatest impact upon the relationships within molecular versus morphological trees.

    Conclusion: Rate heterogeneity and conflict between data partitions mean that existing molecular and morphological evidence is unable to resolve a well-supported eumalacostracan phylogeny. We believe that it will be necessary to look beyond the most commonly utilized sources of data (nuclear ribosomal and mitochondrial sequences) to obtain a robust tree in the future.