1,819 research outputs found

    A study on genetic marker selection and phylogenetic tree error assessment using a bioinformatics program

    Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2021.8. Son Hyeon-seok.
    The enormous volume of biological sequence data that is continuously being produced provides an opportunity to infer the evolutionary history of organisms and the phylogenetic relationships among them. Building a phylogenetic tree has become a step carried out in almost all biological research. Phyloinformatics has developed mainly through technical and methodological research, such as tree-building algorithms and evolutionary models. Current phylogenetic analysis aims to infer a tree close to the true one by building trees from sequence data, that is, from genetic markers. However, as datasets, including genetic markers, grow exponentially, questions about the accuracy of the resulting analyses have become increasingly important, and many studies now assess the accuracy and reliability of phylogenetic trees. From the standpoint of molecular systematics, the accuracy of a tree can be assessed in two ways: one addresses how well a phylogenetic algorithm performs under specific conditions, such as the evolutionary setting and the amount of molecular data; the other focuses on how far a particular tree can be trusted. In terms of dataset quality, it is also important to assess the suitability of the dataset after the phylogenetic analysis has been run, in order to obtain a reliable tree.
This is because, in recent phylogenetic analyses that routinely handle large-scale data, the likelihood of stochastic error has fallen while the likelihood of systematic error has risen, so assessing the sources of systematic error in a dataset after the analysis has become a very important step in evaluating and improving tree accuracy. This study therefore developed a program called APSE (Assessment Program for Systematic Error, tentative name) to improve the reliability of phylogenetic trees from a data-quality standpoint. With APSE, users can compute the taxon-specific relative composition frequency variability (RCFV) and skew values to obtain information on compositional bias in nucleotide sequences, and from these estimate the genetic heterogeneity and mutational bias of the data under study. APSE can also compute further indicators of bias that may cause systematic error: the frequencies of various nucleotide groups, saturation (multiple substitutions at the same site), and shared missing data. To evaluate the program, APSE was applied to specific examples (Terebelliformia, Daphniid, Glires) in which different gene markers yield conflicting trees, confirming that it can assess systematic error in marker datasets and support inference about the accuracy of trees built from the markers selected on that basis.
APSE is therefore expected to assess how accurately a tree reflects the user's data, focusing on data quality from a systematics standpoint so that the resulting trees are more accurate. When a genetic marker yields a misleading tree, it should also support a thorough analysis of the sources of systematic error and of the effect that the affected data have on the tree.
The steadily increasing volume of biological data that is informative about phylogenetic relationships provides unparalleled opportunities in bioinformatics. Phylogenetics, which draws on large numbers of datasets to trace evolutionary history and to place taxa in a phylogeny, establishes the tree of life. Phylogenetic analysis is carried out in most branches of biology, because the evolutionary history it reveals provides the background needed to interpret a specific biological system. With advances in computing and sequencing techniques, phyloinformatics has progressed rapidly at the technical and methodological levels, alongside phylogenetic reconstruction algorithms and evolutionary models. Unlike the classical approach based on morphological data, modern phylogenetic analysis reconstructs a phylogeny from genetic information, inferring the tree from molecular data. Phylogeneticists have therefore naturally addressed questions about the accuracy of phylogenetic estimation and studied the reliability of phylogenies.
In molecular systematics, concerns about assessing phylogenetic accuracy under specific evolutionary conditions and amounts of molecular data can be divided into two types: how well a phylogenetic method works, and how reliable it is under given circumstances. In terms of data quality, the suitability of nuclear markers must also be assessed before phylogenetic inference is performed if the resulting phylogeny is to be trusted. Recently, the probability of stochastic error in phylogenetic estimation from large-scale datasets has decreased, while the probability of systematic error has increased. Assessing the sources of systematic error before phylogenetic reconstruction is therefore indispensable for estimating and improving phylogenetic accuracy. The Assessment Program for Systematic Error (APSE) developed in this study is intended to play a key role in assessing user datasets against phylogenies to improve the results of phylogenetic reconstruction, and to analyze the effect of data bearing systematic errors on a phylogeny after misleading results have been produced. The study uses APSE to infer phylogenetic accuracy and assess systematic errors in unresolved examples where different gene markers give contradicting topologies within the same group. Furthermore, by selectively grouping the systematic-bias properties that APSE reports, it works toward a new protocol for identifying the best gene marker among candidate markers for a specific taxon.
Contents: I. Introduction (1.1 Background of research; 1.2 Necessity of research; 1.3 Research objectives). II. Materials and Methods (2.1 Datasets definition and data collection; 2.2 Data processing and bioinformatics software used; 2.3 Phylogenetic reconstruction and accuracy assessment; 2.4 Software development environment and allowable data; 2.5 Assessment of the systematic errors). III. Results (3.1 Phylogenetic analysis results for incongruence between gene markers; 3.2 Data-quality analysis using systematic errors). IV. Discussion (4.1 Significance and implications of study; 4.2 Application to bioinformatics research; 4.3 Improvement and achievement). V. Conclusion and Summary (5.1 Conclusion; 5.2 Summary). Bibliography. Abstract (Korean).
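The RCFV and skew statistics named in the abstract above can be illustrated with a short sketch. It assumes the commonly used definitions: per-taxon RCFV as the summed absolute deviation of a taxon's base frequencies from the across-taxon mean, divided by the number of taxa, and skew as (x - y) / (x + y) for a pair of bases. The function names and the toy alignment are hypothetical, and nothing here is taken from the APSE source code.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"

def base_frequencies(seq):
    """Relative frequency of A, C, G, T, ignoring gaps and ambiguity codes."""
    counts = Counter(ch for ch in seq.upper() if ch in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no unambiguous nucleotides")
    return {b: counts[b] / total for b in NUCLEOTIDES}

def taxon_rcfv(alignment):
    """Per-taxon RCFV (assumed definition): sum over bases of
    |frequency in taxon - mean frequency across taxa|, divided by the number of taxa."""
    freqs = {name: base_frequencies(seq) for name, seq in alignment.items()}
    n = len(freqs)
    mean = {b: sum(f[b] for f in freqs.values()) / n for b in NUCLEOTIDES}
    return {
        name: sum(abs(f[b] - mean[b]) for b in NUCLEOTIDES) / n
        for name, f in freqs.items()
    }

def skew(seq, x, y):
    """Skew between two bases, (x - y) / (x + y); x='G', y='C' gives GC skew."""
    s = seq.upper()
    cx, cy = s.count(x), s.count(y)
    return (cx - cy) / (cx + cy) if cx + cy else 0.0

# Hypothetical toy alignment: taxon_c is strongly GC-rich compared with the others.
alignment = {
    "taxon_a": "ACGTACGTAC",
    "taxon_b": "ACGTACGTAC",
    "taxon_c": "GGGCGGGCGC",
}
per_taxon = taxon_rcfv(alignment)
total_rcfv = sum(per_taxon.values())  # dataset-level RCFV is the sum over taxa
```

In this toy case the compositionally deviant taxon_c receives the largest per-taxon value, which is the kind of signal such a statistic is meant to flag.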

    Resolving tricky nodes in the tree of life through amino acid recoding

    Genomic data allowed a detailed resolution of the Tree of Life, but "tricky nodes" such as the root of the animals remain unresolved. Genome-scale datasets are heterogeneous because genes and species are exposed to different pressures, and this can negatively impact phylogenetic accuracy. We use simulated genomic-scale datasets and show that recoding amino acid data improves accuracy when the model does not account for the compositional heterogeneity of the amino acid alignment. We apply our findings to three datasets addressing the root of the animal tree, where the debate centers on whether sponges (Porifera) or comb jellies (Ctenophora) represent the sister of all other animals. We show that results from empirical data follow predictions from simulations and suggest that, at least in phylogenies inferred from amino acid sequences, a placement of the ctenophores as sister to all the other animals is best explained as a tree reconstruction artifact.
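Amino acid recoding replaces each residue with a code for a group of biochemically similar residues, which reduces the alphabet and dampens compositional differences between taxa. The sketch below uses the widely cited Dayhoff six-state grouping as an assumption; the abstract does not say which recoding scheme the authors applied, and the function name is hypothetical.

```python
# Dayhoff six-state grouping (assumed scheme; the abstract does not name one).
DAYHOFF6_GROUPS = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}
RECODE = {aa: code for code, group in DAYHOFF6_GROUPS.items() for aa in group}

def recode_dayhoff6(protein):
    """Map each residue to its group code; keep gaps, mark anything else as '?'."""
    out = []
    for aa in protein.upper():
        if aa == "-":
            out.append("-")
        else:
            out.append(RECODE.get(aa, "?"))
    return "".join(out)

recoded = recode_dayhoff6("MKV-X")
```

The recoded alignment would then be analyzed with a six-state substitution model in place of a 20-state amino acid model.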

    Sources of Signal in 62 Protein-Coding Nuclear Genes for Higher-Level Phylogenetics of Arthropods

    BACKGROUND: This study aims to investigate the strength of various sources of phylogenetic information that led to recent seemingly robust conclusions about higher-level arthropod phylogeny and to assess the role of excluding or downweighting synonymous change for arriving at those conclusions. METHODOLOGY/PRINCIPAL FINDINGS: The current study analyzes DNA sequences from 68 gene segments of 62 distinct protein-coding nuclear genes for 80 species. Gene segments analyzed individually support numerous nodes recovered in combined-gene analyses, but few of the higher-level nodes of greatest current interest. However, neither is there support for conflicting alternatives to these higher-level nodes. Gene segments with higher rates of nonsynonymous change tend to be more informative overall, but those with lower rates tend to provide stronger support for deeper nodes. Higher-level nodes with bootstrap values in the 80% - 99% range for the complete data matrix are markedly more sensitive to substantial drops in their bootstrap percentages after character subsampling than those with 100% bootstrap, suggesting that these nodes are likely not to have been strongly supported with many fewer data than in the full matrix. Data set partitioning of total data by (mostly) synonymous and (mostly) nonsynonymous change improves overall node support, but the result remains much inferior to analysis of (unpartitioned) nonsynonymous change alone. Clusters of genes with similar nonsynonymous rate properties (e.g., faster vs. slower) show some distinct patterns of node support but few conflicts. Synonymous change is shown to contribute little, if any, phylogenetic signal to the support of higher-level nodes, but it does contribute nonphylogenetic signal, probably through its underlying heterogeneous nucleotide composition. Analysis of seemingly conservative indels does not prove useful. 
CONCLUSIONS: Generating a robust molecular higher-level phylogeny of Arthropoda is currently possible with large amounts of data and an exclusive reliance on nonsynonymous change.

    Systematic Error in Seed Plant Phylogenomics

    Resolving the closest relatives of Gnetales has been an enigmatic problem in seed plant phylogeny. The problem is known to be difficult because of the extent of divergence between this diverse group of gymnosperms and their closest phylogenetic relatives. Here, we investigate the evolutionary properties of conifer chloroplast DNA sequences. To improve taxon sampling of Cupressophyta (non-Pinaceae conifers), we report sequences from three new chloroplast (cp) genomes of Southern Hemisphere conifers. We have applied a site pattern sorting criterion to study compositional heterogeneity, heterotachy, and the fit of conifer chloroplast genome sequences to a general time reversible + G substitution model. We show that non-time reversible properties of aligned sequence positions in the chloroplast genomes of Gnetales mislead phylogenetic reconstruction of these seed plants. When 2,250 of the most varied sites in our concatenated alignment are excluded, phylogenetic analyses favor a close evolutionary relationship between the Gnetales and Pinaceae (the Gnepine hypothesis). Our analytical protocol provides a useful approach for evaluating the robustness of phylogenomic inferences. Our findings highlight the importance of goodness of fit between substitution model and data for understanding seed plant phylogeny.
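Excluding the most varied sites from an alignment can be sketched as a rank-and-drop step. The example below ranks columns by Shannon entropy, which is only a stand-in: the abstract does not define the authors' site pattern sorting criterion, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the unambiguous nucleotides in one alignment column."""
    counts = Counter(ch for ch in column if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def drop_most_variable_sites(alignment, n_drop):
    """Return a copy of the alignment without the n_drop highest-entropy columns."""
    names = list(alignment)
    columns = list(zip(*(alignment[name].upper() for name in names)))
    ranked = sorted(
        range(len(columns)),
        key=lambda i: column_entropy(columns[i]),
        reverse=True,
    )
    dropped = set(ranked[:n_drop])
    kept = [i for i in range(len(columns)) if i not in dropped]
    return {name: "".join(alignment[name][i] for i in kept) for name in names}

# Hypothetical toy alignment: the last column is the most variable.
toy = {"a": "AAAC", "b": "AAGG", "c": "AATT", "d": "AAGA"}
trimmed = drop_most_variable_sites(toy, 1)
```

A robustness check in this spirit would rerun the tree inference on alignments trimmed by increasing amounts and compare the resulting topologies.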

    The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences

    Background: Enormous molecular sequence data have been accumulated over the past several years and are still exponentially growing with the use of faster and cheaper sequencing techniques. There is high and widespread interest in using these data for phylogenetic analyses. However, the amount of data that one can retrieve from public sequence repositories is virtually impossible to tame without dedicated software that automates processes. Here we present a novel bioinformatics pipeline for downloading, formatting, filtering and analyzing public sequence data deposited in GenBank. It combines some well-established programs with numerous newly developed software tools (available at http://software.zfmk.de/). Results: We used the bioinformatics pipeline to investigate the phylogeny of the megadiverse insect order Hymenoptera (sawflies, bees, wasps and ants) by retrieving and processing more than 120,000 sequences and by selecting subsets under the criteria of compositional homogeneity and defined levels of density and overlap. Tree reconstruction was done with a partitioned maximum likelihood analysis from a supermatrix with more than 80,000 sites and more than 1,100 species. In the inferred tree, consistent with previous studies, "Symphyta" is paraphyletic. Within Apocrita, our analysis suggests a topology of Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata))). Despite the huge amount of data, we identified several persistent problems in the Hymenoptera tree. Data coverage is still extremely low, and additional data have to be collected to reliably infer the phylogeny of Hymenoptera. Conclusions: While we applied our bioinformatics pipeline to Hymenoptera, we designed the approach to be as general as possible. With this pipeline, it is possible to produce phylogenetic trees for any taxonomic group and to monitor new data and tree robustness in a taxon of interest.
It therefore has great potential to meet the challenges of the phylogenomic era and to deepen our understanding of the tree of life.

    A new method for identifying site-specific evolutionary rates and its applications.

    In this thesis, I discuss each stage in the development of a new method for identifying site-specific evolutionary rates, from conception of the idea, through the implementation, to its application to data. TIGER, or tree independent generation of evolutionary rates, is based largely on the works of LeQuesne (1989), Wilkinson (1998) and Pisani (2004) and on the premise that each site in a multi-state character matrix can be scored by the level of agreement it displays with the other sites. In these earlier studies, however, agreement was measured in a binary manner: sites were either compatible with each other or they were not. TIGER allows various degrees of agreement between two sites, allowing it to pick up more subtle signals in the data. After the method was implemented in a software program, it could be applied to data. Using a combination of simulated and empirical datasets, TIGER was shown to produce desirable results. In particular, removal of sites identified by TIGER was shown to improve phylogenetic reconstruction of deeply diverging lineages and of taxa displaying compositional attraction. Additionally, TIGER was applied to a gene content matrix in order to identify HGT signals and integrated into the analysis of a current phylogenetic problem, the origin of the mitochondria. Although it is widely accepted that eukaryotes have a chimeric genome, the specific "parent" of the mitochondria is, as of yet, unclear. Previous studies have failed to reach agreement regarding this issue for a number of reasons. Exploration of the signals using TIGER and heterogeneous modelling reveals that multiple signals and compositional heterogeneity are among the biggest problems with datasets containing both mitochondrial and α-proteobacterial sequences.
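A graded agreement score between sites can be sketched as follows. Each site is treated as a partition of the taxa by character state, and the agreement of site j with site i is taken as the fraction of j's groups that fit entirely inside some group of i; a site's score is its mean agreement with all other sites. This is a simplified reading of the idea described in the abstract, with hypothetical function names, no handling of missing data, and no rate binning. It is not the TIGER program's code.

```python
def site_partition(column):
    """Group taxon indices by the character state they carry at one site."""
    groups = {}
    for taxon_index, state in enumerate(column):
        groups.setdefault(state, set()).add(taxon_index)
    return list(groups.values())

def partition_agreement(part_a, part_b):
    """Fraction of groups in part_b that are subsets of some group in part_a."""
    fitting = sum(1 for b in part_b if any(b <= a for a in part_a))
    return fitting / len(part_b)

def site_scores(alignment):
    """Mean agreement of every other site with each site (needs at least two sites)."""
    columns = list(zip(*alignment))
    parts = [site_partition(col) for col in columns]
    scores = []
    for i, part_i in enumerate(parts):
        others = [
            partition_agreement(part_i, part_j)
            for j, part_j in enumerate(parts)
            if j != i
        ]
        scores.append(sum(others) / len(others))
    return scores

# Hypothetical toy alignment: one row per taxon, one column per site.
example = ["AAC", "AAC", "AGT", "AGG"]
scores = site_scores(example)
```

In the toy alignment the constant first site scores highest and the most variable third site scores lowest, so a score like this can rank sites for removal without first inferring a tree.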

    Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content are largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.
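GC3 is the proportion of G and C among the bases at third codon positions. A minimal sketch, assuming the coding sequence is in frame from its first base and skipping gaps and ambiguity codes (the function name is hypothetical):

```python
def gc3(cds):
    """GC content at third codon positions of an in-frame coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    third = [seq[i] for i in range(2, usable, 3) if seq[i] in "ACGT"]
    if not third:
        raise ValueError("no unambiguous third-codon positions")
    return sum(base in "GC" for base in third) / len(third)

# Codons ATG GCC AAA TTT have third bases G, C, A, T.
value = gc3("ATGGCCAAATTT")
```

Computing this per gene and comparing the spread of values across a genome is one simple way to look at the GC3 heterogeneity the abstract discusses.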

    What is the phylogenetic signal limit from mitogenomes? The reconciliation between mitochondrial and nuclear data in the Insecta class phylogeny

    Background: Efforts to solve higher-level evolutionary relationships within the class Insecta by using mitochondrial genomic data are hindered by the fast sequence evolution of several groups, most notably Hymenoptera, Strepsiptera, Phthiraptera, Hemiptera and Thysanoptera. Accelerated rates of substitution in their sequences have been shown to have negative consequences in phylogenetic inference. In this study, we tested several methodological approaches to recover phylogenetic signal from whole mitochondrial genomes. As a model, we used two classical problems in insect phylogenetics: the relationships within Paraneoptera and within Holometabola. Moreover, we assessed the mitochondrial phylogenetic signal limits in the deeper Eumetabola dataset, and we studied the contribution of individual genes. Results: Long-branch attraction (LBA) artefacts were detected in all the datasets. Methods using Bayesian inference outperformed maximum likelihood approaches, and LBA was avoided in Paraneoptera and Holometabola when using protein sequences and the site-heterogeneous mixture model CAT. The better performance of this method was evidenced by resulting topologies matching generally accepted hypotheses based on nuclear and/or morphological data, and was confirmed by cross-validation and simulation analyses. Using the CAT model, the order Strepsiptera was recovered as sister to Coleoptera for the first time using mitochondrial sequences, in agreement with recent results based on large nuclear and morphological datasets. Also the Hymenoptera-Mecopterida association was obtained, leaving Coleoptera and Strepsiptera as the basal groups of the holometabolan insects, which coincides with one of the two main competing hypotheses. For the Paraneoptera, the currently accepted non-monophyly of Homoptera was documented as a phylogenetic novelty for mitochondrial data.
However, results were not satisfactory when exploring the entire Eumetabola, revealing the limits of the phylogenetic signal that can be extracted from Insecta mitogenomes. Based on the combined use of the five best topology-performing genes we obtained comparable results to whole mitogenomes, highlighting the important role of data quality. Conclusion: We show for the first time that mitogenomic data agree with nuclear and morphological data for several of the most controversial insect evolutionary relationships, adding a new independent source of evidence to study relationships among insect orders. We propose that deeper divergences cannot be inferred with the currently available methods due to sequence saturation and compositional bias inconsistencies. Our exploratory analysis indicates that the CAT model is the best at dealing with LBA and that it could be useful for other groups and datasets with similar phylogenetic difficulties.