1,452 research outputs found

    Testing robustness of relative complexity measure method constructing robust phylogenetic trees for Galanthus L. Using the relative complexity measure

    Background: Most phylogeny analysis methods based on molecular sequences use multiple alignment, where the quality of the alignment, which depends on the alignment parameters, determines the accuracy of the resulting trees. Different parameter combinations chosen for the multiple alignment may result in different phylogenies. A new non-alignment-based approach, the Relative Complexity Measure (RCM), has been introduced to tackle this problem and has been shown to work in fungi and mitochondrial DNA. Results: In this work, we present an application of the RCM method to reconstruct robust phylogenetic trees using sequence data for the genus Galanthus obtained from different regions in Turkey. Phylogenies were analyzed using nuclear and chloroplast DNA sequences. The tree obtained from nuclear ribosomal RNA gene sequences was more robust, while the tree obtained from the chloroplast DNA showed a higher degree of variation. Conclusions: Phylogenies generated by the Relative Complexity Measure were found to be robust, and the results of RCM were more reliable than those of the compared techniques. In particular, RCM seems to be a reasonable way to overcome MSA-based problems and a good alternative to MSA-based phylogenetic analysis. We believe our method will become a mainstream phylogeny construction method, especially for highly variable sequence families where the accuracy of the MSA depends heavily on the alignment parameters.

    Potential pitfalls of modelling ribosomal RNA data in phylogenetic tree reconstruction: Evidence from case studies in the Metazoa

    Background: Failure to account for covariation patterns in helical regions of ribosomal RNA (rRNA) genes has the potential to misdirect the estimation of the phylogenetic signal of the data. Furthermore, the extremes of length variation among taxa, combined with regional substitution rate variation can mislead the alignment of rRNA sequences and thus distort subsequent tree reconstructions. However, recent developments in phylogenetic methodology now allow a comprehensive integration of secondary structures in alignment and tree reconstruction analyses based on rRNA sequences, which has been shown to correct some of these problems. Here, we explore the potentials of RNA substitution models and the interactions of specific model setups with the inherent pattern of covariation in rRNA stems and substitution rate variation among loop regions. Results: We found an explicit impact of RNA substitution models on tree reconstruction analyses. The application of specific RNA models in tree reconstructions is hampered by interaction between the appropriate modelling of covarying sites in stem regions, and excessive homoplasy in some loop regions. RNA models often failed to recover reasonable trees when single-stranded regions are excessively homoplastic, because these regions contribute a greater proportion of the data when covarying sites are essentially downweighted. In this context, the RNA6A model outperformed all other models, including the more parametrized RNA7 and RNA16 models. Conclusions: Our results depict a trade-off between increased accuracy in estimation of interdependencies in helical regions with the risk of magnifying positions lacking phylogenetic signal. We can therefore conclude that caution is warranted when applying rRNA covariation models, and suggest that loop regions be independently screened for phylogenetic signal, and eliminated when they are indistinguishable from random noise. In addition to covariation and homoplasy, other factors, like non-stationarity of substitution rates and base compositional heterogeneity, can disrupt the signal of ribosomal RNA data. All these factors dictate sophisticated estimation of evolutionary pattern in rRNA data, just as other molecular data require similarly complicated (but different) corrections.

    Homology Assessment in Molecular Phylogenetics : Evaluation, Improvement, and Influence of Data Quality on Tree Reconstruction

    Considering that the final goal of every phylogenetic analysis is the reconstruction of taxon relationships from underlying data, little attention has been paid to the role of alignment accuracy and its impact on tree reconstruction. Alignment masking approaches are methods that detect and remove erroneously aligned sections before tree reconstruction. I describe the effect of two masking methods on alignment quality and tree reconstruction. While masking methods are generally efficient at detecting ambiguously aligned sequence blocks, all of them more or less lack the ability to detect heterogeneous sequence divergence within sequence alignments. This is a main disadvantage of masking approaches, because undetected heterogeneous sequence divergence can result in a strong bias in tree reconstructions. I give a detailed description of a newly developed algorithm and of the possibility of tagging branches as an indirect estimate of the reliability of a subset of possible splits guided by a topology. The performance of the new algorithm was tested on simulated and empirical data. In the tree reconstruction process, the first task is the choice of an appropriate tree reconstruction method. Judging from theoretical studies and comparative tests, Maximum Likelihood turns out to be the first choice for phylogenetic tree reconstruction. I show that the success of Maximum Likelihood depends not only on the degree of alignment quality, but also on the relation of branch length differences of the underlying topologies. I tested the robustness of Maximum Likelihood towards different classes of long-branch effects in multiple-taxon topologies by using simulated fixed data sets under two different 11-taxon trees and a broad range of branch length conditions, with sequence alignments of different lengths. Some of the most important scripts and pipelines written for this thesis are also described.

    Visualizing differences in phylogenetic information content of alignments and distinction of three classes of long-branch effects

    Background: Published molecular phylogenies are usually based on data whose quality has not been explored prior to tree inference. This leads to errors because trees obtained with conventional methods suppress conflicting evidence, and because support values may be high even if there is no distinct phylogenetic signal. Tools that allow an a priori examination of data quality are rarely applied. Results: Using data from published molecular analyses on the phylogeny of crustaceans it is shown that tree topologies and popular support values do not show existing differences in data quality. To visualize variations in signal distinctness, we use network analyses based on split decomposition and split support spectra. Both methods show the same differences in data quality and the same clade-supporting patterns. Both methods are useful to discover long-branch effects. We discern three classes of long branch effects. Class I effects consist of attraction of terminal taxa caused by symplesiomorphies, which results in a false monophyly of paraphyletic groups. Addition of carefully selected taxa can fix this effect. Class II effects are caused by drastic signal erosion. Long branches affected by this phenomenon usually slip down the tree to form false clades that in reality are polyphyletic. To recover the correct phylogeny, more conservative genes must be used. Class III effects consist of attraction due to accumulated chance similarities or convergent character states. This sort of noise can be reduced by selecting less variable portions of the data set, avoiding biases, and adding slower genes. Conclusion: To increase confidence in molecular phylogenies an exploratory analysis of the signal to noise ratio can be conducted with split decomposition methods. If long-branch effects are detected, it is necessary to discern between three classes of effects to find the best approach for an improvement of the raw data.

    Informational Gene Phylogenies Do Not Support a Fourth Domain of Life for Nucleocytoplasmic Large DNA Viruses

    Mimivirus is a nucleocytoplasmic large DNA virus (NCLDV) with a genome size (1.2 Mb) and coding capacity (~1000 genes) comparable to that of some cellular organisms. Unlike other viruses, Mimivirus and its NCLDV relatives encode homologs of broadly conserved informational genes found in Bacteria, Archaea, and Eukaryotes, raising the possibility that they could be placed on the tree of life. A recent phylogenetic analysis of these genes showed the NCLDVs emerging as a monophyletic group branching between Eukaryotes and Archaea. These trees were interpreted as evidence for an independent "fourth domain" of life that may have contributed DNA processing genes to the ancestral eukaryote. However, the analysis of ancient evolutionary events is challenging, and tree reconstruction is susceptible to bias resulting from non-phylogenetic signals in the data. These include compositional heterogeneity and homoplasy, which can lead to the spurious grouping of compositionally-similar or fast-evolving sequences. Here, we show that these informational gene alignments contain both significant compositional heterogeneity and homoplasy, which were not adequately modelled in the original analysis. When we use more realistic evolutionary models that better fit the data, the resulting trees are unable to reject a simple null hypothesis in which these informational genes, like many other NCLDV genes, were acquired by horizontal transfer from eukaryotic hosts. Our results suggest that a fourth domain is not required to explain the available sequence data.

    A study on gene marker selection and phylogenetic tree error assessment using a bioinformatics program

    Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2021.8. Son Hyeon-seok.

    Abstract (translated from Korean): The enormous volume of biological sequence data that is continuously being produced provides an opportunity to infer the evolutionary history of organisms and the phylogenetic relationships among them. Building a phylogenetic tree has become one of the steps carried out in almost every biological study. Phyloinformatics has developed around technical and methodological research, such as tree-building algorithms and evolutionary models. Current phylogenetic analysis aims to infer a tree close to the true one by building trees from sequence data, that is, from genetic markers. However, as the size of the data, including genetic markers, grows exponentially and questions about the accuracy of the resulting analyses become increasingly important, many studies are being carried out to assess the accuracy and reliability of phylogenetic trees. From the viewpoint of molecular systematics, assessing the accuracy of a tree can be approached along two lines: one addresses how well a phylogenetic algorithm works under particular circumstances, such as the evolutionary conditions and the amount of molecular data, and the other focuses on how far a particular tree can be trusted. From the viewpoint of dataset quality, it is also important, after a phylogenetic analysis has been run, to assess the suitability of the dataset used in order to obtain a reliable tree. This is because recent phylogenetic analyses routinely handle large-scale data, so the probability of stochastic error has fallen while the probability of systematic error has risen, and assessing the sources of systematic error in a dataset after the analysis has become a very important step in assessing and improving tree accuracy. This study therefore developed a program called APSE (Assessment Program for Systematic Error, tentative) to improve the reliability of phylogenetic trees from the viewpoint of data quality. APSE computes the taxon-specific relative composition frequency variability (RCFV) and skew values to obtain information about the compositional bias of nucleotide sequences, from which the genetic heterogeneity and mutational bias of the data under study can be estimated. It can also compute bias information that may cause systematic error, using the frequencies of various base groups, saturation (multiple substitutions caused by mutation), and shared missing data. In addition, to evaluate the system's performance, specific examples that yield contradictory trees across different gene markers (Terebelliformia, Daphniid, Glires) were run through APSE, confirming that the assessment of systematic error in marker datasets, and the resulting inference of the accuracy of the selected marker trees, can be carried out properly. APSE is therefore expected to assess the accuracy between a user's data and the resulting tree so that trees built with a focus on data quality give more accurate results, and, when a misleading tree is produced depending on the genetic marker, to support a thorough analysis of the sources of systematic error and of the effect that the affected data have on the tree.

    Abstract (English): The steadily increasing volume of biological data carrying decisive information on phylogenetic relationships provides unparalleled opportunities in bioinformatics. Phylogenetics based on large datasets, tracing evolutionary history and assigning the placement of taxa in a phylogeny, establishes the tree of life. Constructing a phylogeny is part of most branches of biology, and elucidating the evolutionary history provides the phylogenetic background that is a prerequisite for interpreting a specific biological system. With the advent of computing and sequencing techniques, phyloinformatics has advanced rapidly at the technical and methodological levels, along with phylogenetic reconstruction algorithms and evolutionary models. Unlike the classic approach using morphological data, modern phylogenetic analysis reconstructs a phylogeny from genetic information by inferring a phylogenetic tree from molecular data. Phylogeneticists have therefore naturally dealt with questions concerning the accuracy of phylogenetic estimation and carried out studies on the reliability of phylogenies. In terms of molecular systematics, concerns about the assessment of phylogenetic accuracy under specific evolutionary conditions and with a given amount of molecular data can be divided into two types: how a phylogenetic method works, and how reliable it is under certain circumstances. Moreover, in terms of data quality, an assessment of the suitability of a nuclear marker is required before phylogenetic inference is performed in order to obtain a confident phylogeny. Recently, the probability of stochastic errors in phylogenetic estimation with large-scale datasets has decreased, while the probability of systematic errors has increased. Thus, before phylogenetic reconstruction is carried out, assessing the sources of systematic error is indispensable for estimating and improving phylogenetic accuracy. The Assessment Program for Systematic Error (APSE) developed in this study will play a key role in assessing user datasets against phylogenies to improve the results of phylogenetic reconstruction in systematics, and will make it possible to analyze the effect of data bearing systematic errors on a phylogeny after misleading phylogenetic results have been produced. This study uses APSE to infer phylogenetic accuracy and assess systematic errors on an unresolved example showing contradictory topologies between different gene markers in the same diversity group. Furthermore, by selectively grouping the properties of the existing systematic biases reported by APSE, it moves toward proposing a new protocol that can identify the best gene marker among candidate markers for a specific taxon.

    Contents: I. Introduction (1.1 Background of research; 1.2 Necessity of research; 1.3 Research objectives). II. Materials and Methods (2.1 Datasets definition and data collection; 2.2 Data processing and bioinformatics software used; 2.3 Phylogenetic reconstruction and accuracy assessment; 2.4 Software development environment and allowable data; 2.5 Assessment of the systematic errors). III. Results (3.1 Phylogenetic analysis results for incongruence between gene markers; 3.2 Data-quality analysis using systematic errors). IV. Discussion (4.1 Significance and implications of study; 4.2 Application to bioinformatics research; 4.3 Improvement and achievement). V. Conclusion and Summary (5.1 Conclusion; 5.2 Summary). Bibliography. Abstract (Korean).
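
    The thesis abstract above names RCFV and skew as the compositional-bias statistics reported by APSE but gives no formulas. The following is a minimal Python sketch of what such statistics could look like, not APSE's code. It assumes the commonly used definitions: skew as (x - y) / (x + y) for a base pair such as G and C, and RCFV as the sum over bases of the mean absolute deviation of each taxon's base frequency from the across-taxa mean. The function names and the toy sequences are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def base_frequencies(seq):
    """Relative frequency of A, C, G, T in one sequence (other symbols ignored)."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in BASES}

def skew(seq, x, y):
    """Compositional skew (x - y) / (x + y), e.g. GC skew with x='G', y='C'."""
    counts = Counter(seq.upper())
    denom = counts[x] + counts[y]
    return (counts[x] - counts[y]) / denom if denom else 0.0

def rcfv(seqs):
    """Assumed RCFV definition: for each base, average over taxa the absolute
    deviation of that taxon's frequency from the mean frequency, then sum
    over the four bases. Zero means identical composition in all taxa."""
    freqs = [base_frequencies(s) for s in seqs.values()]
    n = len(freqs)
    total = 0.0
    for b in BASES:
        mean = sum(f[b] for f in freqs) / n
        total += sum(abs(f[b] - mean) for f in freqs) / n
    return total

# Hypothetical toy alignment: taxon name -> nucleotide sequence.
toy = {"taxon_a": "ACGTACGT", "taxon_b": "GGGCGGGC", "taxon_c": "ATATATAT"}
for name, seq in toy.items():
    print(name, "GC skew:", skew(seq, "G", "C"), "AT skew:", skew(seq, "A", "T"))
print("RCFV:", rcfv(toy))
```

    Under these assumptions, a higher RCFV indicates stronger compositional heterogeneity among taxa, which is the kind of signal the abstract associates with systematic error.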

    Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees

    Background: In several studies, secondary structures of ribosomal genes have been used to improve the quality of phylogenetic reconstructions. An extensive evaluation of the benefits of secondary structure, however, is lacking. Results: This is the first study to counter this deficiency. We inspected the accuracy and robustness of phylogenetics with individual secondary structures by simulation experiments for artificial tree topologies with up to 18 taxa and for divergency levels in the range of typical phylogenetic studies. We chose the internal transcribed spacer 2 of the ribosomal cistron as an exemplary marker region. Simulation integrated the coevolution process of sequences with secondary structures. Additionally, the phylogenetic power of marker size duplication was investigated and compared with sequence and sequence-structure reconstruction methods. The results clearly show that accuracy and robustness of Neighbor Joining trees are largely improved by structural information in contrast to sequence only data, whereas a doubled marker size only accounts for robustness. Conclusions: Individual secondary structures of ribosomal RNA sequences provide a valuable gain of information content that is useful for phylogenetics. Thus, the usage of ITS2 sequence together with secondary structure for taxonomic inferences is recommended. Other reconstruction methods as maximum likelihood, bayesian inference or maximum parsimony may equally profit from secondary structure inclusion. Reviewers: This article was reviewed by Shamil Sunyaev, Andrea Tanzer (nominated by Frank Eisenhaber) and Eugene V. Koonin.
