20 research outputs found

    Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

    Background: Alignment masking, the exclusion of alignment blocks prior to tree reconstruction, has been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold, the choice of which is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. Results: ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. When tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. In addition, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Conclusions: Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood-based models of sequence evolution, which opens the possibility of further improvements.
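
The idea of scoring randomness with Monte Carlo resampling inside a sliding window can be illustrated with a small sketch. This is not the ALISCORE algorithm: the function name `window_random_fraction`, the plain match-count score, and the shuffle-based resampling are all assumptions made for illustration only. For each window of a pairwise comparison, it reports the fraction of resampled comparisons that score at least as high as the observed one, so values near 1 indicate similarity that is indistinguishable from chance.

```python
import random


def window_random_fraction(seq_a, seq_b, window=10, n_resamples=200, seed=0):
    """Hypothetical helper, not part of ALISCORE.

    For every window position, return the fraction of shuffled comparisons
    whose match count is at least as high as the observed match count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    fractions = []
    for start in range(len(seq_a) - window + 1):
        a = seq_a[start:start + window]
        b = list(seq_b[start:start + window])
        # Observed score: number of identical positions in this window.
        observed = sum(x == y for x, y in zip(a, b))
        hits = 0
        for _ in range(n_resamples):
            # Resample by shuffling one sequence within the window, which
            # keeps its composition but destroys positional homology.
            rng.shuffle(b)
            if sum(x == y for x, y in zip(a, b)) >= observed:
                hits += 1
        fractions.append(hits / n_resamples)
    return fractions


# Identical low-complexity sequences: every shuffle scores as high as the
# observation, so the similarity cannot be told apart from chance.
low = window_random_fraction("A" * 20, "A" * 20, window=10, n_resamples=50)

# Identical sequences with mixed composition: shuffles almost never reach
# the observed score, so the similarity looks non-random.
mixed = window_random_fraction("ACGT" * 5, "ACGT" * 5, window=10)
```

A masking step would then exclude windows whose fraction exceeds some cutoff; the cutoff and the scoring scheme are design choices not specified here.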

    Testing a Short Nuclear Marker for Inferring Staphylinid Beetle Diversity in an African Tropical Rain Forest

    The use of DNA-based methods for assessing biodiversity has become increasingly common in recent years. Especially in speciose biomes such as tropical rain forests and/or in hyperdiverse or understudied taxa, they may efficiently complement morphological approaches. The most successful molecular approach in this field is DNA barcoding based on the cytochrome c oxidase I (COI) marker, but other markers are used as well. Whereas most studies aim at identifying or describing species, there are only a few attempts to use DNA markers for inventorying all animal species found in environmental samples to describe variations of biodiversity patterns. In this study, an analysis of the nuclear D3 region of the 28S rRNA gene to delimit species-like units is compared to results based on distinction of morphospecies. Data derived from both approaches are used to assess diversity and composition of staphylinid beetle communities of a Guineo-Congolian rain forest in Kenya. Beetles were collected with a standardized sampling design across six transects in primary and secondary forests using pitfall traps. Sequences were obtained from 99% of all individuals. In total, 76 molecular operational taxonomic units (MOTUs) were found in contrast to 70 discernible morphospecies. Despite this difference, both approaches revealed highly similar biodiversity patterns, with species richness being equal in primary and secondary forests, but with divergent species communities in different habitats. The D3-MOTU approach proved to be an efficient tool for biodiversity analyses. Our data illustrate that the use of MOTUs as a proxy for species can provide an alternative to morphospecies identification for the analysis of changes in community structure of hyperdiverse insect taxa. The efficient amplification of the D3 marker and the ability of the D3-MOTUs to reveal biodiversity patterns similar to those from analyses of morphospecies recommend its use in future molecular studies on biodiversity.

    Can quartet analyses combining maximum likelihood estimation and Hennigian logic overcome long branch attraction in phylogenomic sequence data?

    Systematic biases such as long branch attraction can mislead commonly relied upon model-based (i.e. maximum likelihood and Bayesian) phylogenetic methods when, as is usually the case with empirical data, there is model misspecification. We present PhyQuart, a new method for evaluating the three possible binary trees for any quartet of taxa. PhyQuart was developed through a process of reciprocal illumination between a priori considerations and the results of extensive simulations. It is based on identification of site-patterns that can be considered to support a particular quartet tree, taking into account the Hennigian distinction between apomorphic and plesiomorphic similarity, and employing corrections to the raw observed frequencies of site-patterns that exploit expectations from maximum likelihood estimation. We demonstrate through extensive simulation experiments that, whereas maximum likelihood estimation performs well in many cases, it can be outperformed by PhyQuart in cases where it fails because extreme branch length asymmetries produce long-branch attraction artefacts under only very minor model misspecification.
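
As a rough illustration of what "raw observed frequencies of site-patterns" supporting each of the three quartet trees can mean, the sketch below counts two-state alignment columns that split four sequences two against two. This covers only the raw counting step; the Hennigian polarity assessment and the maximum likelihood based corrections described in the abstract are not implemented. The function name `count_quartet_support` and the split labels are hypothetical.

```python
def count_quartet_support(s1, s2, s3, s4):
    """Hypothetical helper: count raw split-supporting site patterns.

    A column supports a quartet tree when exactly two character states
    occur and they divide the four sequences into two pairs.
    """
    counts = {"12|34": 0, "13|24": 0, "14|23": 0}
    for a, b, c, d in zip(s1, s2, s3, s4):
        column = (a, b, c, d)
        # Assumption: columns containing gaps are simply skipped.
        if "-" in column or len(set(column)) != 2:
            continue
        if a == b and c == d:
            counts["12|34"] += 1  # pattern xxyy
        elif a == c and b == d:
            counts["13|24"] += 1  # pattern xyxy
        elif a == d and b == c:
            counts["14|23"] += 1  # pattern xyyx
        # 3:1 patterns such as xxxy support no split and are ignored.
    return counts


support = count_quartet_support("AAAAG", "AACCG", "CCACT", "CCCAT")
```

In this toy alignment three columns group sequences 1 and 2 against 3 and 4, and one column supports each alternative. Under long-branch attraction, convergent changes on long branches inflate the counts for the wrong split, which is why uncorrected counts alone are not a reliable criterion.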

    AliGROOVE – visualization of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support

    BACKGROUND: Masking of multiple sequence alignment blocks has become a powerful method to enhance the tree-likeness of the underlying data. However, existing masking approaches are insensitive to heterogeneous sequence divergence, which can mislead tree reconstructions. We present AliGROOVE, a new method based on a sliding window and a Monte Carlo resampling approach that visualizes heterogeneous sequence divergence or alignment ambiguity related to single taxa or subsets of taxa within a multiple sequence alignment and tags suspicious branches on a given tree. RESULTS: We used simulated multiple sequence alignments to show that the extent of alignment ambiguity in pairwise sequence comparison is correlated with the frequency of misplaced taxa in tree reconstructions. The approach implemented in AliGROOVE allows the detection of nodes within a tree that are supported despite the absence of phylogenetic signal in the underlying multiple sequence alignment. We show that AliGROOVE detects heterogeneous sequence divergence equally well in a case study based on an empirical data set of mitochondrial DNA sequences of chelicerates. CONCLUSIONS: The AliGROOVE approach has the potential to identify single taxa or subsets of taxa that show predominantly randomized sequence similarity in comparison with other taxa in a multiple sequence alignment. It further allows the reliability of node support to be evaluated in a novel way.

    Exploring the Leaf Beetle Fauna (Coleoptera: Chrysomelidae) of an Ecuadorian Mountain Forest Using DNA Barcoding

    Background: Tropical mountain forests are hotspots of biodiversity hosting a huge but little-known diversity of insects that is endangered by habitat destruction and climate change. Therefore, rapid assessment approaches of insect diversity are urgently needed to complement slower traditional taxonomic approaches. We empirically compare different DNA-based species delimitation approaches for a rapid biodiversity assessment of hyperdiverse leaf beetle assemblages along an elevational gradient in southern Ecuador and explore their effect on species richness estimates. Methodology/Principal Findings: Based on a COI barcode data set of 674 leaf beetle specimens (Coleoptera: Chrysomelidae) of 266 morphospecies from three sample sites in the Podocarpus National Park, we employed statistical parsimony analysis, distance-based clustering, and GMYC and PTP modelling to delimit species-like units and compared them to morphology-based (parataxonomic) species identifications. The four different approaches for DNA-based species delimitation revealed highly similar numbers of molecular operational taxonomic units (MOTUs) (n = 284–289). Estimated total species richness was considerably higher than the number sampled: 414 for morphospecies (Chao2) and 469–481 for the different MOTU types. Assemblages at different elevational levels (1000 vs. 2000 m) had similar species numbers but a very distinct species composition for all delimitation methods. Most species were found at only one elevation, and this turnover pattern was even more pronounced for DNA-based delimitation. Conclusions/Significance: Given the high congruence of DNA-based delimitation results, probably due to the sampling structure, our study suggests that when applied to species communities on a regionally limited level with a high proportion of rare species (i.e. ~50% singletons), the choice of species delimitation method can be of minor relevance for assessing species numbers and turnover in tropical insect communities. Therefore, DNA-based species delimitation is confirmed as a valuable tool for evaluating biodiversity of hyperdiverse insect communities, especially when exact taxonomic identifications are missing.
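
The Chao2 richness estimate mentioned in the abstract is computed from incidence data, that is, from the number of species found in exactly one or exactly two sampling units. The sketch below uses the bias-corrected form, S_obs + ((m - 1) / m) * Q1 * (Q1 - 1) / (2 * (Q2 + 1)); whether the study used this variant or the classic form is an assumption, and the function name `chao2` and the toy samples are hypothetical.

```python
from collections import Counter


def chao2(samples):
    """Bias-corrected Chao2 estimate of total species richness.

    samples: list of collections of species labels, one per sampling unit.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("at least one sampling unit is required")
    # Incidence frequency: in how many sampling units each species occurs.
    freq = Counter(sp for sample in samples for sp in set(sample))
    s_obs = len(freq)                                  # observed species
    q1 = sum(1 for n in freq.values() if n == 1)       # uniques
    q2 = sum(1 for n in freq.values() if n == 2)       # duplicates
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


# Toy data: "a" and "b" occur in one unit, "c" in two, "d" in three.
toy_samples = [{"a", "c", "d"}, {"b", "c", "d"}, {"d"}]
estimate = chao2(toy_samples)
```

With many species seen in only one sampling unit, as in the roughly 50% singletons reported above, the estimate rises well above the observed count, which matches the direction of the gap between sampled and estimated richness in the abstract.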

    Flowchart of the PhyQuart algorithm.

    Simplified flowchart showing a) each of the three possible quartet relationships for a set of 4 sequences (q1, q2, q3), b) the site-pattern classification of observed (N_obs) symmetric and asymmetric support, c) the determination of plesiomorphic (old) split-supporting site-patterns given two different polarities of character transformation along the internal branch of each possible quartet tree, and d) estimation of expected convergent split-supporting site-patterns supporting quartet q1 in ML split pattern estimations using branch length and model optimization on constraint topologies of the other two possible quartet relationships (q2, q3).

    Quartet reconstruction success given stepwise elongation of two terminal branches if sequence lengths equal 250 kbp.

    Plots visualizing the reconstruction success of PhyQuart (blue) and ML (green) given stepwise elongation of two terminal branches (BL2, x-axis) and a fixed, very short internal branch length (BL1 = {0.01}) for 100 (250 kbp long) data replicates (y-axis). The plots present the summarized reconstruction success for (a) Felsenstein and (b) Farris topologies given α = {0.1, 0.3, 0.5, 0.7, 1.0, 2.0} and an invariable site proportion (I) of 0.3. A detailed overview of all simulation results of this setup is given as supplementary information S1 Fig (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0183393#pone.0183393.s001).

    Defined model parameters for data simulation (INDELible) and ML analyses used in PhyQuart and PhyML.


    Quartet reconstruction success given stepwise elongation of one or three terminal branches if sequence lengths equal 250 kbp.

    Reconstruction success of PhyQuart (blue) and ML (green) for different rate heterogeneities under different lengths of a) a single long terminal branch (BL2, x-axis) and b) three long terminal branches (BL2, x-axis), given 100 data replicates (y-axis) of 250 kbp length and a fixed alternative internal branch length of BL1 = {0.01}, summarized for α = {0.1, 0.3, 0.5, 0.7, 1.0, 2.0}. A detailed overview of all simulation results of both setups is given as supplementary information S4 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0183393#pone.0183393.s004) and S5 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0183393#pone.0183393.s005) Figs.