1,021 research outputs found

    General functions to transform associate data to host data, and their use in phylogenetic inference from sequences with intra-individual variability

    Background. Amongst the most commonly used molecular markers for plant phylogenetic studies are the nuclear ribosomal internal transcribed spacers (ITS). Intra-individual variability of these multicopy regions is a very common phenomenon in plants, the causes of which are debated in the literature. Phylogenetic reconstruction under these conditions is inherently difficult. Our approach is to consider this problem as a special case of the general biological question of how to infer the characteristics of hosts (represented here by plant individuals) from features of their associates (represented here by cloned sequences). Results. Six general transformation functions are introduced, covering the transformation of associate characters to discrete and continuous host characters, and the transformation of associate distances to host distances. A pure distance-based framework is established in which these transformation functions are applied to ITS sequences collected from the angiosperm genera Acer, Fagus and Zelkova. The formulae are also applied to allelic data of three different loci obtained from Rosa spp. The functions are validated by (1) phylogeny-independent measures of treelikeness; (2) correlation with independent host characters; (3) visualization using splits graphs and comparison with published data on the test organisms. The results agree well with these three measures and the datasets examined as well as with the theoretical predictions and previous results in the literature. High-quality distance matrices are obtained with four of the six transformation formulae. We demonstrate that one of them represents a generalization of the Sørensen coefficient, which is widely applied in ecology. Conclusion. Because of their generality, the transformation functions may be applied to a wide range of biological problems that are interpretable in terms of hosts and associates. Regarding cloned sequences, the formulae have a high potential to accurately reflect evolutionary relationships within angiosperm genera, and to identify hybrids and ancestral taxa. These results corroborate earlier ones which showed that treelikeness measures are a valuable tool in comparative studies of biological distance functions.

    Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling

    The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow in both the horizontal (number of base pairs, phylogenomic alignments) and the vertical (number of taxa) dimension. Setting aside the ongoing controversy about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, coupled with the high computational cost of inferring these, for many organismic groups a sufficient number of taxa is often available only from one or just a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa, using as examples several large nested single-gene rbcL alignments comprising from 400 up to 3,491 taxa. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random, but is based on drawing subsamples from empirical taxon groups, which can either be user-defined or determined using taxonomic information from databases. Our results indicate that, despite an unfavorable ratio of number of sequences to number of base pairs, i.e., many relatively short sequences, Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene rbcL alignments with dense taxon sampling of up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and interpreting bootstrap support from comprehensive analysis.

    A Nuclear Ribosomal DNA Phylogeny of Acer Inferred with Maximum Likelihood, Splits Graphs, and Motif Analysis of 606 Sequences

    The multi-copy internal transcribed spacer (ITS) region of nuclear ribosomal DNA is widely used to infer phylogenetic relationships among closely related taxa. Here we use maximum likelihood (ML) and splits graph analyses to extract phylogenetic information from ~600 mostly cloned ITS sequences, representing 81 species and subspecies of Acer, and both species of its sister Dipteronia. Additional analyses compared sequence motifs in Acer and several hundred Anacardiaceae, Burseraceae, Meliaceae, Rutaceae, and Sapindaceae ITS sequences in GenBank. We also assessed the effects of using smaller data sets of consensus sequences with ambiguity coding (accounting for within-species variation) instead of the full (partly redundant) original sequences. Neighbor-nets and bipartition networks were used to visualize conflict among character state patterns. Species clusters observed in the trees and networks largely agree with morphology-based classifications; of de Jong’s (1994) 16 sections, nine are supported in neighbor-net and bipartition networks, and ten by sequence motifs and the ML tree; of his 19 series, 14 are supported in networks, motifs, and the ML tree. Most nodes had higher bootstrap support with matrices of 105 or 40 consensus sequences than with the original matrix. Within-taxon ITS divergence did not differ between diploid and polyploid Acer, and there was little evidence of differentiated parental ITS haplotypes, suggesting that concerted evolution in Acer acts rapidly.

    A Clustering Optimization Strategy for Molecular Taxonomy Applied to Planktonic Foraminifera SSU rDNA

    Identifying species is challenging in the case of organisms for which primarily molecular data are available. Even if morphological features are available, molecular taxonomy is often necessary to revise taxonomic concepts and to analyze environmental DNA sequences. However, clustering approaches to delineate molecular operational taxonomic units often rely on arbitrary parameter choices. Also, distance calculation is difficult for highly alignment-ambiguous sequences. Here, we applied a recently described clustering optimization method to highly divergent planktonic foraminifera SSU rDNA sequences. We determined the distance function and the clustering setting that result in the highest agreement with morphological reference data. Alignment-free distance calculation, when adapted for use with partly non-homologous sequences caused by distinct primer pairs, outperformed multiple sequence alignment. Clustering optimization offers new perspectives for the barcoding of species diversity and for environmental sequencing. It bridges the gap between traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both genetic divergence and given species concepts.

    Using the Multiple Analysis Approach to Reconstruct Phylogenetic Relationships among Planktonic Foraminifera from Highly Divergent and Length-polymorphic SSU rDNA Sequences

    The high sequence divergence within the small subunit ribosomal RNA gene (SSU rDNA) of foraminifera makes it difficult to establish the homology of individual nucleotides across taxa. Alignment-based approaches have so far relied on time-consuming manual alignments and discarded up to 50% of the sequenced nucleotides prior to phylogenetic inference. Here, we investigate the potential of the multiple analysis approach to infer a molecular phylogeny of all modern planktonic foraminiferal taxa by using a matrix of 146 new and 153 previously published SSU rDNA sequences. Our multiple analysis approach is based on eleven different automated alignments, analysed separately under the maximum likelihood criterion. The high degree of congruence between the phylogenies derived from our novel approach, traditional manually homologized culled alignments and the fossil record indicates that poorly resolved nucleotide homology does not represent the most significant obstacle when exploring the phylogenetic structure of the SSU rDNA in planktonic foraminifera. We show that approaches designed to extract phylogenetically valuable signals from complete sequences show more promise for resolving the backbone of the planktonic foraminifer tree than attempts to establish strictly homologous base calls in a manual alignment.

    Fagaceae pollen from the early Cenozoic of West Greenland: revisiting Engler’s and Chaney’s Arcto-Tertiary hypotheses

    In this paper we document Fagaceae pollen from the Eocene of western Greenland. The pollen record suggests a remarkable diversity of the family in the early Cenozoic of Greenland. Extinct Fagaceae pollen types include Eotrigonobalanus, which extends at least back to the Paleocene, and two ancestral pollen types with affinities to the Eurasian Quercus Group Ilex and the western North American Quercus Group Protobalanus. In addition, modern lineages of Fagaceae are unambiguously represented by pollen of Fagus, Quercus Group Lobatae/Quercus, and three Castaneoideae pollen types. These findings corroborate earlier findings from Axel Heiberg Island that Fagaceae were a dominant element at high latitudes during the early Cenozoic. Comparison with coeval or older mid-latitude records of modern lineages of Fagaceae shows that modern lineages found in western Greenland and Axel Heiberg likely originated at lower latitudes. Further examples comprise (possibly) Acer, Aesculus, Alnus, Ulmus, and others. Thus, until fossils belonging to modern northern temperate lineages have been recovered from older (early Eocene, Paleocene) strata at high latitudes, Engler’s hypothesis of an Arctic origin of the modern temperate woody flora of Eurasia, termed ‘Arcto-Tertiary Element’, and its later modification by R. W. Chaney and H. D. Mai (‘Arcto-Tertiary Geoflora’) need to be modified.

    High-resolution assessment of air quality in urban areas—a business model perspective

    The increasing availability of low-cost air quality sensors has led to novel sensing approaches. Distributed networks of low-cost sensors, together with data fusion and analytics, have enabled unprecedented spatiotemporal resolution when observing the urban atmosphere. Several projects have demonstrated the potential of different approaches for high-resolution measurement networks, ranging from static low-cost sensor networks through vehicular and airborne sensing to crowdsourced measurements, and from research-based operation to citizen science. Yet, sustaining the operation of such low-cost air quality sensor networks remains challenging because of the lack of regulatory support and the lack of an organizational framework linking these measurements to the official air quality network. This paper discusses the logical inclusion of lower-cost air quality sensors into the existing air quality network via a dynamic field calibration process, the resulting sustainable business models, and how this expansion can be self-funded.

    Phylogenetic relationships in the southern African genus Drosanthemum (Ruschioideae, Aizoaceae)

    Background. Drosanthemum, the only genus of the tribe Drosanthemeae, is widespread over the Greater Cape Floristic Region in southern Africa. With 114 recognized species, Drosanthemum together with the highly succulent and species-rich tribe Ruschieae constitute the 'core ruschioids' in Aizoaceae. Within Drosanthemum, nine subgenera have been described based on flower and fruit morphology. Their phylogenetic relationships, however, have not yet been investigated, hampering understanding of monophyletic entities and patterns of geographic distribution. Methods. Using chloroplast and nuclear DNA sequence data, we performed network- and tree-based phylogenetic analyses of 73 species of Drosanthemum with multiple accessions for widespread species. A well-curated, geo-referenced occurrence data set comprising the 134 genetically analysed and 863 further accessions was used to describe the distributional ranges of intrageneric lineages and the genus as a whole. Results. Phylogenetic inference supports nine clades within Drosanthemum, seven of which group in two major clades, while the remaining two show ambiguous affinities. The nine clades are generally congruent to previously described subgenera within Drosanthemum, with exceptions such as (pseudo-) cryptic species. In-depth analyses of sequence patterns in each gene region were used to reveal phylogenetic affinities inside the retrieved clades in more detail. We observe a complex distribution pattern including widespread, species-rich clades expanding into arid habitats of the interior (subgenera Drosanthemum p.p., Vespertina, Xamera) that are genetically and morphologically diverse. In contrast, less species-rich, genetically less divergent, and morphologically unique lineages are restricted to the central Cape region and more mesic conditions (Decidua, Necopina, Ossicula, Quastea, Quadrata, Speciosa). 
    Our results suggest that the main lineages arose from an initial rapid radiation, with subsequent diversification in some clades. Raw data, code, analysis output, and species occurrence. The zip file contains a ReadMe file and 4 folders: 1_main_data_and_results (the files used to produce the figures in the main text); 2_ML_phylogenetics (raw data, code, and analysis output of ML phylogenetic analyses); 3_MJ_networks (raw data [SNP/sequence motif recoded DNA alignment matrices], and output of median-joining network analyses)