
    phyloXML: XML for evolutionary biology and comparative genomics

    Background: Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies concerned with organismal relationships, tree nodes are associated with taxonomic names, whereas tree branches have lengths and often support values. Gene trees used in comparative genomics or phylogenomics are usually annotated with taxonomic information, genome-related data such as gene names and functional annotations, and events such as gene duplications, speciations, or exon shufflings, combined with information related to the evolutionary tree itself. The data standards currently used for evolutionary trees have limited capacity to incorporate such annotations of different data types. Results: We developed an XML language, named phyloXML, for describing evolutionary trees as well as various associated data items. PhyloXML provides elements for commonly used items, such as branch lengths, support values, taxonomic names, and gene names and identifiers. By using "property" elements, phyloXML can be adapted to novel and unforeseen use cases. We also developed various software tools for reading, writing, converting, and visualizing phyloXML-formatted data. Conclusion: PhyloXML is an XML language defined by a complete schema in XSD that allows storing and exchanging the structures of evolutionary trees as well as associated data. More information about phyloXML itself, the XSD schema, and tools implementing and supporting phyloXML is available at http://www.phyloxml.org.
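To make the element names mentioned above concrete, here is a minimal sketch that reads a small, hand-written phyloXML document with Python's standard `xml.etree.ElementTree`. The tree, the species, the gene symbols `geneA`/`geneB`, and the `custom:habitat` property are invented for illustration. The element names (`phylogeny`, `clade`, `branch_length`, `confidence`, `taxonomy`, `sequence`, `events`, `property`) and the `http://www.phyloxml.org` namespace follow the schema as we understand it, but the XSD at phyloxml.org is the authority, and this is not one of the phyloXML tools described in the abstract.

```python
import xml.etree.ElementTree as ET

PX = "http://www.phyloxml.org"
NS = {"px": PX}

# Hand-written example document (hypothetical data, for illustration only).
EXAMPLE = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <events><duplications>1</duplications></events>
      <clade>
        <branch_length>0.12</branch_length>
        <confidence type="bootstrap">91</confidence>
        <taxonomy><scientific_name>Homo sapiens</scientific_name></taxonomy>
        <sequence><symbol>geneA</symbol></sequence>
      </clade>
      <clade>
        <branch_length>0.30</branch_length>
        <taxonomy><scientific_name>Mus musculus</scientific_name></taxonomy>
        <sequence><symbol>geneB</symbol></sequence>
        <property ref="custom:habitat" datatype="xsd:string" applies_to="clade">lab</property>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def leaf_annotations(xml_text: str) -> list[dict]:
    """Collect the annotations attached to each leaf clade, in document order."""
    root = ET.fromstring(xml_text)
    leaves = []
    for clade in root.iter(f"{{{PX}}}clade"):
        if clade.find("px:clade", NS) is not None:
            continue  # internal node: it has child clades
        support = clade.find("px:confidence", NS)
        leaves.append({
            "taxon": clade.findtext("px:taxonomy/px:scientific_name", namespaces=NS),
            "gene": clade.findtext("px:sequence/px:symbol", namespaces=NS),
            # a missing branch length is reported as NaN rather than 0
            "branch_length": float(clade.findtext("px:branch_length", default="nan", namespaces=NS)),
            "support": float(support.text) if support is not None else None,
            # free-form "property" elements, keyed by their ref attribute
            "properties": {p.get("ref"): p.text for p in clade.findall("px:property", NS)},
        })
    return leaves


def duplication_count(xml_text: str) -> int:
    """Sum the <duplications> counts recorded in <events> elements."""
    root = ET.fromstring(xml_text)
    return sum(int(e.text) for e in root.iter(f"{{{PX}}}duplications"))


for leaf in leaf_annotations(EXAMPLE):
    print(leaf)
print("duplications:", duplication_count(EXAMPLE))
```

Annotations that have no dedicated element end up in the `properties` dictionary, which is one simple way the "property" extension mechanism described in the abstract can surface in client code.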

    Genetic, serological and biochemical characterization of Leishmania tropica from foci in northern Palestine and discovery of zymodeme MON-307

    Background: Many cases of cutaneous leishmaniasis (CL) have been recorded in the Jenin District based on their clinical appearance. Here, their parasites have been characterized in depth. Methods: Leishmanial parasites isolated from 12 human cases of CL from the Jenin District were cultured as promastigotes, and their DNA was extracted. The ITS1 sequence and the 7SL RNA gene were analysed, as was the kinetoplast minicircle DNA (kDNA) sequence. Excreted factor (EF) serotyping and multilocus enzyme electrophoresis (MLEE) were also applied. Results: This extensive characterization identified the strains as Leishmania tropica of two very distinct sub-types that parallel the two sub-groups discerned previously by multilocus microsatellite typing (MLMT). A high degree of congruity was displayed among the results generated by the different analytical methods, which examined various cellular components and exposed intra-specific heterogeneity among the 12 strains. Three of the ten strains subjected to MLEE constituted a new zymodeme, zymodeme MON-307, and seven belonged to the known zymodeme MON-137. Ten of the 15 enzymes in the profile of zymodeme MON-307 displayed different electrophoretic mobilities compared with the enzyme profile of zymodeme MON-137. The closest profile to that of zymodeme MON-307 was that of zymodeme MON-76, known from Syria. Strains of zymodeme MON-307 were EF sub-serotype A2, and those of zymodeme MON-137 were either A9 or A9B4. The sub-serotype B4 component appears, so far, to be unique to some strains of L. tropica of zymodeme MON-137. When their kDNA was digested with the endonuclease RsaI, strains of zymodeme MON-137 displayed a distinctive 417 bp fragment that was absent in those of zymodeme MON-307. kDNA-RFLP after digestion with the endonuclease MboI provided a further level of differentiation that partially coincided with the geographical distribution of the human cases from which the strains came.
Conclusions: The Palestinian strains that were assigned to different genetic groups differed in their MLEE profiles and their EF types. A new zymodeme, MON-307, was discovered that seems to be unique to the northern part of the Palestinian West Bank. What seemed to be a straightforward, classical situation of L. tropica causing anthroponotic CL in the Jenin District might be more complex, owing to the presence of two separate sub-types of L. tropica, which possibly indicates two separate transmission cycles involving two separate types of phlebotomine sand fly vector.

    Cirsium species show disparity in patterns of genetic variation at their range-edge, despite similar patterns of reproduction and isolation

    Genetic variation was assessed across the UK geographical range of Cirsium acaule and Cirsium heterophyllum. A decline in genetic diversity and an increase in population divergence approaching the range edge of these species were predicted, based on parallel declines in population density and seed production reported separately. Patterns were compared with UK populations of the widespread Cirsium arvense. Populations were sampled along a latitudinal transect in the UK, and genetic variation was assessed using microsatellite markers. Cirsium acaule shows strong isolation by distance, a significant decline in diversity, and an increase in divergence among range-edge populations. Geographical structure is also evident in C. arvense, whereas no such patterns are seen in C. heterophyllum. There is a major disparity between patterns of genetic variation in C. acaule and C. heterophyllum despite very similar patterns of seed production and population isolation in these species. This suggests it may be misleading to make assumptions about the geographical structure of genetic variation within species based solely on the present-day reproduction and distribution of populations.

    Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin

    One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach predicts published, experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.

    MPI-PHYLIP: Parallelizing Computationally Intensive Phylogenetic Analysis Routines for the Analysis of Large Protein Families

    Background: Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems, as well as into the origins and development of physiological features in present-day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in assessing the robustness of the trees produced in these analyses. Methodology: Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI so that large-scale phylogenetic studies of protein sequences, using a statistically robust number of bootstrapped datasets, can be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance, on several protein datasets, of the parallel PHYLIP programs that are relevant to the study of protein evolution. Conclusions: Calculations that currently take a few days on a state-of-the-art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to mor
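Bootstrap replicates are independent of one another, which is what makes them straightforward to spread across processes. The sketch below, in plain Python with no MPI dependency, shows two pieces of that idea: drawing one nonparametric bootstrap replicate by resampling alignment columns with replacement, and a round-robin assignment of replicate indices to processes ("ranks"). The function names, the round-robin scheme, and the per-replicate seeding are assumptions for illustration; they are not a description of how MPI-PHYLIP itself partitions its work.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Draw one nonparametric bootstrap replicate: resample columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_cols = lengths.pop()
    # The same column indices are applied to every sequence, so columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def replicates_for_rank(n_replicates: int, rank: int, size: int) -> list[int]:
    """Round-robin share of replicate indices for process `rank` out of `size`."""
    return list(range(rank, n_replicates, size))


def run_rank(alignment: dict[str, str], n_replicates: int, rank: int, size: int,
             seed: int = 12345) -> list[dict[str, str]]:
    """Generate this process's replicates; each one is seeded by its global index."""
    return [bootstrap_alignment(alignment, random.Random(seed + i))
            for i in replicates_for_rank(n_replicates, rank, size)]


if __name__ == "__main__":
    # Toy protein alignment (hypothetical sequences).
    aln = {"seqA": "MKVLAT", "seqB": "MKILAS", "seqC": "MRVLGT"}
    for rank in range(2):
        print(rank, run_rank(aln, n_replicates=4, rank=rank, size=2))
```

Seeding each replicate from its global index, rather than from the rank, means the same set of replicates is produced no matter how many processes share the work, so serial and parallel runs can be compared directly. Each process would then build trees for its own replicates, and the results would be gathered for the consensus step.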

    Structural and Functional Evolution of the Trace Amine-Associated Receptors TAAR3, TAAR4 and TAAR5 in Primates

    The family of trace amine-associated receptors (TAAR) comprises 9 mammalian TAAR subtypes, with intact gene and pseudogene numbers differing considerably even between closely related species. To date, the best characterized subtype is TAAR1, which activates the Gs protein/adenylyl cyclase pathway upon stimulation by trace amines and psychoactive substances such as MDMA or LSD. Recently, a chemosensory function involving recognition of volatile amines was proposed for murine TAAR3, TAAR4 and TAAR5. Humans can smell volatile amines despite carrying open reading frame (ORF) disruptions in TAAR3 and TAAR4. Therefore, we set out to study the functional and structural evolution of these genes with a special focus on primates. Functional analyses showed that ligands activating the murine TAAR3, TAAR4 and TAAR5 do not activate intact primate and mammalian orthologs, although these orthologs evolve under purifying selection and hence must be functional. We also find little evidence for positive selection that could explain the functional differences between mouse and other mammals. Our findings rather suggest that the previously identified volatile amine TAAR3–5 agonists reflect the high agonist promiscuity of TAAR, and that the ligands driving purifying selection of these TAAR in mouse and other mammals still await discovery. More generally, our study shows how analyses in an evolutionary context can help to interpret functional data generated in single species.

    Speciation Along Environmental Gradients

    Traditional discussions of speciation are based on geographical patterns of species ranges. In allopatric speciation, long-term geographical isolation generates reproductively isolated and spatially segregated descendant species. In the absence of geographical barriers, diversification is hindered by gene flow. Yet a growing body of phylogenetic and experimental data suggests that closely related species often occur in sympatry or have adjacent ranges in regions over which environmental changes are gradual and do not prevent gene flow. Theory has identified a variety of evolutionary processes that can result in speciation under sympatric conditions, with some recent advances concentrating on the phenomenon of evolutionary branching. Here we establish a link between geographical patterns and ecological processes of speciation by studying evolutionary branching in spatially structured populations. We show that along an environmental gradient, evolutionary branching can occur much more easily than in non-spatial models. This facilitation is most pronounced for gradients of intermediate slope. Moreover, spatial evolutionary branching readily generates patterns of spatial segregation and abutment between the emerging species. Our results highlight the importance of local processes of adaptive divergence for geographical patterns of speciation, and caution against the pitfalls of inferring past speciation processes from present biogeographical patterns.

    Phylometrics: a pipeline for inferring phylogenetic trees from a sequence relationship network perspective

    Background: Comparative sequence analysis of the 16S rRNA gene is frequently used to characterize the microbial diversity of environmental samples. However, sequence similarity does not always imply functional or evolutionary relatedness, owing to many factors, including unequal rates of change and convergence. Thus, relying on top BLASTN hits for phylogenetic studies may misrepresent the diversity of these constituents. Furthermore, attempts to circumvent this issue by including a large number of BLASTN hits per sequence in one tree to explore their relatedness present other problems. For instance, the multiple sequence alignment will be poor, and computationally costly if it does not rely on manual alignment, and it may be difficult to derive meaningful relationships from the resulting tree. Analyzing sequence relationship networks within collective BLASTN results, however, reveals sequences that are closely related despite their low rank. Results: We have developed a web application, Phylometrics, that relies on networks of collective BLASTN results (rather than single BLASTN hits) to build phylogenetic trees in an automated, high-throughput fashion, while offering novel tools to find sequences of significant phylogenetic interest with minimal human involvement. The application, which can be installed locally in a laboratory or hosted remotely, uses a simple wizard-style format to guide the user through the pipeline without requiring a background in programming. Furthermore, Phylometrics implements an independent job queuing system that lets users continue to use the system while jobs run, with little or no degradation in performance.
Conclusions: Phylometrics provides a novel data mining method to screen supplied DNA sequences and to identify sequences of significant phylogenetic interest using powerful analytical tools. Sequences identified as similar to a number of supplied sequences may provide key insights into their functional or evolutionary relatedness. Users need only the basic computer skills required to navigate most internet applications.
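The idea of examining collective BLASTN results, rather than each query's top hit alone, can be illustrated with a small sketch: treat queries and their hits as a bipartite network and rank subject sequences by how many distinct queries reach them. This is a hypothetical illustration of the concept described in the abstract, not the Phylometrics algorithm; the function `shared_hits`, its input format (each query ID mapped to an ordered list of subject IDs, best hit first), and the `min_queries` threshold are all assumptions.

```python
from collections import defaultdict


def shared_hits(hits: dict[str, list[str]], min_queries: int = 2) -> list[tuple[str, list[str], int]]:
    """Rank subject sequences by how many distinct queries hit them.

    `hits` maps each query ID to its BLASTN subject IDs, best hit first
    (a hypothetical input format). Returns (subject, sorted queries, best rank)
    for subjects reached by at least `min_queries` queries, most shared first.
    """
    queries_for = defaultdict(set)
    best_rank = {}
    for query, subjects in hits.items():
        for rank, subject in enumerate(subjects, start=1):
            queries_for[subject].add(query)
            best_rank[subject] = min(rank, best_rank.get(subject, rank))
    shared = [(s, sorted(qs), best_rank[s])
              for s, qs in queries_for.items() if len(qs) >= min_queries]
    # Most widely shared first; ties broken by best (lowest) rank, then by ID.
    shared.sort(key=lambda item: (-len(item[1]), item[2], item[0]))
    return shared


# Toy BLASTN results (hypothetical IDs).
hits = {
    "q1": ["s1", "s2", "s9"],
    "q2": ["s3", "s9"],
    "q3": ["s4", "s9", "s2"],
}
print(shared_hits(hits))
# [('s9', ['q1', 'q2', 'q3'], 2), ('s2', ['q1', 'q3'], 2)]
```

In this toy input, `s9` is never any query's top hit, yet it is the only subject linked to all three queries: the kind of low-ranked but widely shared sequence that a per-query "top hit" view would miss.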

    Morphology of the earliest reconstructable tetrapod Parmastega aelidae.

    The known diversity of tetrapods of the Devonian period has increased markedly in recent decades, but their fossil record consists mostly of tantalizing fragments [1-15]. The framework for interpreting the morphology and palaeobiology of Devonian tetrapods is dominated by the near-complete fossils of Ichthyostega and Acanthostega; the less complete, but partly reconstructable, Ventastega and Tulerpeton have supporting roles [2,4,16-34]. All four of these genera date to the late Famennian age (about 365-359 million years ago); they are 10 million years younger than the earliest known tetrapod fragments [5,10], and nearly 30 million years younger than the oldest known tetrapod footprints [35]. Here we describe Parmastega aelidae gen. et sp. nov., a tetrapod from Russia dated to the earliest Famennian age (about 372 million years ago), represented by three-dimensional material that enables the reconstruction of the skull and shoulder girdle. The raised orbits, lateral line canals and weakly ossified postcranial skeleton of P. aelidae suggest a largely aquatic, surface-cruising animal. In Bayesian and parsimony-based phylogenetic analyses, the majority of trees place Parmastega as a sister group to all other tetrapods.

    The Effects of Alignment Quality, Distance Calculation Method, Sequence Filtering, and Region on the Analysis of 16S rRNA Gene-Based Studies

    Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of β-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. 
Taken together, these results urge caution in the design and interpretation of analyses using pyrosequencing data.
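Two of the factors above, gap treatment and masking, are easy to see on a single pair of aligned sequences. The sketch below computes an uncorrected p-distance under three gap treatments and applies a simple column mask. The treatment names (`nogaps`, `eachgap`, `onegap`), the `'-'`/`'.'` gap characters, and the `'1'`/`'0'` mask string are assumptions for illustration, not taken from the study, and end gaps receive no special handling here.

```python
GAP_CHARS = {"-", "."}
TREATMENTS = {"nogaps", "eachgap", "onegap"}


def p_distance(a: str, b: str, treatment: str = "onegap") -> float:
    """Uncorrected pairwise distance between two aligned sequences.

    nogaps : ignore every column where either sequence has a gap
    eachgap: every base-versus-gap column counts as one difference
    onegap : a run of consecutive gaps in one sequence counts as one difference
    Columns that are gaps in both sequences are always skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if treatment not in TREATMENTS:
        raise ValueError(f"unknown gap treatment: {treatment!r}")
    diffs = length = 0
    open_gap = None  # which sequence ("a" or "b") is currently inside a gap run
    for x, y in zip(a.upper(), b.upper()):
        gap_x, gap_y = x in GAP_CHARS, y in GAP_CHARS
        if gap_x and gap_y:
            continue
        if not gap_x and not gap_y:
            length += 1
            diffs += x != y
            open_gap = None
            continue
        if treatment == "nogaps":
            continue
        owner = "a" if gap_x else "b"
        # eachgap counts every gap column; onegap counts only the first column of a run
        if treatment == "eachgap" or open_gap != owner:
            length += 1
            diffs += 1
        open_gap = owner
    return diffs / length if length else 0.0


def apply_mask(seq: str, mask: str) -> str:
    """Keep only the alignment columns whose mask character is '1'."""
    if len(seq) != len(mask):
        raise ValueError("mask must have the same length as the alignment")
    return "".join(c for c, keep in zip(seq, mask) if keep == "1")


a = "ACGT--GTAC"
b = "ACGTAAGTAA"
for t in ("nogaps", "eachgap", "onegap"):
    print(t, round(p_distance(a, b, t), 4))  # nogaps 0.125 / eachgap 0.3 / onegap 0.2222
mask = "1111111110"  # drop the last column, where the two sequences differ
print(p_distance(apply_mask(a, mask), apply_mask(b, mask), "nogaps"))  # 0.0
```

With these two toy sequences, the distance ranges from 0.125 to 0.3 depending only on how a single two-column gap is counted, and masking the one differing column drops it to 0. This is a miniature version of why gap treatment and masking can shift richness and diversity estimates.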