1,188 research outputs found

    Diversification and adaptive sequence evolution of Caenorhabditis lysozymes (Nematoda: Rhabditidae)

    Background: Lysozymes are important model enzymes in biomedical research, with a ubiquitous taxonomic distribution ranging from phages to plants and animals. Their main function appears to be defence against pathogens, although some of them have also been implicated in digestion. Whereas most organisms have only a few lysozyme genes, nematodes of the genus Caenorhabditis possess a surprisingly large repertoire of up to 15 genes. Results: We used phylogenetic inference and sequence analysis tools to assess the evolution of lysozymes from three congeneric nematode species, Caenorhabditis elegans, C. briggsae, and C. remanei. Their lysozymes fall into three distinct clades, one belonging to the invertebrate-type and the other two to the protist-type lysozymes. Their diversification is characterised by (i) ancestral gene duplications preceding species separation, followed by maintenance of the genes, (ii) ancestral duplications followed by gene loss in some of the species, and (iii) recent duplications after divergence of the species. Both ancestral and recent gene duplications are in several cases associated with signatures of adaptive sequence evolution, indicating that diversifying selection contributed to lysozyme differentiation. Current data strongly suggest that genetic diversity translates into functional diversity. Conclusion: Gene duplications are a major source of evolutionary innovation. Our analysis provides an evolutionary framework for understanding the diversification of lysozymes through gene duplication and subsequent differentiation. This information is expected to be of major value in future analyses of lysozyme function and in studies of the dynamics of evolution by gene duplication.
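    Signatures of adaptive sequence evolution after gene duplication are commonly assessed with the ratio ω = dN/dS of nonsynonymous to synonymous substitution rates, where ω > 1 suggests diversifying selection. The abstract does not name the estimator used. The minimal sketch below assumes a counting approach in the style of Nei and Gojobori, in which the observed proportions of differing nonsynonymous and synonymous sites (pN, pS) are corrected for multiple hits with the Jukes-Cantor formula; the function names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from an observed proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(p_n, p_s):
    """dN/dS from observed nonsynonymous (p_n) and synonymous (p_s) proportions."""
    d_n = jukes_cantor(p_n)
    d_s = jukes_cantor(p_s)
    if d_s == 0.0:
        raise ValueError("dS is zero; omega is undefined")
    return d_n / d_s


# Hypothetical proportions for a pair of paralogs: omega > 1 hints at diversifying selection.
print(round(omega(0.2, 0.1), 2))  # → 2.17
```

    In practice, likelihood codon models are usually preferred for detecting selection on specific branches; the counting version is shown only because it makes the ratio explicit.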

    Progressive multiple sequence alignment with indel evolution

    Background: Sequence alignment is crucial in genomics studies. However, computing an optimal multiple sequence alignment (MSA) is NP-hard. Modern MSA methods therefore employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogeny. Changes between homologous characters are typically modelled by a Markov substitution model. In contrast, the dynamics of indels are not modelled explicitly, because computing the marginal likelihood under such models has exponential time complexity in the number of taxa. Yet failing to model indel evolution may lead to artificially short alignments due to biased indel placement that is inconsistent with the phylogenetic relationships. Results: Recently, the classical indel model TKF91 was modified to describe indel evolution on a phylogeny via a Poisson process, termed PIP. PIP allows the joint marginal probability of an MSA and a tree to be computed in linear time. We present a new dynamic programming algorithm that aligns two MSAs (represented by their underlying homology paths) by full maximum likelihood under PIP in polynomial time, and we apply it progressively along a guide tree. We have corroborated the correctness of our method by simulation and compared it with competing methods on an illustrative real dataset. Conclusions: Our MSA method is the first polynomial-time progressive aligner with a rigorous mathematical formulation of indel evolution. The new method infers phylogenetically meaningful gap patterns that offer an alternative to those of the popular PRANK, while producing alignments of similar length. Moreover, the inferred gap patterns agree with what previous studies predicted qualitatively. The algorithm is implemented in a standalone C++ program.
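    The core step of progressive alignment is a dynamic program that aligns two inputs and is reapplied at each internal node of the guide tree. The sketch below is a plain Needleman-Wunsch alignment of two sequences with a linear gap penalty. It is a simplified, score-based stand-in, not the PIP maximum-likelihood recursion over MSAs described above, and the scoring values are arbitrary assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # (mis)match
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

    A progressive aligner generalises the same recursion from single sequences to profiles; the PIP approach additionally replaces the fixed gap penalty with likelihoods derived from an explicit indel process on the tree.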

    The isolation, characterization, and identification of a novel species of bacterium in the Enterobacteriaceae family from Kephart Prong, Great Smoky Mountains National Park

    The purpose of this study was to examine a single bacterial species isolated from Great Smoky Mountains National Park (GSMNP), characterize its growth requirements, and identify it to the species level. A polyphasic approach that examined phenotypic, genotypic, and phylogenetic characteristics was used. Phenotypic analysis revealed that the isolate is Gram-negative, rod-shaped, non-motile, oxidase negative and catalase positive, and that it grows in both the presence and the absence of oxygen. Growth was observed at temperatures from 4 °C to 37 °C, with optimum growth at 30 °C based on visual observation of colony mass. The pH range for growth was pH 7 to 9, with optimum growth at pH 9, also based on visual observation of colony mass. The isolate tolerates up to 1% NaCl in the nutrient medium. Genotypic analysis using 16S rDNA sequences and whole genome sequencing (WGS) identified the isolate as a member of the order “Enterobacteriales” and the family Enterobacteriaceae. Phylogenetic analysis supported the isolate’s position in both taxa but did not cluster the isolate with any specific genus. On the basis of its phenotypic, genotypic, and phylogenetic properties, the isolate LD2 represents a novel species of a new genus.

    Inferring Kangaroo Phylogeny from Incongruent Nuclear and Mitochondrial Genes

    The marsupial genus Macropus includes three subgenera: the familiar large grazing kangaroos and wallaroos of M. (Macropus) and M. (Osphranter), and the smaller mixed grazing/browsing wallabies of M. (Notamacropus). A recent study of five concatenated nuclear genes recommended subsuming the predominantly browsing Wallabia bicolor (swamp wallaby) into Macropus. To examine this proposal further, we sequenced partial mitochondrial genomes for kangaroos and wallabies. These sequences strongly favour the morphological placement of W. bicolor as sister to Macropus, although they place M. irma (black-gloved wallaby) within M. (Osphranter) rather than, as expected, within M. (Notamacropus). Species tree estimation from separately analysed mitochondrial and nuclear genes favours retaining Macropus and Wallabia as separate genera. A simulation study finds that incomplete lineage sorting among nuclear genes is a plausible explanation for their incongruence with the mitochondrial placement of W. bicolor, while mitochondrial introgression from a wallaroo into M. irma is the deepest such event identified in marsupials. Similar coalescent simulations for interpreting gene tree conflicts will increase in both relevance and statistical power as species-level phylogenetics enters the genomic age. Ecological considerations, in turn, hint at a role for selection in accelerating the fixation of introgressed or incompletely sorted loci. More generally, the inclusion of the mitochondrial sequences substantially enhanced phylogenetic resolution. However, we caution that the evolutionary dynamics that enhance mitochondria as speciation indicators in the presence of incomplete lineage sorting may also render them especially susceptible to introgression.
This work has been supported by Australian Research Council grants to MJP (DP07745015) and MB (FT0991741). The website for the funder is www.arc.gov.au. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
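    Simulations of incomplete lineage sorting rest on the multispecies coalescent. For three species with species tree ((A,B),C) and an internal branch of length t in coalescent units (generations divided by 2N for a diploid nuclear locus), a single gene tree matches the species tree with probability 1 − (2/3)e^(−t). The minimal sketch below, which assumes neutrality, panmixia and no gene flow, checks that formula against a small Monte Carlo simulation; the function names are hypothetical and this is not the authors' simulation pipeline. Mitochondrial loci have a smaller effective size (roughly a quarter of the nuclear value under standard assumptions), so the same branch spans more coalescent units and lineages sort faster.

```python
import math
import random


def concordance_probability(t):
    """P(gene tree == species tree ((A,B),C)) for internal branch length t (coalescent units)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_concordance(t, n_loci, seed=None):
    """Fraction of simulated loci whose gene tree matches the species tree."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(n_loci):
        # Lineages from A and B coalesce at rate 1 along the internal branch.
        if rng.expovariate(1.0) < t:
            concordant += 1
        # Otherwise all three lineages reach the root population, where each
        # of the three pairs is equally likely to coalesce first.
        elif rng.randrange(3) == 0:
            concordant += 1
    return concordant / n_loci


t = 0.5
print(concordance_probability(t), simulate_concordance(t, 100_000, seed=1))
```

    Short internal branches push concordance towards 1/3, which is why deep but rapid radiations can produce strongly conflicting nuclear gene trees without any introgression.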

    MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

    Until now, the most efficient solution for aligning nucleotide sequences containing open reading frames was to use indirect procedures that align the amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. First, any premature stop codon prevents the use of such a strategy. Second, each sequence is translated in the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and aberrant alignment.
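    The second pitfall is easy to reproduce. The sketch below translates a coding sequence with the standard genetic code and shows how one inserted nucleotide changes every downstream codon and removes the stop codon. The helper name `translate` and the example sequence are hypothetical; this illustrates the problem MACSE addresses, not MACSE's own frameshift-aware algorithm.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate in reading frame 1, ignoring any trailing incomplete codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


original = "ATGGCTTGGAAATAA"
shifted = original[:3] + "T" + original[3:]  # a single inserted nucleotide
print(translate(original))  # MAWK*
print(translate(shifted))   # MCLEI
```

    Aligning the two protein strings would suggest four substitutions and a missing stop, whereas the true event is one insertion; handling such cases requires aligning at the nucleotide level while still scoring codons.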

    Alignment uncertainty, regressive alignment and large scale deployment

    A multiple sequence alignment (MSA) describes the relationships between biological sequences, with columns representing shared ancestry through an implied set of evolutionary events. Most research in the field has focused on improving the accuracy of alignments within the progressive alignment framework, and has enabled powerful inferences including phylogenetic reconstruction, homology modelling and disease prediction. Nevertheless, when MSA methods are applied to modern genomics datasets, which often comprise tens of thousands of sequences, new challenges arise in constructing accurate alignments. These challenges can be grouped into three basic problems. First, as the number of sequences increases, progressive alignment methods exhibit a dramatic decrease in alignment accuracy. Second, many possible MSA solutions exist for any given dataset, and this alignment uncertainty grows with the number of sequences. Finally, technical difficulties hamper the deployment of such genomic analysis workflows, especially in a reproducible manner, and often present a high barrier even for skilled practitioners. This work aims to address this trifecta of problems through a web server for fast homology-extension-based MSA, two new methods for improved phylogenetic bootstrap support that incorporate alignment uncertainty, a novel alignment procedure called regressive MSA that improves large-scale alignments, and finally a workflow framework called Nextflow that enables the deployment of large-scale reproducible analyses across clusters and clouds.
Together, this work can be seen to provide both conceptual and technical advances that deliver substantial improvements to existing MSA methods and the resulting inferences.
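    Phylogenetic bootstrap supports are conventionally computed by resampling alignment columns with replacement and re-inferring a tree from each replicate. The sketch below shows only that standard column-resampling step (Felsenstein's nonparametric bootstrap); it does not implement the two alignment-uncertainty-aware methods mentioned above, and the function name and example alignment are hypothetical.

```python
import random


def bootstrap_columns(msa, seed=None):
    """Return one bootstrap replicate of an MSA given as equal-length strings."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    # Draw column indices with replacement; the replicate keeps the original length.
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[c] for c in cols) for row in msa]


msa = ["ACGT-A", "ACGTTA", "AC-TTA"]
replicate = bootstrap_columns(msa, seed=42)
print(replicate)
```

    Each replicate would then be passed to a tree-inference program, and the support for a clade is the fraction of replicate trees containing it. Approaches that account for alignment uncertainty typically also vary the alignment itself rather than only resampling columns of a single fixed MSA.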

    Tracking the evolution of alternatively spliced exons within the Dscam family

    BACKGROUND: The Dscam gene in the fruit fly, Drosophila melanogaster, contains twenty-four exons, four of which are composed of tandem arrays that each undergo mutually exclusive alternative splicing (4, 6, 9 and 17), potentially generating 38,016 protein isoforms. This degree of transcript diversity has not been found in mammalian homologs of Dscam. We examined the molecular evolution of exons within this gene family to locate the point of divergence for this alternative splicing pattern. RESULTS: Using the fruit fly Dscam exons 4, 6, 9 and 17 as seed sequences, we iteratively searched sixteen genomes for homologs, and then performed phylogenetic analyses of the resulting sequences to examine their evolutionary history. We found homologs in the nematode, arthropod and vertebrate genomes, including homologs in several vertebrates where Dscam had not been previously annotated. Among these, only the arthropods contain homologs arranged in tandem arrays indicative of mutually exclusive splicing. We found no homologs to these exons within the Arabidopsis, yeast, tunicate or sea urchin genomes but homologs to several constitutive exons from fly Dscam were present within tunicate and sea urchin. Comparing the rate of turnover within the tandem arrays of the insect taxa (fruit fly, mosquito and honeybee), we found the variants within exons 4 and 17 are well conserved in number and spatial arrangement despite 248–283 million years of divergence. In contrast, the variants within exons 6 and 9 have undergone considerable turnover since these taxa diverged, as indicated by deeply branching taxon-specific lineages. CONCLUSION: Our results suggest that at least one Dscam exon array may be an ancient duplication that predates the divergence of deuterostomes from protostomes but that there is no evidence for the presence of arrays in the common ancestor of vertebrates. 
The different patterns of conservation and turnover among the Dscam exon arrays provide a striking example of how a gene can evolve in a modular fashion rather than as a single unit
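    The figure of 38,016 potential isoforms follows from choosing exactly one mutually exclusive variant from each of the four arrays, so the variant counts multiply. The counts used below (12 for exon 4, 48 for exon 6, 33 for exon 9 and 2 for exon 17) are the commonly cited D. melanogaster values and are an assumption here, since the abstract does not list them.

```python
import math

# Commonly cited variant counts for D. melanogaster Dscam (assumed; not stated in the abstract).
variants_per_array = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One variant is spliced in from each array independently, so the counts multiply.
isoforms = math.prod(variants_per_array.values())
print(isoforms)  # 38016
```

    The same product also shows why turnover within the exon 6 and exon 9 arrays matters most for isoform diversity: they contribute the largest factors.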

    A new lysozyme from the eastern oyster, Crassostrea virginica, and a possible evolutionary pathway for i-type lysozymes in bivalves from host defense to digestion

    Background. Lysozymes are enzymes that lyse bacterial cell walls, an activity widely used for host defense but also modified in some instances for digestion. The biochemical and evolutionary changes between these different functional forms have been well studied in the c-type lysozymes of vertebrates, but less so in the i-type lysozymes prevalent in most invertebrate animals. Some bivalve molluscs possess both defensive and digestive lysozymes. Results. We report a third lysozyme from the oyster Crassostrea virginica, cv-lysozyme 3. The chemical properties of cv-lysozyme 3 (including molecular weight, isoelectric point, number of basic amino acid residues, and predicted protease cleavage sites) suggest that it represents a transitional form between lysozymes used for digestion and immunity. The cv-lysozyme 3 protein inhibited the growth of bacteria (consistent with a defensive function), but semi-quantitative RT-PCR suggested that the gene was expressed mainly in digestive glands. Purified cv-lysozyme 3 showed maximum muramidase activity within a range of pH (7.0 to 8.0) and ionic strength (I = 0.005 to 0.01) unfavorable for either cv-lysozyme 1 or cv-lysozyme 2 activity. The topology of a phylogenetic analysis of cv-lysozyme 3 cDNA (full length 663 bp, encoding an open reading frame of 187 amino acids) is also consistent with a transitional condition, as cv-lysozyme 3 falls at the base of a monophyletic clade of bivalve lysozymes identified from digestive glands. Rates of nonsynonymous substitution are significantly elevated at the base of this clade, consistent with an episode of positive selection associated with the functional transition from defense to digestion. Conclusion.
The pattern of molecular evolution accompanying the shift from defensive to digestive function in the i-type lysozymes of bivalves parallels that seen for c-type lysozymes in mammals, and suggests that lysozyme paralogs that extend the range of physiological conditions for lysozyme activity may provide stepping stones between defensive and digestive forms. © 2010 Xue et al; licensee BioMed Central Ltd
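    The ionic strength values quoted above follow the standard definition I = ½ Σ cᵢzᵢ², summing each ion's molar concentration times its squared charge. The sketch below assumes fully dissociated salts and molar units; the buffer composition used in the study is not given here, so the NaCl and MgCl2 examples are purely illustrative and the function name is hypothetical.

```python
def ionic_strength(ions):
    """Ionic strength (mol/L) from (concentration_mol_per_L, charge) pairs."""
    return 0.5 * sum(c * z * z for c, z in ions)


# 5 mM NaCl, fully dissociated into Na+ and Cl-
print(ionic_strength([(0.005, +1), (0.005, -1)]))  # 0.005

# 1 mM MgCl2: divalent ions weigh four times as much per mole of charge carrier
print(ionic_strength([(0.001, +2), (0.002, -1)]))
```

    Because the charge enters squared, a low-ionic-strength optimum such as I = 0.005 to 0.01 corresponds to only a few millimolar of a monovalent salt.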

    Bayesian statistical approach for protein residue-residue contact prediction

    Despite continuous efforts in automating experimental structure determination and systematic target selection in structural genomics projects, the gap between the number of known amino acid sequences and solved 3D protein structures is constantly widening. While DNA sequencing technologies are advancing at an extraordinary pace, constantly increasing throughput while reducing costs, protein structure determination is still labour-intensive, time-consuming and expensive. This trend illustrates the importance of complementary computational approaches for bridging the so-called sequence-structure gap. About half of all protein families lack structural annotation and are therefore not amenable to techniques that infer protein structure from homologs. These protein families can be addressed by de novo structure prediction approaches, which in practice are often limited by the immense computational cost of searching the conformational space for the lowest-energy conformation. Improved predictions of contacts between amino acid residues have been shown to constrain the overall protein fold sufficiently to extend the applicability of de novo methods to larger proteins. Residue-residue contact prediction is based on the idea that selection pressure on protein structure and function can lead to compensatory mutations between spatially close residues. These mutations leave correlation signatures that can be traced in the evolutionary record. Despite the success of contact prediction methods, several challenges remain. The most evident limitation is the requirement for deep alignments, which excludes the majority of protein families without associated structural information, precisely the families that are the focus of contact-guided de novo structure prediction. The heuristics applied by current contact prediction methods pose another challenge, since they discard available coevolutionary information.
This work presents two different approaches for addressing the limitations of contact prediction methods. Instead of inferring evolutionary couplings by maximizing the pseudo-likelihood, I maximize the full likelihood of the statistical model for protein sequence families. This approach achieved precision comparable to that of pseudo-likelihood methods, with minor improvements for protein families with few homologous sequences. A Bayesian statistical approach has been developed that provides posterior probability estimates for residue-residue contacts and eliminates the use of heuristics. The full information in coevolutionary signatures is exploited by explicitly modelling the distribution of statistical couplings, which reflects the nature of residue-residue interactions. Surprisingly, the posterior probabilities do not directly translate into more precise predictions than those obtained by pseudo-likelihood methods combined with prior knowledge. However, the Bayesian framework offers a statistically clean and theoretically solid treatment of the contact prediction problem. This flexible and transparent framework provides a convenient starting point for further developments, such as integrating more complex prior knowledge. The model can also easily be extended towards the derivation of probability estimates for residue-residue distances to enhance the precision of predicted structures.
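    The correlation signatures that contact prediction exploits can be illustrated with the simplest coevolution score, the mutual information between two alignment columns. This is a deliberately simpler, local statistic than the pseudo-likelihood, full-likelihood and Bayesian Potts-model approaches described above: it is known to be confounded by indirect couplings, which is one motivation for the global models. The sketch assumes ungapped columns given as equal-length strings, and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (nats) between two alignment columns given as equal-length strings."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Perfectly covarying columns (A pairs with C, G pairs with T) give ln 2; independent ones give 0.
print(mutual_information("AAGG", "CCTT"))
print(mutual_information("AAGG", "CTCT"))
```

    Scores like this are usually computed for all column pairs and then corrected (for example with an average product correction) before ranking candidate contacts; the shallow-alignment problem discussed above shows up here as noisy frequency estimates when n is small.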