    MEGASAT: automated inference of microsatellite genotypes from sequence data

    MEGASAT is software that enables genotyping of microsatellite loci using next-generation sequencing data. Microsatellites are amplified in large multiplexes, and then sequenced in pooled amplicons. MEGASAT reads sequence files and automatically scores microsatellite genotypes. It uses fuzzy matches to allow for sequencing errors and applies decision rules to account for amplification artefacts, including nontarget amplification products, replication slippage during PCR (amplification stutter) and differential amplification of alleles. An important feature of MEGASAT is the generation of histograms of the length–frequency distributions of amplification products for each locus and each individual. These histograms, analogous to electropherograms traditionally used to score microsatellite genotypes, enable rapid evaluation and editing of automatically scored genotypes. MEGASAT is written in Perl, runs on Windows, Mac OS X and Linux systems, and includes a simple graphical user interface. We demonstrate MEGASAT using data from guppy, Poecilia reticulata. We genotype 1024 guppies at 43 microsatellites per run on an Illumina MiSeq sequencer. We evaluated the accuracy of automatically called genotypes using two methods, based on pedigree and repeat genotyping data, and obtained estimates of mean genotyping error rates of 0.021 and 0.012. In both estimates, three loci accounted for a disproportionate fraction of genotyping errors; conversely, 26 loci were scored with 0–1 detected error (error rate ≤0.007). Our results show that with appropriate selection of loci, automated genotyping of microsatellite loci can be achieved with very high throughput, low genotyping error and very low genotyping costs.
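The length–frequency idea in the MEGASAT abstract can be illustrated with a minimal Python sketch. This is not MEGASAT's code (the tool itself is written in Perl), and the abstract does not give its decision rules: the function names, the `min_depth` and `het_ratio` thresholds, and the single-ratio rule below are all assumptions made for illustration. In particular, the sketch ignores stutter and nontarget products, which the real software accounts for.

```python
from collections import Counter


def length_histogram(reads):
    """Count amplicon lengths (bp) for one locus in one individual."""
    return Counter(len(read) for read in reads)


def call_genotype(hist, min_depth=10, het_ratio=0.3):
    """Toy genotype call from a length histogram (hypothetical rule).

    Returns a sorted pair of allele lengths, or None when fewer than
    min_depth reads are available. The second most frequent length is
    accepted as a second allele only if its count is at least
    het_ratio times the count of the most frequent length.
    """
    if sum(hist.values()) < min_depth:
        return None
    # Rank lengths by read count (descending), breaking ties by length.
    ranked = sorted(hist.items(), key=lambda kv: (-kv[1], kv[0]))
    top_len, top_n = ranked[0]
    if len(ranked) > 1:
        second_len, second_n = ranked[1]
        if second_n >= het_ratio * top_n:
            return tuple(sorted((top_len, second_len)))
    return (top_len, top_len)


# Simulated reads: 20 of length 100, 15 of length 104, 3 of length 96.
reads = ["A" * 100] * 20 + ["A" * 104] * 15 + ["A" * 96] * 3
print(call_genotype(length_histogram(reads)))  # (100, 104)
```

The histogram itself is the part that corresponds to the electropherogram-like display described in the abstract; the calling rule is only a placeholder for the richer decision rules the authors describe.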

    GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge

    Background: Position-specific priors (PSP) have been used with success to boost EM and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied. Results: We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote. Conclusions: The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.
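To make the idea of combining position-specific priors concrete, here is a small Python sketch. The abstract does not say how GRISOTTO merges priors or how its greedy search uses the result, so everything here is an assumption: the elementwise product with renormalisation, the window-sum score, and the names `combine_priors` and `best_start` are hypothetical and are not the authors' scoring criterion.

```python
def combine_priors(priors):
    """Combine per-position priors by elementwise product, then renormalise.

    priors: list of equal-length lists, one per information source.
    The product rule is an illustrative assumption, not GRISOTTO's rule.
    """
    length = len(priors[0])
    if any(len(p) != length for p in priors):
        raise ValueError("all priors must cover the same positions")
    combined = [1.0] * length
    for prior in priors:
        combined = [c * x for c, x in zip(combined, prior)]
    total = sum(combined)
    if total == 0:
        raise ValueError("combined prior has no mass")
    return [c / total for c in combined]


def best_start(combined, width):
    """Start index of the width-long window with the largest prior mass."""
    scores = [sum(combined[i:i + width])
              for i in range(len(combined) - width + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


# Two made-up priors over five sequence positions.
conservation = [0.1, 0.1, 0.4, 0.3, 0.1]
nucleosome = [0.2, 0.2, 0.3, 0.2, 0.1]
combined = combine_priors([conservation, nucleosome])
print(best_start(combined, width=2))  # 2
```

A product rule rewards positions that every source supports; a weighted sum would be an equally plausible choice and would be more tolerant of a source that assigns zero mass to a true site.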

    Phylogenomic Analysis of Marine Roseobacters

    Background: Members of the Roseobacter clade, which play a key role in the biogeochemical cycles of the ocean, are diverse and abundant, comprising 10–25% of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution. Methodology/Principal Findings: In this study about 1,200 likely orthologous protein families were identified from 17 Roseobacter genomes. Functional annotations for these genes are provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA), approximately unbiased (AU) and Shimodaira–Hasegawa (SH) tests. A core set of 694 genes with vertical descent signal that are resistant to horizontal gene transfer (HGT) is used to reconstruct a robust organismal phylogeny. In addition, we identified the 109 most likely HGT genes. The core set contains genes that encode ribosomal apparatus, ABC transporters and chaperones often found in the environmental metagenomic and metatranscriptomic data. These genes in the core set are spread out uniformly among the various functional classes and biological processes. Conclusions/Significance: Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzymes fo

    Phylogenetic Detection of Recombination with a Bayesian Prior on the Distance between Trees

    Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition:transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination.

    Reductive Evolution of the Mitochondrial Processing Peptidases of the Unicellular Parasites Trichomonas vaginalis and Giardia intestinalis

    Mitochondrial processing peptidases are heterodimeric enzymes (α/βMPP) that play an essential role in mitochondrial biogenesis by recognizing and cleaving the targeting presequences of nuclear-encoded mitochondrial proteins. The two subunits are paralogues that probably evolved by duplication of a gene for a monomeric metallopeptidase from the endosymbiotic ancestor of mitochondria. Here, we characterize the MPP-like proteins from two important human parasites that contain highly reduced versions of mitochondria, the mitosomes of Giardia intestinalis and the hydrogenosomes of Trichomonas vaginalis. Our biochemical characterization of recombinant proteins showed that, contrary to a recent report, the Trichomonas processing peptidase functions efficiently as an α/β heterodimer. By contrast, and so far uniquely among eukaryotes, the Giardia processing peptidase functions as a monomer comprising a single βMPP-like catalytic subunit. The structure and surface charge distribution of the Giardia processing peptidase predicted from a 3-D protein model appear to have co-evolved with the properties of Giardia mitosomal targeting sequences, which, unlike classic mitochondrial targeting signals, are typically short and impoverished in positively charged residues. The majority of hydrogenosomal presequences resemble those of mitosomes, but longer, positively charged mitochondrial-type presequences were also identified, consistent with the retention of the Trichomonas αMPP-like subunit. Our computational and experimental/functional analyses reveal that the divergent processing peptidases of Giardia mitosomes and Trichomonas hydrogenosomes evolved from the same ancestral heterodimeric α/βMPP metallopeptidase as did the classic mitochondrial enzyme. The unique monomeric structure of the Giardia enzyme, and the co-evolving properties of the Giardia enzyme and substrate, provide a compelling example of the power of reductive evolution to shape parasite biology

    PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships

    Background: Phylogenies, i.e., the evolutionary histories of groups of taxa, play a major role in representing the interrelationships among biological entities. Many software tools for reconstructing and evaluating such phylogenies have been proposed, almost all of which assume the underlying evolutionary history to be a tree. While trees give a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by trees. Processes such as horizontal gene transfer (HGT), hybrid speciation, and interspecific recombination, collectively referred to as reticulate evolutionary events, result in networks, rather than trees, of relationships. Various software tools have been recently developed to analyze reticulate evolutionary relationships, which include SplitsTree4, LatTrans, EEEP, HorizStory, and T-REX. Results: In this paper, we report on the PhyloNet software package, which is a suite of tools for analyzing reticulate evolutionary relationships, or evolutionary networks, which are rooted, directed, acyclic graphs, leaf-labeled by a set of taxa. These tools can be classified into four categories: (1) evolutionary network representation: reading/writing evolutionary networks in a newly devised compact form; (2) evolutionary network characterization: analyzing evolutionary networks in terms of three basic building blocks – trees, clusters, and tripartitions; (3) evolutionary network comparison: comparing two evolutionary networks in terms of topological dissimilarities, as well as fitness to sequence evolution under a maximum parsimony criterion; and (4) evolutionary network reconstruction: reconstructing an evolutionary network from a species tree and a set of gene trees. Conclusion: The software package, PhyloNet, offers an array of utilities to allow for efficient and accurate analysis of evolutionary networks. The software package will help significantly in analyzing large data sets, as well as in studying the performance of evolutionary network reconstruction methods. Further, the software package supports the proposed eNewick format for compact representation of evolutionary networks, a feature that allows for efficient interoperability of evolutionary network software tools. Currently, all utilities in PhyloNet are invoked on the command line.
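One of the building blocks named in the PhyloNet abstract, the cluster, can be sketched in a few lines of Python. This is not PhyloNet's API, which the abstract does not show: the dictionary representation, the `clusters` function, and the node names are assumptions chosen for illustration. Here the cluster of a node is taken to be the set of leaf labels reachable from it in the rooted, directed, acyclic graph.

```python
def clusters(network, root):
    """Map every node reachable from root to its set of descendant leaves.

    network: dict from node name to a list of child names; nodes that
    are missing from the dict or have no children are treated as leaves.
    A reticulation node is simply a node listed as a child of two parents.
    """
    memo = {}

    def visit(node):
        if node in memo:
            return memo[node]
        children = network.get(node, [])
        if not children:
            result = frozenset([node])
        else:
            result = frozenset().union(*(visit(c) for c in children))
        memo[node] = result
        return result

    visit(root)
    return memo


# Hypothetical network: h is a reticulation node with parents x and y.
network = {
    "root": ["x", "y"],
    "x": ["a", "h"],
    "y": ["h", "c"],
    "h": ["b"],
}
result = clusters(network, "root")
print(sorted(result["x"]), sorted(result["y"]))  # ['a', 'b'] ['b', 'c']
```

Because the reticulation node `h` has two parents, leaf `b` appears in the clusters of both `x` and `y`, which cannot happen in a tree. Memoisation keeps the traversal linear in the number of edges even when nodes are shared.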

    Obscured phylogeny and possible recombinational dormancy in Escherichia coli

    Background: Escherichia coli is one of the best studied organisms in all of biology, but its phylogenetic structure has been difficult to resolve with current data and analytical techniques. We analyzed single nucleotide polymorphisms in chromosomes of representative strains to reconstruct the topology of its emergence. Results: The phylogeny of E. coli varies according to the segment of chromosome analyzed. Recombination between extant E. coli groups is largely limited to only three intergroup pairings. Conclusions: Segment-dependent phylogenies most likely are legacies of a complex recombination history. However, E. coli are now in an epoch in which they no longer broadly share DNA. Using the definition of species as organisms that freely exchange genetic material, this recombinational dormancy could reflect either the end of E. coli as a species, or herald the coalescence of E. coli groups into new species.

    Genome Evolution and the Emergence of Fruiting Body Development in Myxococcus xanthus

    BACKGROUND: Lateral gene transfer (LGT) is thought to promote speciation in bacteria, though well-defined examples have not been put forward. METHODOLOGY/PRINCIPAL FINDINGS: We examined the evolutionary history of the genes essential for a trait that defines a phylogenetic order, namely fruiting body development of the Myxococcales. Seventy-eight genes that are essential for Myxococcus xanthus development were examined for LGT. About 73% of the genes exhibit a phylogeny similar to that of the 16S rDNA gene and a codon bias consistent with other M. xanthus genes, suggesting vertical transmission. About 22% have an altered codon bias and/or phylogeny suggestive of LGT. The remaining 5% are unique. Genes encoding signal production and sensory transduction were more likely to be transmitted vertically, with clear examples of duplication and divergence into multigene families. Genes encoding metabolic enzymes were frequently acquired by LGT. Myxobacteria exhibit aerobic respiration, unlike most of the delta Proteobacteria. M. xanthus contains a unique electron transport pathway shaped by LGT of genes for succinate dehydrogenase and three cytochrome oxidase complexes. CONCLUSIONS/SIGNIFICANCE: Fruiting body development depends on genes acquired by LGT, particularly those involved in polysaccharide production. We suggest that aerobic growth fostered innovation necessary for development by allowing myxobacteria access to a different gene pool from anaerobic members of the delta Proteobacteria. Habitat destruction and loss of species diversity could restrict the evolution of new bacterial groups by limiting the size of the prospective gene pool.

    Statistical Mechanics of Horizontal Gene Transfer in Evolutionary Ecology

    The biological world, especially its majority microbial component, is strongly interacting and may be dominated by collective effects. In this review, we provide a brief introduction for statistical physicists of the way in which living cells communicate genetically through transferred genes, as well as the ways in which they can reorganize their genomes in response to environmental pressure. We discuss how genome evolution can be thought of as related to the physical phenomenon of annealing, and describe the sense in which genomes can be said to exhibit an analogue of information entropy. As a direct application of these ideas, we analyze the variation with ocean depth of transposons in marine microbial genomes, predicting trends that are consistent with recent observations using metagenomic surveys. Comment: Accepted by Journal of Statistical Physics