SIMMAP: Stochastic character mapping of discrete traits on phylogenies
BACKGROUND: Character mapping on phylogenies has played an important, if not critical, role in our understanding of molecular, morphological, and behavioral evolution. Until very recently, we have relied on parsimony to infer character changes. Parsimony has a number of serious limitations that constrain our understanding. Recently developed statistical methods free us from these limitations by accommodating uncertainty in evolutionary time, ancestral states, and the phylogeny. RESULTS: SIMMAP has been developed to implement stochastic character mapping in a form that is useful to molecular evolutionists, systematists, and bioinformaticians. Researchers can address questions about positive selection, patterns of amino acid substitution, character association, and patterns of morphological evolution. CONCLUSION: Stochastic character mapping, as implemented in the SIMMAP software, enables users to address questions that require mapping characters onto phylogenies using a probabilistic approach that does not rely on parsimony. Analyses can be performed using a fully Bayesian approach that does not depend on a single topology, a single set of substitution model parameters, or a single reconstruction of ancestral states. Uncertainty in these quantities is accommodated by using MCMC samples from their respective posterior distributions.
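The abstract does not describe the sampling algorithm itself. As an illustration of the core step of stochastic character mapping, drawing a history of changes along one branch given the states at both ends, the sketch below uses plain rejection sampling for a two-state character. The rates, branch length, and function names (`simulate_branch`, `map_branch`) are hypothetical, and this is not SIMMAP's code; a full analysis would also sample the end states, tree, and rates from their posteriors.

```python
import random

def simulate_branch(start, t, rates, rng):
    """Forward-simulate a two-state continuous-time Markov chain (states 0, 1)
    along a branch of length t. rates[s] (> 0) is the rate of leaving state s.
    Returns the list of (time, new_state) change events and the final state."""
    state, time, events = start, 0.0, []
    while True:
        time += rng.expovariate(rates[state])  # waiting time to the next change
        if time >= t:
            return events, state
        state = 1 - state  # with two states, every change flips the state
        events.append((time, state))

def map_branch(start, end, t, rates, rng, max_tries=100_000):
    """Sample a history conditioned on both end states by rejection:
    simulate forward and keep the first history that ends in `end`."""
    for _ in range(max_tries):
        events, final = simulate_branch(start, t, rates, rng)
        if final == end:
            return events
    raise RuntimeError("no history accepted; rejection sampling is slow "
                       "when the end state is improbable")

# Hypothetical example: 1000 mapped histories on a branch of length 0.5
rng = random.Random(1)
histories = [map_branch(0, 1, 0.5, (1.0, 2.0), rng) for _ in range(1000)]
mean_changes = sum(len(h) for h in histories) / len(histories)
```

Averaging quantities such as `mean_changes` over many sampled histories, rather than counting the single most parsimonious set of changes, is what lets the method carry uncertainty through to the mapped character changes.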
Motif depletion in bacteriophages infecting hosts with CRISPR systems
BACKGROUND: CRISPR is a microbial immune system likely to be involved in host-parasite coevolution. It functions using target sequences encoded by the bacterial genome, which interfere with invading nucleic acids using a homology-dependent system. The system also requires protospacer associated motifs (PAMs), short motifs close to the target sequence that are required for interference in CRISPR types I and II. Here, we investigate whether PAMs are depleted in phage genomes due to selection pressure to escape recognition. RESULTS: To this end, we analyzed two data sets. Phages infecting all bacterial hosts were analyzed first, followed by a detailed analysis of phages infecting the genus Streptococcus, where PAMs are best understood. We use two different measures of motif underrepresentation that control for codon bias and the frequency of submotifs. We compare phages infecting species with a particular CRISPR type to those infecting species without that type. Since only known PAMs were investigated, the analysis is restricted to CRISPR types I-C and I-E and, in Streptococcus, to types I-C and II. We found evidence for PAM depletion in Streptococcus phages infecting hosts with CRISPR type I-C, in Vibrio phages infecting hosts with CRISPR type I-E, and in Streptococcus thermophilus phages infecting hosts with type II-A, known as CRISPR3. CONCLUSIONS: The observed motif depletion in phages whose hosts have CRISPR can be attributed to selection rather than to mutational bias, as mutational bias should affect the phages of all hosts. This observation implies that the CRISPR system has been efficient in the groups discussed here. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-663) contains supplementary material, which is available to authorized users.
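A minimal sketch of one common way to quantify motif underrepresentation while controlling for submotif frequencies is an observed/expected ratio, where the expected count comes from a maximal-order Markov model built from the motif's overlapping submotifs. This is an assumed, generic measure, not necessarily either of the two measures used in the study, and unlike them it does not control for codon bias. The motif and sequence in the example are arbitrary and are not claimed PAMs or phage data.

```python
def count(seq, word):
    """Number of (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)

def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def obs_exp(seq, motif):
    """Observed/expected ratio of `motif`, counted on both strands, with
    E[w1..wk] = N(w1..w(k-1)) * N(w2..wk) / N(w2..w(k-1))."""
    if len(motif) < 3:
        raise ValueError("motif must be at least 3 nt long")
    strands = (seq, revcomp(seq))

    def n(word):
        return sum(count(s, word) for s in strands)

    denom = n(motif[1:-1])
    expected = n(motif[:-1]) * n(motif[1:]) / denom if denom else 0.0
    if expected == 0:
        raise ValueError("expected count is zero for this motif")
    return n(motif) / expected

ratio = obs_exp("AAGAAC", "AAG")
```

Ratios well below 1 indicate that a motif is rarer than its submotif composition predicts; comparing such ratios between phages whose hosts carry a given CRISPR type and phages whose hosts lack it separates selection from genome-wide mutational bias.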
Probabilistic models for CRISPR spacer content evolution
BACKGROUND: The CRISPR/Cas system is known to act as an adaptive and heritable immune system in Eubacteria and Archaea. Immunity is encoded in an array of spacer sequences. Each spacer can provide specific immunity to invasive elements that carry the same or a similar sequence. Even in closely related strains, spacer content is very dynamic and evolves quickly. Standard models of nucleotide evolution cannot be applied to quantify its rate of change, since processes other than single nucleotide changes determine its evolution. METHODS: We present probabilistic models that are specific to spacer content evolution. They account for the distinct processes of insertion and deletion. Insertions can be constrained to occur at one end only or allowed to occur throughout the array. A single deletion event can affect one spacer or a whole fragment of adjacent spacers. Parameters of the underlying models are estimated for a pair of arrays by maximum likelihood using explicit ancestor enumeration. RESULTS: Simulations show that parameters are well estimated on average under the models presented here. There is a bias in the rate estimation when fragment deletions are included. The models also estimate the times separating pairs of strains. However, as time increases, spacer overlap goes to zero, so there is an upper bound on the distance that can be estimated. Spacer content similarities are displayed in a distance-based phylogeny using the estimated times. We use the presented models to analyze different Yersinia pestis data sets and find that the results are largely congruent across them. The models also capture the variation in spacer diversity among the data sets. A comparison of spacer-based phylogenies and Cas gene phylogenies shows that they resolve very different time scales for this data set.
CONCLUSIONS: The simulations and data analyses show that the presented models are useful for quantifying spacer content evolution and for displaying spacer content similarities of closely related strains in a phylogeny. This allows for comparisons of different CRISPR arrays or for comparisons between CRISPR arrays and nucleotide substitution rates.
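To make the model assumptions concrete, the sketch below simulates one of the variants described above: insertions only at the leader end and deletions of single spacers, evolving two strains from a common ancestral array. The rates, array size, and function name `evolve` are hypothetical, and the sketch covers neither fragment deletions nor the maximum-likelihood estimation with ancestor enumeration.

```python
import random
from itertools import count

def evolve(array, t, ins_rate, del_rate, rng, new_ids):
    """Gillespie simulation of one spacer array for time t.
    Insertions happen only at the leader end (index 0) at rate ins_rate (> 0);
    each present spacer is lost independently at rate del_rate.
    New spacers receive fresh IDs from the shared iterator new_ids."""
    array = list(array)
    time = 0.0
    while True:
        total = ins_rate + del_rate * len(array)
        time += rng.expovariate(total)
        if time >= t:
            return array
        if rng.random() < ins_rate / total:
            array.insert(0, next(new_ids))  # acquisition at the leader end
        else:
            del array[rng.randrange(len(array))]  # loss of a single spacer

rng = random.Random(7)
new_ids = count(1000)  # IDs >= 1000 mark spacers acquired after the split
ancestor = list(range(20))
strain_a = evolve(ancestor, 1.0, 3.0, 0.2, rng, new_ids)
strain_b = evolve(ancestor, 1.0, 3.0, 0.2, rng, new_ids)
shared = set(strain_a) & set(strain_b)
```

Only ancestral spacers can be shared, and each is lost independently, so the overlap shrinks as the divergence time grows; once it reaches zero the arrays carry no further information about time, which is the upper bound on estimable distance noted in the results.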
Complete genome sequence of the novel phage MG-B1 infecting Bacillus weihenstephanensis
Here, we describe a novel virulent bacteriophage, isolated from soil in Austria, that infects Bacillus weihenstephanensis. It is the first phage discovered to infect this species. We present the complete genome sequence of this podovirus.
Epistatic Interactions in the Arabinose Cis-Regulatory Element
Changes in gene expression are an important mode of evolution; however, the proximate mechanism of these changes is poorly understood. In particular, little is known about the effects of mutations within cis binding sites for transcription factors, or about the nature of epistatic interactions between these mutations. Here, we tested the effects of single and double mutants in two cis binding sites involved in the transcriptional regulation of the Escherichia coli araBAD operon, a component of arabinose metabolism, using a synthetic system. This system decouples transcriptional control from any posttranslational effects on fitness, allowing a precise estimate of the effect of single and double mutations, and hence epistasis, on gene expression. We found that epistatic interactions between mutations in the araBAD cis-regulatory element are common, and that the predominant form of epistasis is negative. The magnitude of the interactions depended on whether the mutations were located in the same operator site or in different ones. Importantly, these epistatic interactions were dependent on the presence of arabinose, a native inducer of the araBAD operon in vivo, with some interactions changing in sign (e.g., from negative to positive) in its presence. This study thus reveals that mutations in even relatively simple cis-regulatory elements interact in complex ways, such that selection on the level of gene expression in one environment might perturb regulation in another environment in an unpredictable and uncorrelated manner.
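The abstract does not state how epistasis was scored. One common convention, assumed here, measures it on a log scale against a multiplicative null model: the double mutant's expected expression relative to wild type is the product of the two single-mutant effects, and epistasis is the log of observed over expected. The expression values below are hypothetical.

```python
import math

def epistasis(wt, a, b, ab):
    """Epistasis under a multiplicative null model, on a log scale.
    wt, a, b, ab: expression of the wild type, single mutants a and b,
    and double mutant ab. 0 = no interaction; < 0 = double mutant lower
    than expected (negative epistasis); > 0 = higher (positive epistasis)."""
    expected_log = math.log(a / wt) + math.log(b / wt)
    return math.log(ab / wt) - expected_log

# Hypothetical values: singles retain 50% and 40% of wild-type expression,
# so the multiplicative expectation for the double mutant is 20.
eps = epistasis(wt=100.0, a=50.0, b=40.0, ab=10.0)
```

Here the double mutant reaches half the expected level, so `eps` equals log(0.5), a negative interaction. Scoring the same mutants with and without inducer and comparing the signs of the two scores is one way to express the environment-dependent sign changes described above.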
The Role of the Environment in Horizontal Gene Transfer
Gene-by-environment interactions play a crucial role in horizontal gene transfer by affecting how the transferred genes alter host fitness. However, how the environment modulates the fitness effect of transferred genes has not been tested systematically in an experimental study. We adapted a high-throughput technique for obtaining very precise estimates of bacterial fitness in order to measure the fitness effects of 44 orthologs transferred from Salmonella Typhimurium to Escherichia coli in six physiologically relevant environments. We found that the fitness effects of individual genes were highly dependent on the environment, while the distributions of fitness effects across genes were not: all tested environments produced distributions of the same shape and spread. Furthermore, the extent to which the fitness effects of a gene varied between environments depended on the average fitness effect of that gene across all environments, with nearly neutral and nearly lethal genes having more consistent fitness effects across environments than moderately deleterious genes. Taken together, our results reveal the unpredictable nature of how environmental conditions impact the fitness effects of each individual gene. At the same time, distributions of fitness effects across environments exhibit consistent features, pointing to the generalizability of factors that shape horizontal gene transfer of orthologous genes.
Transcriptomic profiling of Escherichia coli K-12 in response to a compendium of stressors
Environmental perturbations affect multiple cellular traits, including gene expression. Bacteria respond to these stressful situations through complex gene interaction networks, thereby inducing stress tolerance and survival of cells. In this paper, we study the response mechanisms of E. coli exposed to different environmental stressors using differential expression and co-expression analysis. Gene co-expression networks were generated and analyzed with Weighted Gene Co-expression Network Analysis (WGCNA). Based on these networks, genes with similar expression profiles were clustered into modules. The modules were analyzed to identify hub genes and to test for enrichment of biological processes and transcription factors. We also studied the links between transcription factors and their differentially regulated targets to understand the regulatory mechanisms involved. These networks validate known gene interactions and provide new insights into genes mediating transcriptional regulation in specific stress environments, thus allowing for in silico hypothesis generation.
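As a minimal sketch of the network construction step, the code below builds an unsigned, soft-thresholded adjacency in the WGCNA style, a_ij = |cor(x_i, x_j)|^β, and picks a module's hub as the gene with the highest within-module connectivity. The toy expression matrix and the function names are hypothetical; β is normally chosen per data set, and full WGCNA additionally computes topological overlap and clusters genes into modules, which is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)|**beta; zero diagonal."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj

def hub_gene(adj, module):
    """Gene in `module` with the highest within-module connectivity."""
    return max(module, key=lambda i: sum(adj[i][j] for j in module if j != i))

# Hypothetical toy data: 4 genes (rows) measured under 4 conditions (columns)
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4], [4, 1, 3, 2]]
adj6 = soft_threshold_adjacency(expr, beta=6)
adj1 = soft_threshold_adjacency(expr, beta=1)
```

With β = 1 the hub is gene 2, which is moderately correlated (|r| = 0.8) with every other gene; with β = 6 it shifts to gene 0, whose perfect correlation with gene 1 dominates. Raising β suppresses weak correlations, so connectivity and hub status come to reflect strong co-expression.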
Evolutionary interactions between haemagglutinin and neuraminidase in avian influenza
Background: Reassortment between the RNA segments encoding haemagglutinin (HA) and neuraminidase (NA), the major antigenic influenza proteins, produces viruses with novel HA and NA subtype combinations and has preceded the emergence of pandemic strains. It has been suggested that productive viral infection requires a balance in the level of functional activity of HA and NA, arising from their closely interacting roles in the viral life cycle, and that this functional balance could be mediated by genetic changes in the HA and NA. Here, we investigate how the selective pressure varies for H7 avian influenza HA on different NA subtype backgrounds. Results: By extending Bayesian stochastic mutational mapping methods to calculate the ratio of the rate of non-synonymous change to the rate of synonymous change (dN/dS), we found the average dN/dS across the avian influenza H7 HA1 region to be significantly greater on an N2 NA subtype background than on an N1, N3 or N7 background. Observed differences in evolutionary rates of H7 HA on different NA subtype backgrounds could not be attributed to underlying differences between avian host species or virus pathogenicity. Examination of dN/dS values for each subtype on a site-by-site basis indicated that the elevated dN/dS on the N2 NA background was a result of increased selection, rather than a relaxation of selective constraint. Conclusions: Our results are consistent with the hypothesis that reassortment exposes influenza HA to significant changes in selective pressure through genetic interactions with NA. Such epistatic effects might be explicitly accounted for in future models of influenza evolution.
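As an illustration of the ratio itself, the sketch below classifies mapped codon changes as synonymous or non-synonymous under the standard genetic code and computes dN/dS as non-synonymous changes per non-synonymous site divided by synonymous changes per synonymous site. In stochastic mutational mapping the counts would be averaged over many sampled mappings; here the mapped changes and the site numbers are hypothetical inputs taken as given, and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code; codon index = 16*i + 4*j + k over BASES; "*" = stop
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def is_synonymous(codon_from, codon_to):
    """True if both codons encode the same amino acid (or both are stops)."""
    return CODE[codon_from] == CODE[codon_to]

def dn_ds(changes, nonsyn_sites, syn_sites):
    """dN/dS from a list of mapped codon changes (from, to).
    nonsyn_sites and syn_sites are the numbers of non-synonymous and
    synonymous sites in the region, assumed to be known here."""
    n = sum(1 for a, b in changes if not is_synonymous(a, b))
    s = len(changes) - n
    if s == 0:
        raise ValueError("no synonymous changes; dN/dS is undefined")
    return (n / nonsyn_sites) / (s / syn_sites)

# Hypothetical mapped changes: Leu->Leu, Leu->Ile, Glu->Asp, Lys->Lys
mapped = [("CTT", "CTC"), ("CTT", "ATT"), ("GAA", "GAT"), ("AAA", "AAG")]
ratio = dn_ds(mapped, nonsyn_sites=300.0, syn_sites=100.0)
```

Equal raw counts of the two change types still give a ratio well below 1 here, because non-synonymous sites are assumed to be three times as numerous; normalizing by site numbers is what makes dN/dS comparable across genes and backgrounds.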