10 research outputs found
Inferring strength of selection in vertebrate genomes
Protein-coding sequences have long been assumed to evolve under selection, but the
quantification of the process at the nucleotide sequence level only started when a
simple null model, the neutral theory of molecular evolution, was formulated by
Kimura. Several methods were developed, which were based on the assumption that
synonymous sites (nucleotides at third codon positions which do not change the
encoded amino acid) evolve close to neutrally, and could be used as local neutral
standards. Most of our current knowledge on the direction and strength of selection
still depends on this simple assumption. One method in particular, the non-synonymous to
synonymous substitution rate ratio (dN/dS), has gained prevalence and is still widely
used, in spite of the growing body of evidence that synonymous sites evolve under
selection. In this thesis, I quantify the strength of selection in different sequence
compartments of mammalian genomes, in order to obtain estimates of their
functional importance from comparative genomics analyses. I quantify the fraction of
mutations that have been selectively eliminated since the divergence of the species
pairs examined, the so-called genome-wide selective constraint. This in turn is used
to approximate the genomic deleterious mutation rate, which is an important
parameter for several evolutionary problems. As estimates of selection depend to a
large extent on the chosen neutral standard, here I use orthologous transposable
elements, so-called ancestral repeats, as these have been found to evolve in a
largely neutral fashion, and contain the fewest constrained sites in
mammalian genomes. This enables me to quantify the level of selection even at
synonymous sites, and the results suggest that these sites indeed evolve under
constraint, the consequences of which I discuss. The selective constraint estimates
enable me to test some simple hypotheses, such as Ohta's nearly neutral theory of
molecular evolution, which suggests that selection is more efficient in species with
larger effective population sizes. Besides the choice of neutral standards, there are
several additional factors which are known to affect the selective constraint
estimates. Here I also test the consequences of one of these, namely sequences not
being at compositional equilibrium (i.e. their GC content differs from the
equilibrium GC content), which predicts that sequences with different GC content
should evolve at different rates. This can bias estimates of the level of
selection or can even mimic selection in sequences which evolve completely
neutrally. This effect is quantified here, and a simple correction is discussed.
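As a rough illustration of the constraint calculation described in this abstract, the sketch below computes selective constraint as one minus the ratio of observed substitutions in a site class to the number expected from the divergence of a neutral standard (such as ancestral repeats). The formula is a commonly used definition assumed here for illustration; the function name and all numbers are hypothetical and are not taken from the thesis.

```python
def selective_constraint(observed_subs, n_sites, neutral_divergence):
    """Estimate the fraction of mutations removed by selection.

    C = 1 - observed / expected, where the expected number of substitutions
    is what the site class would have accumulated at the neutral rate.
    """
    if n_sites <= 0 or neutral_divergence <= 0:
        raise ValueError("n_sites and neutral_divergence must be positive")
    expected_subs = n_sites * neutral_divergence
    return 1.0 - observed_subs / expected_subs


# Hypothetical example: 1000 sites in the class of interest, 30 observed
# substitutions, and a neutral-standard divergence of 0.15 per site.
constraint = selective_constraint(observed_subs=30, n_sites=1000,
                                  neutral_divergence=0.15)
print(round(constraint, 3))
```

A value near 0 would indicate a site class evolving at about the neutral rate, while a value near 1 would indicate that most new mutations have been eliminated.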
A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck)
Background: The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such an endeavour requires a high-quality genome assembly and transcriptome. Findings: This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck using the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, which were used for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. Conclusions: This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our findings from a comparison between short-read and long-read reference transcriptomics contribute to a deeper understanding of these competing options. In this study, both technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including many waterfowl relatives with differing susceptibilities to avian influenza viruses.
Avianbase: a community resource for bird genomics
Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore we announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.
Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents
The contribution of regulatory versus protein change to adaptive evolution has long been controversial. In principle, the rate and strength of adaptation within functional genetic elements can be quantified on the basis of an excess of nucleotide substitutions between species compared to the neutral expectation or from effects of recent substitutions on nucleotide diversity at linked sites. Here, we infer the nature of selective forces acting in proteins, their UTRs and conserved noncoding elements (CNEs) using genome-wide patterns of diversity in wild house mice and divergence to related species. By applying an extension of the McDonald-Kreitman test, we infer that adaptive substitutions are widespread in protein-coding genes, UTRs and CNEs, and we estimate that there are at least four times as many adaptive substitutions in CNEs and UTRs as in proteins. We observe pronounced reductions in mean diversity around nonsynonymous sites (whether or not they have experienced a recent substitution). This can be explained by selection on multiple, linked CNEs and exons. We also observe substantial dips in mean diversity (after controlling for divergence) around protein-coding exons and CNEs, which can also be explained by the combined effects of many linked exons and CNEs. A model of background selection (BGS) can adequately explain the reduction in mean diversity observed around CNEs. However, BGS fails to explain the wide reductions in mean diversity surrounding exons (encompassing ~100 Kb, on average), implying that there is a substantial role for adaptation within exons or closely linked sites. The wide dips in diversity around exons, which are hard to explain by BGS, suggest that the fitness effects of adaptive amino acid substitutions could be substantially larger than those of substitutions in CNEs. We conclude that although there appear to be many more adaptive noncoding changes, substitutions in proteins may dominate phenotypic evolution.
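To make the logic of a McDonald-Kreitman-type comparison concrete, the sketch below computes the proportion of adaptive substitutions (alpha) from counts of divergence and polymorphism at selected and neutral sites, using the basic estimator alpha = 1 - (Ds × Pn) / (Dn × Ps). This is the simple form of the test, not the extension applied in the study, and the function name and counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Basic McDonald-Kreitman estimate of the proportion of adaptive substitutions.

    dn, ds: substitutions between species at selected and neutral sites.
    pn, ps: polymorphisms within species at selected and neutral sites.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts: an excess of selected-site divergence relative to
# selected-site polymorphism gives a positive alpha.
alpha = mk_alpha(dn=60, ds=100, pn=20, ps=50)
print(round(alpha, 3))
```

The same calculation can be applied to any selected site class (for example UTRs or CNEs) by substituting its counts for dn and pn. The basic estimator is known to be biased by slightly deleterious polymorphisms, which is one motivation for extended versions of the test.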
Inference of Mutation Parameters and Selective Constraint in Mammalian Coding Sequences by Approximate Bayesian Computation
We develop an inference method that uses approximate Bayesian computation (ABC) to simultaneously estimate mutational parameters and selective constraint on the basis of nucleotide divergence for protein-coding genes between pairs of species. Our simulations explicitly model CpG hypermutability and transition vs. transversion mutational biases along with negative and positive selection operating on synonymous and nonsynonymous sites. We evaluate the method by simulations in which true mean parameter values are known and show that it produces reasonably unbiased parameter estimates as long as sequences are not too short and sequence divergence is not too low. We show that the use of quadratic regression within ABC offers an improvement over linear regression, but that weighted regression has little impact on the efficiency of the procedure. We apply the method to estimate mutational and selective constraint parameters in data sets of protein-coding genes extracted from the genome sequences of primates, murids, and carnivores. Estimates of CpG hypermutability are substantially higher in primates than murids and carnivores. Nonsynonymous site selective constraint is substantially higher in murids and carnivores than primates, and autosomal nonsynonymous constraint is higher than X-chromosome constraint in all taxa. We detect significant selective constraint at synonymous sites in primates, carnivores, and murid rodents. Synonymous site selective constraint is weakest in murids, a surprising result, considering that murid effective population sizes are likely to be considerably higher than those of the other two taxa.
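The general idea of ABC can be sketched with a minimal rejection sampler. The toy example below estimates a single per-site divergence parameter from a count of diverged sites. It is an illustration under simplifying assumptions only: it uses plain rejection without the regression adjustment described in the abstract, and it does not model CpG hypermutability, transition/transversion bias, or selection. All names and numbers are hypothetical.

```python
import random


def simulate_substitutions(divergence, n_sites, rng):
    """Simulate how many of n_sites differ, each with probability `divergence`."""
    return sum(1 for _ in range(n_sites) if rng.random() < divergence)


def abc_rejection(observed, n_sites, n_draws, tolerance, rng):
    """Keep prior draws whose simulated count is within `tolerance` of the data."""
    accepted = []
    for _ in range(n_draws):
        divergence = rng.uniform(0.0, 0.5)  # flat prior (an assumption)
        simulated = simulate_substitutions(divergence, n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(divergence)
    return accepted


rng = random.Random(1)  # fixed seed so the run is repeatable
posterior = abc_rejection(observed=30, n_sites=200, n_draws=5000,
                          tolerance=4, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

The accepted draws approximate the posterior distribution of the divergence parameter. Regression-adjusted ABC would additionally correct each accepted value for the remaining gap between its simulated and observed summary statistics.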
CsrA Inhibits Translation Initiation of Escherichia coli hfq by Binding to a Single Site Overlapping the Shine-Dalgarno Sequence
Csr (carbon storage regulation) of Escherichia coli is a global regulatory system that consists of CsrA, a homodimeric RNA binding protein, two noncoding small RNAs (sRNAs; CsrB and CsrC) that function as CsrA antagonists by sequestering this protein, and CsrD, a specificity factor that targets CsrB and CsrC for degradation by RNase E. CsrA inhibits translation initiation of glgC, cstA, and pgaA by binding to their leader transcripts and preventing ribosome binding. Translation inhibition is thought to contribute to the observed mRNA destabilization. Each of the previously known target transcripts contains multiple CsrA binding sites. A position-specific weight matrix search program was developed using known CsrA binding sites in mRNA. This search tool identified a potential CsrA binding site that overlaps the Shine-Dalgarno sequence of hfq, a gene that encodes an RNA chaperone that mediates sRNA-mRNA interactions. This putative CsrA binding site matched the SELEX-derived binding site consensus sequence in 8 out of 12 positions. Results from gel mobility shift and footprint assays demonstrated that CsrA binds specifically to this site in the hfq leader transcript. Toeprint and cell-free translation results indicated that bound CsrA inhibits Hfq synthesis by competitively blocking ribosome binding. Disruption of csrA caused elevated expression of an hfq′-′lacZ translational fusion, while overexpression of csrA inhibited expression of this fusion. We also found that hfq mRNA is stabilized upon entry into stationary-phase growth by a CsrA-independent mechanism. The interaction of CsrA with hfq mRNA is the first example of a CsrA-regulated gene that contains only one CsrA binding site.
Estimates of mean nucleotide diversity (<i>π</i>) in house mice, divergence to rat (<i>d<sub>rat</sub></i>) and their ratio (<i>π</i>/<i>d</i>) plotted against the distance from the nearest protein-coding exon (panel A) or CNE (panel B).
<p>Mean estimates of <i>π</i>/<i>d</i> can be approximated well by a negative exponential function (red line), obtained by fitting the function f(<i>x</i>) = <i>A</i>(1-<i>B</i>(exp(-<i>x/d</i>))) to mean <i>π</i>/<i>d</i> by nonlinear least squares (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003995#s4" target="_blank">Materials and Methods</a> for details). The bottom panel shows the number of sites (in Mb) on a log scale that contribute to each bin.</p>
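The shape of the fitted curve can be explored with a short sketch that evaluates f(x) = A(1 - B exp(-x/d)) at a few distances. The parameter values below are hypothetical placeholders, not the fitted values from the study, and the scale parameter written as d in the function is named `scale` here to avoid confusion with divergence.

```python
import math


def diversity_ratio(x, A, B, scale):
    """Evaluate f(x) = A * (1 - B * exp(-x / scale)).

    A is the plateau reached far from the element, A * (1 - B) is the value
    at distance zero, and `scale` sets how quickly the curve recovers.
    """
    return A * (1.0 - B * math.exp(-x / scale))


# Hypothetical parameters chosen only to show the behaviour of the curve.
A, B, scale = 0.05, 0.4, 20000.0
at_element = diversity_ratio(0.0, A, B, scale)    # equals A * (1 - B)
one_scale = diversity_ratio(scale, A, B, scale)   # partially recovered
far_away = diversity_ratio(1.0e7, A, B, scale)    # approaches the plateau A
print(at_element, one_scale, far_away)
```

Fitting A, B and the scale parameter to binned data would require a nonlinear least-squares routine (for example `scipy.optimize.curve_fit`), which is omitted here to keep the sketch dependency-free.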
Results from DFE-alpha.
<p><i>n<sub>t</sub></i> is the total number of sites in the reference genome corresponding to each mutually exclusive site class (including non-canonical spliceforms in the case of protein-coding exons). <i>N<sub>e</sub>s</i> (the product of the mean homozygous effect of a deleterious mutation and the effective population size) and <i>β</i> (the gamma shape parameter, which has a lower estimable value of 0.05 within DFE-alpha) are the inferred parameters of the DFE, from which we calculate the mean fixation probability of a deleterious mutation relative to a neutral mutation (<i>u<sub>n</sub></i>) and estimates of the proportion of deleterious mutations in three ranges of fitness effects (on a scale of <i>N<sub>e</sub>s</i> = 0–1, 1–10 and 10+). From estimates of divergence from rat at selected and neutral sites, we calculate estimates of the proportion of adaptive substitutions (<i>α</i>) and the rate of adaptive substitution relative to the rate of synonymous substitution (<i>ω<sub>a</sub></i>) (results are shown for non-CpG-prone sites). <i>n<sub>a</sub></i> is an estimate of the total number of adaptive substitutions between mouse and rat attributable to each site class and is calculated from <i>n<sub>a</sub></i> = <i>ω<sub>a</sub> n<sub>t</sub> d<sub>s</sub></i>, where <i>d<sub>s</sub></i> = 0.18, an estimate of divergence for synonymous sites. 95% confidence limits are shown in square brackets.</p>
Patterns of nucleotide diversity (<i>π</i>), divergence to rat (<i>d</i>) and <i>π/d</i>, in the flanks of zero-fold degenerate and four-fold degenerate protein-coding sites identified as either having a fixed substitution between <i>M. m. castaneus</i> and <i>M. famulus</i> or no substitution.