10 research outputs found

    Inferring strength of selection in vertebrate genomes

    Protein-coding sequences have long been assumed to evolve under selection, but quantification of the process at the nucleotide sequence level only started when a simple null model, the neutral theory of molecular evolution, was formulated by Kimura. Several methods were developed on the assumption that synonymous sites (nucleotides at third codon positions which do not change the encoded amino acid) evolve close to neutrally and could be used as local neutral standards. Most of our current knowledge on the direction and strength of selection still depends on this simple assumption. One method in particular, the non-synonymous to synonymous substitution rate ratio (dN/dS), has gained prevalence and is still widely used, in spite of the growing body of evidence that synonymous sites evolve under selection. In this thesis, I quantify the strength of selection in different sequence compartments of mammalian genomes, in order to obtain estimates of their functional importance from comparative genomics analyses. I quantify the fraction of mutations that have been selectively eliminated since the divergence of the species pairs examined, the so-called genome-wide selective constraint. This in turn is used to approximate the genomic deleterious mutation rate, which is an important parameter for several evolutionary problems. As estimates of selection depend to a large extent on the chosen neutral standard, here I use orthologous transposable elements, so-called ancestral repeats, as these have been found to evolve in a largely neutral fashion and contain the fewest constrained sites in mammalian genomes. This enables me to quantify the level of selection even at synonymous sites, and the results suggest that these sites indeed evolve under constraint, the consequences of which I discuss.
The selective constraint estimates enable me to test some simple hypotheses, such as Ohta's nearly neutral theory of molecular evolution, which suggests that selection is more efficient in species with larger effective population sizes. Besides the choice of neutral standards, several additional factors are known to affect the selective constraint estimates. Here I also test the consequences of one of these, namely sequences not being at compositional equilibrium (i.e. their GC content differs from the equilibrium GC content), which predicts that sequences with different GC content should evolve at different rates. This can bias estimates of the level of selection or can even imitate selection in sequences which evolve completely neutrally. This effect is quantified here, and a simple correction is discussed.
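
The idea of selective constraint described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition C = 1 - d_selected / d_neutral, where d_neutral is the divergence at a neutral standard such as ancestral repeats. The abstract does not give the exact estimator used in the thesis, and the function name and all divergence values below are hypothetical.

```python
def selective_constraint(d_selected: float, d_neutral: float) -> float:
    """Fraction of mutations removed by selection: C = 1 - d_selected / d_neutral.

    Assumed textbook definition; not taken from the thesis itself.
    """
    if d_neutral <= 0:
        raise ValueError("neutral divergence must be positive")
    return 1.0 - d_selected / d_neutral


# Hypothetical per-site divergences between two species (illustration only).
d_ancestral_repeats = 0.16  # neutral standard
d_synonymous = 0.14
d_nonsynonymous = 0.03

c_syn = selective_constraint(d_synonymous, d_ancestral_repeats)
c_nonsyn = selective_constraint(d_nonsynonymous, d_ancestral_repeats)

print(f"synonymous constraint:     {c_syn:.3f}")
print(f"non-synonymous constraint: {c_nonsyn:.3f}")
```

With a neutral standard other than synonymous sites, a non-zero synonymous constraint becomes measurable, which is the point the abstract makes about ancestral repeats.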

    A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck)

    Background: The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such an endeavour requires a high-quality genome assembly and transcriptome. Findings: This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck using the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, which were used for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. Conclusions: This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our findings from a comparison between short-read and long-read reference transcriptomics contribute to a deeper understanding of these competing options. In this study, both technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including many waterfowl relatives with differing susceptibilities to avian influenza viruses.

    Avianbase: a community resource for bird genomics

    Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore we announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.

    Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents

    The contribution of regulatory versus protein change to adaptive evolution has long been controversial. In principle, the rate and strength of adaptation within functional genetic elements can be quantified on the basis of an excess of nucleotide substitutions between species compared to the neutral expectation or from effects of recent substitutions on nucleotide diversity at linked sites. Here, we infer the nature of selective forces acting in proteins, their UTRs and conserved noncoding elements (CNEs) using genome-wide patterns of diversity in wild house mice and divergence to related species. By applying an extension of the McDonald-Kreitman test, we infer that adaptive substitutions are widespread in protein-coding genes, UTRs and CNEs, and we estimate that there are at least four times as many adaptive substitutions in CNEs and UTRs as in proteins. We observe pronounced reductions in mean diversity around nonsynonymous sites (whether or not they have experienced a recent substitution). This can be explained by selection on multiple, linked CNEs and exons. We also observe substantial dips in mean diversity (after controlling for divergence) around protein-coding exons and CNEs, which can also be explained by the combined effects of many linked exons and CNEs. A model of background selection (BGS) can adequately explain the reduction in mean diversity observed around CNEs. However, BGS fails to explain the wide reductions in mean diversity surrounding exons (encompassing ~100 kb, on average), implying that there is a substantial role for adaptation within exons or closely linked sites. The wide dips in diversity around exons, which are hard to explain by BGS, suggest that the fitness effects of adaptive amino acid substitutions could be substantially larger than substitutions in CNEs. We conclude that although there appear to be many more adaptive noncoding changes, substitutions in proteins may dominate phenotypic evolution.
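
For orientation, the classic McDonald-Kreitman estimate of the proportion of adaptive substitutions can be sketched as follows. This is the basic textbook form, alpha = 1 - (Ds * Pn) / (Dn * Ps), not the extension applied in the paper, which the abstract does not spell out. The function name and all counts are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic McDonald-Kreitman estimate of the proportion of adaptive substitutions.

    dn, ds: fixed differences between species at selected and neutral sites.
    pn, ps: polymorphisms within species at selected and neutral sites.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts (illustration only, not data from the paper).
alpha = mk_alpha(dn=40, ds=100, pn=20, ps=100)
print(f"alpha = {alpha:.2f}")
```

A positive value indicates more divergence at the selected class than polymorphism levels would predict under neutrality; slightly deleterious polymorphisms bias this simple estimator downwards, which is one motivation for extended versions of the test.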

    Inference of Mutation Parameters and Selective Constraint in Mammalian Coding Sequences by Approximate Bayesian Computation

    We develop an inference method that uses approximate Bayesian computation (ABC) to simultaneously estimate mutational parameters and selective constraint on the basis of nucleotide divergence for protein-coding genes between pairs of species. Our simulations explicitly model CpG hypermutability and transition vs. transversion mutational biases along with negative and positive selection operating on synonymous and nonsynonymous sites. We evaluate the method by simulations in which true mean parameter values are known and show that it produces reasonably unbiased parameter estimates as long as sequences are not too short and sequence divergence is not too low. We show that the use of quadratic regression within ABC offers an improvement over linear regression, but that weighted regression has little impact on the efficiency of the procedure. We apply the method to estimate mutational and selective constraint parameters in data sets of protein-coding genes extracted from the genome sequences of primates, murids, and carnivores. Estimates of CpG hypermutability are substantially higher in primates than murids and carnivores. Nonsynonymous site selective constraint is substantially higher in murids and carnivores than primates, and autosomal nonsynonymous constraint is higher than X-chromosome constraint in all taxa. We detect significant selective constraint at synonymous sites in primates, carnivores, and murid rodents. Synonymous site selective constraint is weakest in murids, a surprising result, considering that murid effective population sizes are likely to be considerably higher than the other two taxa.
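
The core ABC idea can be shown with a toy rejection sampler. This is only a sketch under strong simplifying assumptions: a single parameter (per-site divergence p), a uniform prior, a binomial simulator, and plain rejection without the regression adjustment that the abstract evaluates. All names and numbers are hypothetical and unrelated to the authors' implementation.

```python
import random


def simulate_differences(p: float, n_sites: int, rng: random.Random) -> int:
    """Simulate the number of diverged sites among n_sites, each differing with probability p."""
    return sum(rng.random() < p for _ in range(n_sites))


def abc_rejection(k_obs: int, n_sites: int, n_draws: int, tol: int,
                  rng: random.Random) -> list[float]:
    """Keep prior draws whose simulated summary lies within tol of the observed one."""
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from a uniform(0, 1) prior
        k_sim = simulate_differences(p, n_sites, rng)
        if abs(k_sim - k_obs) <= tol:
            accepted.append(p)
    return accepted


# Hypothetical observation: 20 differences in 100 aligned sites.
rng = random.Random(1)
posterior = abc_rejection(k_obs=20, n_sites=100, n_draws=5000, tol=2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean p = {posterior_mean:.3f}")
```

Regression-adjusted ABC, as discussed in the abstract, would additionally correct each accepted draw for the remaining gap between simulated and observed summaries, which allows a looser tolerance.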

    CsrA Inhibits Translation Initiation of Escherichia coli hfq by Binding to a Single Site Overlapping the Shine-Dalgarno Sequence

    Csr (carbon storage regulation) of Escherichia coli is a global regulatory system that consists of CsrA, a homodimeric RNA binding protein, two noncoding small RNAs (sRNAs; CsrB and CsrC) that function as CsrA antagonists by sequestering this protein, and CsrD, a specificity factor that targets CsrB and CsrC for degradation by RNase E. CsrA inhibits translation initiation of glgC, cstA, and pgaA by binding to their leader transcripts and preventing ribosome binding. Translation inhibition is thought to contribute to the observed mRNA destabilization. Each of the previously known target transcripts contains multiple CsrA binding sites. A position-specific weight matrix search program was developed using known CsrA binding sites in mRNA. This search tool identified a potential CsrA binding site that overlaps the Shine-Dalgarno sequence of hfq, a gene that encodes an RNA chaperone that mediates sRNA-mRNA interactions. This putative CsrA binding site matched the SELEX-derived binding site consensus sequence in 8 out of 12 positions. Results from gel mobility shift and footprint assays demonstrated that CsrA binds specifically to this site in the hfq leader transcript. Toeprint and cell-free translation results indicated that bound CsrA inhibits Hfq synthesis by competitively blocking ribosome binding. Disruption of csrA caused elevated expression of an hfq′-′lacZ translational fusion, while overexpression of csrA inhibited expression of this fusion. We also found that hfq mRNA is stabilized upon entry into stationary-phase growth by a CsrA-independent mechanism. The interaction of CsrA with hfq mRNA is the first example of a CsrA-regulated gene that contains only one CsrA binding site.
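
A position-specific weight matrix search of the kind mentioned in this abstract can be sketched roughly as follows. The abstract does not describe the authors' program, so this is a generic log-odds matrix with pseudocounts and a uniform background. The aligned sites and the leader sequence are invented toy strings, not real CsrA binding sites or the hfq leader.

```python
import math

BASES = "ACGU"


def build_pwm(sites: list[str], pseudocount: float = 1.0,
              background: float = 0.25) -> list[dict[str, float]]:
    """Build a log2-odds weight matrix from equal-length aligned sites."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def scan(sequence: str, pwm: list[dict[str, float]]) -> list[tuple[float, int]]:
    """Score every window of the sequence; return (score, start) pairs, best first."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        hits.append((score, start))
    return sorted(hits, reverse=True)


# Hypothetical aligned sites and transcript (illustration only).
toy_sites = ["CAGGAUG", "AAGGAUG", "CAGGAAG", "UAGGAUC"]
toy_leader = "UUACGCAGGAUGACCUA"

pwm = build_pwm(toy_sites)
best_score, best_start = scan(toy_leader, pwm)[0]
best_window = toy_leader[best_start:best_start + len(pwm)]
print(f"best window {best_window} at position {best_start}, score {best_score:.2f}")
```

In practice the top-scoring windows would be treated only as candidates, to be confirmed experimentally as the abstract describes for the hfq site.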

    Estimates of mean nucleotide diversity (π) in house mice, divergence to rat (d_rat) and their ratio (π/d) plotted against the distance from the nearest protein-coding exon (panel A) or CNE (panel B).

    Mean estimates of π/d can be approximated well by a negative exponential function (red line), obtained by fitting the function f(x) = A(1 - B exp(-x/d)) to mean π/d by nonlinear least squares (see Materials and Methods, http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003995#s4, for details). The bottom panel shows the number of sites (in Mb) on a log scale that contribute to each bin.
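
One simple way to fit the function in this caption, f(x) = A(1 - B exp(-x/d)), is sketched below. The source does not say which optimiser was used, so this sketch substitutes a profile approach: for each candidate scale d on a grid the model is linear in A and A*B, which are solved by ordinary least squares, and the d with the smallest residual sum of squares is kept. The data are synthetic and all parameter values are hypothetical.

```python
import math


def model(x: float, A: float, B: float, d: float) -> float:
    """Negative exponential recovery curve f(x) = A * (1 - B * exp(-x / d))."""
    return A * (1.0 - B * math.exp(-x / d))


def fit_for_scale(xs, ys, d):
    """For fixed d, fit y = intercept + slope * exp(-x / d) by ordinary least squares."""
    es = [math.exp(-x / d) for x in xs]
    n = len(xs)
    e_mean = sum(es) / n
    y_mean = sum(ys) / n
    sxx = sum((e - e_mean) ** 2 for e in es)
    sxy = sum((e - e_mean) * (y - y_mean) for e, y in zip(es, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * e_mean
    A = intercept            # since f = A - (A * B) * exp(-x / d)
    B = -slope / intercept
    sse = sum((y - model(x, A, B, d)) ** 2 for x, y in zip(xs, ys))
    return sse, A, B, d


def fit(xs, ys, d_grid):
    """Return (sse, A, B, d) for the grid value of d with the smallest residual."""
    return min(fit_for_scale(xs, ys, d) for d in d_grid)


# Synthetic, noise-free data from hypothetical parameters (illustration only).
true_A, true_B, true_d = 0.02, 0.4, 25
xs = [0, 5, 10, 20, 40, 80, 160]          # distance from the element, arbitrary units
ys = [model(x, true_A, true_B, true_d) for x in xs]

sse, A_hat, B_hat, d_hat = fit(xs, ys, d_grid=range(5, 105, 5))
print(f"A = {A_hat:.4f}, B = {B_hat:.3f}, d = {d_hat}")
```

A gradient-based nonlinear least-squares routine would refine d continuously instead of on a grid; the grid version is used here only because it needs no external library.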

    Results from DFE-alpha.

    n_t is the total number of sites in the reference genome corresponding to each mutually exclusive site class (including non-canonical spliceforms in the case of protein-coding exons). N_e s (the product of the mean homozygous effect of a deleterious mutation and the effective population size) and β (the gamma shape parameter, which has a lower estimable value of 0.05 within DFE-alpha) are the inferred parameters of the DFE, from which we calculate the mean fixation probability of a deleterious mutation relative to a neutral mutation (u_n) and estimates of the proportion of deleterious mutations in three ranges of fitness effects (on a scale of N_e s = 0–1, 1–10 and 10+). From estimates of divergence from rat at selected and neutral sites, we calculate estimates of the proportion of adaptive substitutions (α) and the rate of adaptive substitution relative to the rate of synonymous substitution (ω_a) (results are shown for non-CpG-prone sites). n_a is an estimate of the total number of adaptive substitutions between mouse and rat attributable to each site class and is calculated from n_a = ω_a n_t d_s, where d_s = 0.18, an estimate of divergence for synonymous sites. 95% confidence limits are shown in square brackets.
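
The final formula in this caption, n_a = ω_a n_t d_s, is simple enough to show as a worked example. Only d_s = 0.18 comes from the caption; the function name and the ω_a and n_t values below are hypothetical placeholders, not results from the table.

```python
D_S = 0.18  # synonymous-site divergence between mouse and rat, as stated in the caption


def adaptive_substitutions(omega_a: float, n_t: float, d_s: float = D_S) -> float:
    """Estimated number of adaptive substitutions in a site class: n_a = omega_a * n_t * d_s."""
    return omega_a * n_t * d_s


# Hypothetical site class: 1,000,000 sites with omega_a = 0.05 (illustration only).
n_a = adaptive_substitutions(omega_a=0.05, n_t=1_000_000)
print(f"estimated adaptive substitutions: {n_a:.0f}")
```

Because n_a scales with the number of sites n_t, a large site class can contribute more adaptive substitutions in total than a smaller class with a higher ω_a.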

    Patterns of nucleotide diversity (π), divergence to rat (d) and π/d, in the flanks of zero-fold degenerate and four-fold degenerate protein-coding sites identified as either having a fixed substitution between M. m. castaneus and M. famulus or no substitution.
