
10 research outputs found

Inferring strength of selection in vertebrate genomes

Author: Eöry Lél
Publication venue: The University of Edinburgh
Publication date: 27/06/2011

Protein-coding sequences have long been assumed to evolve under selection, but quantifying the process at the nucleotide level only began once Kimura formulated a simple null model, the neutral theory of molecular evolution. Several methods were then developed on the assumption that synonymous sites (nucleotides at third codon positions whose change does not alter the encoded amino acid) evolve close to neutrally and can therefore serve as local neutral standards. Most of our current knowledge of the direction and strength of selection still depends on this simple assumption. One method in particular, the ratio of non-synonymous to synonymous substitution rates (dN/dS), has gained prevalence and is still widely used, despite the growing body of evidence that synonymous sites evolve under selection. In this thesis, I quantify the strength of selection in different sequence compartments of mammalian genomes in order to estimate their functional importance from comparative genomic analyses. I quantify the fraction of mutations that have been selectively eliminated since the divergence of the species pairs examined, the so-called genome-wide selective constraint. This in turn is used to approximate the genomic deleterious mutation rate, an important parameter for several evolutionary problems. Because estimates of selection depend to a large extent on the chosen neutral standard, I use orthologous transposable elements, so-called ancestral repeats, as these have been found to evolve in a largely neutral fashion and to contain the fewest constrained sites in mammalian genomes. This enables me to quantify the level of selection even at synonymous sites, and the results suggest that these sites do evolve under constraint; I discuss the consequences of this.
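A minimal sketch of the constraint calculation, assuming constraint is defined as one minus the ratio of divergence in a focal compartment to divergence in ancestral repeats, with an optional Jukes-Cantor correction for multiple hits. The function names and counts are hypothetical, and the thesis may use a different substitution model.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected divergence from the raw proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def selective_constraint(diff_sel, n_sel, diff_neutral, n_neutral, correct=True):
    """Fraction of mutations removed by selection in a focal compartment,
    relative to a neutral standard such as ancestral repeats:
    C = 1 - K_sel / K_neutral."""
    p_sel = diff_sel / n_sel
    p_neu = diff_neutral / n_neutral
    if correct:
        k_sel, k_neu = jukes_cantor(p_sel), jukes_cantor(p_neu)
    else:
        k_sel, k_neu = p_sel, p_neu
    return 1 - k_sel / k_neu


# Hypothetical counts: 30 differences in 1,000 aligned synonymous sites and
# 100 differences in 1,000 aligned ancestral-repeat sites.
print(round(selective_constraint(30, 1000, 100, 1000, correct=False), 3))  # 0.7
print(round(selective_constraint(30, 1000, 100, 1000), 3))
```

A positive C at synonymous sites is what "synonymous sites evolve under constraint" means in this framework: they diverge more slowly than the ancestral-repeat standard.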
The selective constraint estimates enable me to test some simple hypotheses, such as Ohta's nearly neutral theory of molecular evolution, which predicts that selection is more efficient in species with larger effective population sizes. Besides the choice of neutral standard, several additional factors are known to affect selective constraint estimates. Here I also test the consequences of one of these, namely sequences that are not at compositional equilibrium (i.e. their GC content differs from the equilibrium GC content), a situation which predicts that sequences with different GC content should evolve at different rates. This can bias estimates of the level of selection, or can even mimic selection in sequences that evolve completely neutrally. I quantify this effect and discuss a simple correction.
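Why GC content matters can be illustrated with a two-state sketch, assuming every site is either GC or AT, with a hypothetical AT-to-GC mutation rate u and GC-to-AT rate v. The equilibrium GC content is u / (u + v), and the instantaneous neutral substitution rate of a sequence depends on its current GC content, so two perfectly neutral sequences can appear to differ in constraint. This is an illustration only, not the correction used in the thesis.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Equilibrium GC content under AT->GC rate u and GC->AT rate v."""
    return u / (u + v)


def substitution_rate(gc: float, u: float, v: float) -> float:
    """Expected neutral substitutions per site per unit time for a sequence
    whose current GC content is gc: AT sites mutate at u, GC sites at v."""
    return (1 - gc) * u + gc * v


def apparent_constraint(gc_focal: float, gc_reference: float, u: float, v: float) -> float:
    """Spurious 'constraint' of a neutral focal sequence measured against a
    neutral reference that sits at a different GC content."""
    return 1 - substitution_rate(gc_focal, u, v) / substitution_rate(gc_reference, u, v)


# Hypothetical AT-biased mutation (v = 2u): equilibrium GC is 1/3.
u, v = 1.0, 2.0
print(round(equilibrium_gc(u, v), 4))                     # 0.3333
print(round(apparent_constraint(0.3, 0.6, u, v), 4))      # 0.1875
```

Both sequences here evolve neutrally, yet the GC-poor one appears about 19% constrained relative to the GC-rich reference; at equilibrium the rate equals 2uv / (u + v).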

Edinburgh Research Archive

Correction to: Avian Immunome DB: an example of a user‑friendly interface for extracting genetic information

Author: Eöry Lél
Kraus Robert H S
Kuo Richard I
Mallig Nicolai
Mueller Ralf C
Smith Jacqueline
Publication venue: Springer Science and Business Media LLC
Publication date: 30/09/2021

PubMed Central

Edinburgh Research Explorer

A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck)

Author: Chow William
Ellström Patrik
Eöry Lél
Fedrigo Olivier
Haase Bettina
Howe Kerstin
Jarvis Erich D
Järhult Josef D
Kraus Robert H S
Kuo Richard I
Miedzinska Katarzyna
Mountcastle Jacquelyn
Mueller Ralf C
Naguib Mahmoud M
Olsen Björn
Smith Jacqueline
Torrance James
Uliano-Silva Marcela
Warr Amanda
Wood Jonathan M D
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2021

Background: The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such an endeavour requires a high-quality genome assembly and transcriptome. Findings: This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck, produced with the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, and used these data for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. Conclusions: This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our comparison of short-read and long-read reference transcriptomics contributes to a deeper understanding of these competing options; in this study, the two technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including of many waterfowl relatives with differing susceptibilities to avian influenza viruses.
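The isoforms-per-gene figure is simply transcripts divided by genes. A small sketch of the same calculation from an annotation, assuming a hypothetical list of (transcript_id, gene_id) pairs rather than any particular annotation file format:

```python
from collections import defaultdict


def mean_isoforms_per_gene(transcript_gene_pairs):
    """Mean number of distinct transcripts per gene from (transcript_id, gene_id) pairs."""
    by_gene = defaultdict(set)
    for transcript_id, gene_id in transcript_gene_pairs:
        by_gene[gene_id].add(transcript_id)
    return sum(len(ts) for ts in by_gene.values()) / len(by_gene)


# Totals reported above: 111,934 transcripts from 14,099 protein-coding genes.
print(round(111_934 / 14_099, 1))  # 7.9

# Hypothetical toy annotation: g1 has two isoforms, g2 has one.
print(mean_isoforms_per_gene([("t1", "g1"), ("t2", "g1"), ("t3", "g2")]))  # 1.5
```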

Publikationer från Uppsala Universitet

PubMed Central

Edinburgh Research Explorer

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Avianbase: a community resource for bird genomics

Author: Alan Archibald
Bo Li
Bronwen L Aken
C-J Rubin
Cai Li
D Karolchik
David W Burt
E Birney
ED Jarvis
Erich Jarvis
G Benson
G Zhang
Genome 10 K Community of Scientists
Guojie Zhang
J Severin
K Lindblad-Toh
Lél Eöry
M Thomas P Gilbert
P Flicek
Paul Flicek
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. We therefore announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.

Crossref

Springer - Publisher Connector

DukeSpace (Duke Univ.)

Copenhagen University Research Information System

PubMed Central

Edinburgh Research Explorer

UQ eSpace (University of Queensland)

Contributions of protein-coding and regulatory change to adaptive molecular evolution in murid rodents

Author: A Cox
A Doherty
A Eyre-Walker
A Kousathanas
A Siepel
AR Boyko
Athanasios Kousathanas
B Charlesworth
Bettina Harr
BM Peter
Bret A. Payseur
CB Lowe
D Graur
Daniel L. Halligan
David J. Adams
DG Torgerson
DL Halligan
FC Jones
G Lunter
G McVicker
GA Wray
H Li
HE Hoekstra
JF Baines
JH McDonald
JJ Cai
JK Pritchard
JV Chamary
K Lindblad-Toh
L Eöry
LD Ward
Lél Eöry
M Carnerio
M Kimura
M Nordborg
M Phifer-Rixey
M Przeworski
M-C King
MI Jensen-Seaman
P Andolfatto
PD Keightley
Peter D. Keightley
PW Messer
RD Hernandez
Rob W. Ness
S Sattath
SB Carroll
T Salcedo
THE Wiehe
Thomas M. Keane
WF Doolittle
Y Shen
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

The contribution of regulatory versus protein change to adaptive evolution has long been controversial. In principle, the rate and strength of adaptation within functional genetic elements can be quantified from an excess of nucleotide substitutions between species relative to the neutral expectation, or from the effects of recent substitutions on nucleotide diversity at linked sites. Here, we infer the nature of the selective forces acting on proteins, their UTRs and conserved noncoding elements (CNEs) using genome-wide patterns of diversity in wild house mice and divergence to related species. By applying an extension of the McDonald-Kreitman test, we infer that adaptive substitutions are widespread in protein-coding genes, UTRs and CNEs, and we estimate that there are at least four times as many adaptive substitutions in CNEs and UTRs as in proteins. We observe pronounced reductions in mean diversity around nonsynonymous sites (whether or not they have experienced a recent substitution). This can be explained by selection on multiple, linked CNEs and exons. We also observe substantial dips in mean diversity (after controlling for divergence) around protein-coding exons and CNEs, which can also be explained by the combined effects of many linked exons and CNEs. A model of background selection (BGS) can adequately explain the reduction in mean diversity observed around CNEs. However, BGS fails to explain the wide reductions in mean diversity surrounding exons (encompassing ~100 kb, on average), implying a substantial role for adaptation within exons or closely linked sites. The wide dips in diversity around exons, which are hard to explain by BGS, suggest that the fitness effects of adaptive amino acid substitutions could be substantially larger than those of substitutions in CNEs. We conclude that although there appear to be many more adaptive noncoding changes, substitutions in proteins may dominate phenotypic evolution.
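For orientation, the basic McDonald-Kreitman estimate of the proportion of adaptive substitutions compares fixed differences (D) and polymorphisms (P) at selected (n) and putatively neutral (s) sites as alpha = 1 - (Ds Pn) / (Dn Ps). This is only the simple textbook form, sketched with hypothetical counts; the extension used in the paper additionally models slightly deleterious mutations through an inferred distribution of fitness effects, which this sketch ignores.

```python
def mk_alpha(d_n: int, d_s: int, p_n: int, p_s: int) -> float:
    """Proportion of adaptive substitutions at selected sites:
    alpha = 1 - (Ds * Pn) / (Dn * Ps).
    d_n, d_s: fixed differences at selected and neutral sites;
    p_n, p_s: polymorphisms at selected and neutral sites."""
    if d_n == 0 or p_s == 0:
        raise ValueError("d_n and p_s must be non-zero")
    return 1 - (d_s * p_n) / (d_n * p_s)


# Hypothetical counts for a selected site class (e.g. CNE sites) against a
# neutral reference (e.g. four-fold degenerate sites).
print(mk_alpha(50, 100, 20, 80))  # 0.5
```

A positive alpha means selected sites carry more fixed differences than their polymorphism level predicts under neutrality; slightly deleterious polymorphisms inflate Pn and bias this simple estimate downwards, which is one motivation for DFE-based extensions.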

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

MPG.PuRe

The Francis Crick Institute

Inference of Mutation Parameters and Selective Constraint in Mammalian Coding Sequences by Approximate Bayesian Computation

Author: Eöry Lél
Halligan Daniel L.
Keightley Peter D.
Kirkpatrick Mark
Publication venue: Genetics Society of America

We develop an inference method that uses approximate Bayesian computation (ABC) to simultaneously estimate mutational parameters and selective constraint on the basis of nucleotide divergence for protein-coding genes between pairs of species. Our simulations explicitly model CpG hypermutability and transition vs. transversion mutational biases along with negative and positive selection operating on synonymous and nonsynonymous sites. We evaluate the method by simulations in which true mean parameter values are known and show that it produces reasonably unbiased parameter estimates as long as sequences are not too short and sequence divergence is not too low. We show that the use of quadratic regression within ABC offers an improvement over linear regression, but that weighted regression has little impact on the efficiency of the procedure. We apply the method to estimate mutational and selective constraint parameters in data sets of protein-coding genes extracted from the genome sequences of primates, murids, and carnivores. Estimates of CpG hypermutability are substantially higher in primates than in murids and carnivores. Nonsynonymous site selective constraint is substantially higher in murids and carnivores than in primates, and autosomal nonsynonymous constraint is higher than X-chromosome constraint in all taxa. We detect significant selective constraint at synonymous sites in primates, carnivores, and murid rodents. Synonymous site selective constraint is weakest in murids, a surprising result considering that murid effective population sizes are likely to be considerably higher than those of the other two taxa.
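The regression step mentioned above can be shown on a toy problem. This sketch estimates a single per-site divergence from one summary statistic (the observed proportion of differing sites) by rejection ABC, then adjusts the accepted draws with a linear (degree 1) or quadratic (degree 2) regression on the distance between simulated and observed statistics. The flat prior, the simulation model, the acceptance fraction and all function names are assumptions for illustration; the paper's method jointly estimates many mutational and selective parameters from several statistics.

```python
import random


def simulate_divergence(p, n_sites, rng):
    """Proportion of n_sites that differ when each differs independently with probability p."""
    return sum(rng.random() < p for _ in range(n_sites)) / n_sites


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def abc_regression(obs, n_sites, n_sims=4000, accept_frac=0.1, degree=2, seed=1):
    """Rejection ABC followed by a polynomial regression adjustment of accepted draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        p = rng.uniform(0.0, 0.5)  # hypothetical flat prior on per-site divergence
        draws.append((p, simulate_divergence(p, n_sites, rng)))
    # Keep the draws whose summary statistic lies closest to the observed one.
    draws.sort(key=lambda draw: abs(draw[1] - obs))
    kept = draws[: int(n_sims * accept_frac)]
    # Least-squares fit of the parameter on powers of (s - s_obs).
    xs = [[(s - obs) ** k for k in range(degree + 1)] for _, s in kept]
    ys = [p for p, _ in kept]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(degree + 1)] for i in range(degree + 1)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(degree + 1)]
    beta = solve_linear(xtx, xty)
    # Remove the fitted dependence on the statistic, projecting each draw to s = s_obs.
    adjusted = [y - sum(beta[k] * x[k] for k in range(1, degree + 1)) for x, y in zip(xs, ys)]
    return sum(adjusted) / len(adjusted)


# Hypothetical observation: 10% of 200 aligned sites differ.
print(round(abc_regression(0.1, 200, degree=1), 3))
print(round(abc_regression(0.1, 200, degree=2), 3))
```

The adjustment lets a relatively loose acceptance tolerance be used without biasing the estimate towards the prior; switching `degree` from 1 to 2 mirrors the linear versus quadratic comparison described in the abstract.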

Crossref

PubMed Central

CsrA Inhibits Translation Initiation of Escherichia coli hfq by Binding to a Single Site Overlapping the Shine-Dalgarno Sequence

Author: Carol S. Baker
Helen Yakhnin
Jeffrey Mercante
Lél A. Eöry
Paul Babitzke
Tony Romeo
Publication venue: American Society for Microbiology
Publication date: 01/01/2007

Csr (carbon storage regulation) of Escherichia coli is a global regulatory system that consists of CsrA, a homodimeric RNA binding protein; two noncoding small RNAs (sRNAs; CsrB and CsrC) that function as CsrA antagonists by sequestering this protein; and CsrD, a specificity factor that targets CsrB and CsrC for degradation by RNase E. CsrA inhibits translation initiation of glgC, cstA, and pgaA by binding to their leader transcripts and preventing ribosome binding. Translation inhibition is thought to contribute to the observed mRNA destabilization. Each of the previously known target transcripts contains multiple CsrA binding sites. A position-specific weight matrix search program was developed using known CsrA binding sites in mRNA. This search tool identified a potential CsrA binding site that overlaps the Shine-Dalgarno sequence of hfq, a gene that encodes an RNA chaperone that mediates sRNA-mRNA interactions. This putative CsrA binding site matched the SELEX-derived binding site consensus sequence at 8 of 12 positions. Results from gel mobility shift and footprint assays demonstrated that CsrA binds specifically to this site in the hfq leader transcript. Toeprint and cell-free translation results indicated that bound CsrA inhibits Hfq synthesis by competitively blocking ribosome binding. Disruption of csrA caused elevated expression of an hfq′-′lacZ translational fusion, while overexpression of csrA inhibited expression of this fusion. We also found that hfq mRNA is stabilized upon entry into stationary-phase growth by a CsrA-independent mechanism. The interaction of CsrA with hfq mRNA is the first example of a CsrA-regulated gene that contains only one CsrA binding site.
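A position-specific weight matrix search of the kind described can be sketched as follows: count bases at each position of aligned known sites, convert the counts to log2-odds scores against a uniform background (with pseudocounts), and slide the matrix along a transcript, scoring every window. The training sites below are hypothetical toy sequences, not the CsrA sites used in the paper, and the pseudocount and background values are assumptions.

```python
import math

BASES = "ACGU"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned RNA sites."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        total = len(column) + len(BASES) * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_hit(sequence, pwm):
    """Slide the matrix along the sequence; return (score, start) of the best window."""
    width = len(pwm)
    return max(
        (sum(col[base] for col, base in zip(pwm, sequence[start:start + width])), start)
        for start in range(len(sequence) - width + 1)
    )


# Hypothetical aligned binding sites (toy data only).
sites = ["ACAGGAUG", "ACAGGAUA", "UCAGGAUG", "ACAGGAGG"]
pwm = build_pwm(sites)
score, start = best_hit("UUUUACAGGAUGUUUU", pwm)
print(start)  # 4
```

Positive scores indicate windows that resemble the training sites more than random sequence does; in practice a score threshold, calibrated on known sites, decides which hits are reported as candidate binding sites.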

CiteSeerX

Crossref

PubMed Central

Edinburgh Research Explorer

Estimates of mean nucleotide diversity (π) in house mice, divergence to rat (d_rat) and their ratio (π/d) plotted against the distance from the nearest protein-coding exon (panel A) or CNE (panel B).

Author: Athanasios Kousathanas (493642)
Bettina Harr (13733)
Daniel L. Halligan (254200)
David J. Adams (119198)
Lél Eöry (493644)
Peter D. Keightley (254208)
Rob W. Ness (493643)
Thomas M. Keane (123618)

Mean estimates of π/d are well approximated by a negative exponential function (red line), obtained by fitting the function f(x) = A(1 - B exp(-x/d)) to mean π/d by nonlinear least squares (see Materials and Methods, http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003995#s4, for details). The bottom panel shows, on a log scale, the number of sites (in Mb) contributing to each bin.
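The curve f(x) = A(1 - B exp(-x/d)) can be fitted without a general nonlinear optimizer by noting that, for a fixed length scale d, it is linear in A and C = A·B (f = A - C exp(-x/d)). This sketch therefore swaps in a grid search over d with closed-form least squares for A and C at each grid value, keeping the d with the smallest residual sum of squares. It minimizes the same least-squares objective as the paper's nonlinear fit but is limited to the chosen grid; the grid and the synthetic data are assumptions.

```python
import math


def fit_for_scale(xs, ys, d):
    """Least-squares A and B for fixed d, using f(x) = A - C*exp(-x/d) with C = A*B.
    Returns (A, B, residual sum of squares)."""
    es = [math.exp(-x / d) for x in xs]
    n = len(xs)
    se, see = sum(es), sum(e * e for e in es)
    sy, sey = sum(ys), sum(e * y for e, y in zip(es, ys))
    # Normal equations for y = a + k*e, where k = -C.
    det = n * see - se * se
    a = (see * sy - se * sey) / det
    k = (n * sey - se * sy) / det
    c = -k
    sse = sum((y - (a - c * e)) ** 2 for e, y in zip(es, ys))
    return a, c / a, sse


def fit_exponential(xs, ys, scales):
    """Grid search over the length scale d; return (A, B, d) with the lowest SSE."""
    a, b, _, d = min((fit_for_scale(xs, ys, d) + (d,) for d in scales), key=lambda r: r[2])
    return a, b, d


# Synthetic, noise-free pi/d profile (hypothetical values) sampled every 5 kb.
xs = [5000 * i for i in range(41)]
ys = [0.2 * (1 - 0.3 * math.exp(-x / 25000)) for x in xs]
a, b, d = fit_exponential(xs, ys, range(1000, 100001, 1000))
print(round(a, 3), round(b, 3), d)  # 0.2 0.3 25000
```

Here A is the asymptotic π/d far from exons or CNEs, B is the fractional depth of the dip at x = 0, and d sets how far the dip extends.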

The Francis Crick Institute

Results from DFE-alpha.

Author: Athanasios Kousathanas (493642)
Bettina Harr (13733)
Daniel L. Halligan (254200)
David J. Adams (119198)
Lél Eöry (493644)
Peter D. Keightley (254208)
Rob W. Ness (493643)
Thomas M. Keane (123618)

n_t is the total number of sites in the reference genome corresponding to each mutually exclusive site class (including non-canonical splice forms in the case of protein-coding exons). N_e s (the product of the mean homozygous effect of a deleterious mutation and the effective population size) and β (the gamma shape parameter, which has a lower estimable value of 0.05 within DFE-alpha) are the inferred parameters of the DFE, from which we calculate the mean fixation probability of a deleterious mutation relative to a neutral mutation (u_n) and estimates of the proportion of deleterious mutations in three ranges of fitness effect (N_e s = 0–1, 1–10 and 10+). From estimates of divergence from rat at selected and neutral sites, we calculate the proportion of adaptive substitutions (α) and the rate of adaptive substitution relative to the rate of synonymous substitution (ω_a) (results are shown for non-CpG-prone sites). n_a is an estimate of the total number of adaptive substitutions between mouse and rat attributable to each site class and is calculated as n_a = ω_a n_t d_s, where d_s = 0.18 is an estimate of divergence at synonymous sites. 95% confidence limits are shown in square brackets.
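The quantities in this table can be chained together as a short calculation. The sketch assumes a common DFE-based formulation in which the divergence expected at selected sites without adaptation is d_s times the mean relative fixation probability u_n (i.e. all new mutations at selected sites are assumed to follow the inferred deleterious DFE), so that α = (d_n - d_s u_n) / d_n and ω_a = (d_n - d_s u_n) / d_s; n_a = ω_a n_t d_s is taken directly from the caption. The input values and function names are hypothetical, not results from the table.

```python
def adaptive_stats(d_n: float, d_s: float, u_n: float):
    """alpha and omega_a from divergence at selected (d_n) and neutral (d_s) sites,
    assuming non-adaptive selected-site divergence equals d_s * u_n."""
    excess = d_n - d_s * u_n          # divergence attributed to adaptive substitutions
    alpha = excess / d_n              # proportion of substitutions that were adaptive
    omega_a = excess / d_s            # adaptive rate relative to the neutral rate
    return alpha, omega_a


def total_adaptive_substitutions(omega_a: float, n_t: float, d_s: float = 0.18) -> float:
    """n_a = omega_a * n_t * d_s, as defined in the caption."""
    return omega_a * n_t * d_s


# Hypothetical site class: 1 Mb of sites, d_n = 0.05, u_n = 0.2, d_s = 0.18.
alpha, omega_a = adaptive_stats(0.05, 0.18, 0.2)
print(round(alpha, 3), round(omega_a, 4))                        # 0.28 0.0778
print(round(total_adaptive_substitutions(omega_a, 1_000_000)))  # 14000
```

Under this formulation ω_a equals α·d_n/d_s, so n_a reduces to the excess divergence multiplied by the number of sites in the class.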

The Francis Crick Institute

Patterns of nucleotide diversity (π), divergence to rat (d) and π/d, in the flanks of zero-fold degenerate and four-fold degenerate protein-coding sites identified as either having a fixed substitution between M. m. castaneus and M. famulus or no substitution.

Author: Athanasios Kousathanas (493642)
Bettina Harr (13733)
Daniel L. Halligan (254200)
David J. Adams (119198)
Lél Eöry (493644)
Peter D. Keightley (254208)
Rob W. Ness (493643)
Thomas M. Keane (123618)


The Francis Crick Institute
