Evidence of widespread degradation of gene control regions in hominid genomes
Although sequences containing regulatory elements located close to protein-coding genes are often only weakly conserved during evolution, comparisons of rodent genomes have implied that these sequences are subject to some selective constraints. Evolutionary conservation is particularly apparent upstream of coding sequences and in first introns, regions that are enriched for regulatory elements. By comparing the human and chimpanzee genomes, we show here that there is almost no evidence for conservation in these regions in hominids. Furthermore, we show that gene expression is diverging more rapidly in hominids than in murids per unit of neutral sequence divergence. By combining data on polymorphism levels in human noncoding DNA and the corresponding human–chimpanzee divergence, we show that the proportion of adaptive substitutions in these regions in hominids is very low. It therefore seems likely that the lack of conservation and increased rate of gene expression divergence are caused by a reduction in the effectiveness of natural selection against deleterious mutations because of the low effective population sizes of hominids. This has resulted in the accumulation of a large number of deleterious mutations in sequences containing gene control elements and hence a widespread degradation of the genome during the evolution of humans and chimpanzees.
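The abstract above infers the proportion of adaptive substitutions by combining polymorphism and divergence data. As a minimal sketch of one standard estimator of this kind (a McDonald-Kreitman-type calculation, which is not necessarily the exact method the authors used), the proportion can be written as alpha = 1 - (D_neutral * P_test) / (D_test * P_neutral). The function name and all counts below are hypothetical and chosen only for illustration.

```python
def proportion_adaptive(d_test, p_test, d_neutral, p_neutral):
    """McDonald-Kreitman-type estimate of the proportion of adaptive substitutions.

    d_test, p_test       -- divergence and polymorphism counts in the tested
                            class of sites (e.g. upstream regions)
    d_neutral, p_neutral -- the same counts in a putatively neutral class
    Counts must come from the same loci for both classes.
    """
    if d_test <= 0 or p_neutral <= 0:
        raise ValueError("d_test and p_neutral must be positive")
    # Under neutrality D_test/D_neutral equals P_test/P_neutral; any excess
    # divergence in the tested class is attributed to adaptive substitutions.
    return 1.0 - (d_neutral * p_test) / (d_test * p_neutral)


# Made-up counts, not taken from the paper.
alpha_none = proportion_adaptive(d_test=100, p_test=50, d_neutral=200, p_neutral=100)
alpha_half = proportion_adaptive(d_test=200, p_test=50, d_neutral=200, p_neutral=100)
print(alpha_none, alpha_half)
```

An estimate near zero, as in the first made-up case, corresponds to the "very low" proportion of adaptive substitutions that the abstract reports.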
Effect of the assignment of ancestral CpG state on the estimation of nucleotide substitution rates in mammals
BACKGROUND: Molecular evolutionary studies in mammals often estimate nucleotide substitution rates within and outside CpG dinucleotides separately. Frequently, in alignments of two sequences, the division of sites into CpG and non-CpG classes is based simply on the presence or absence of a CpG dinucleotide in either sequence, a procedure that we refer to as CpG/non-CpG assignment. Although it is likely that this procedure is biased, it is generally assumed that the bias is negligible if species are very closely related. RESULTS: Using simulations of DNA sequence evolution, we show that assignment of the ancestral CpG state based on the simple presence or absence of the CpG dinucleotide can seriously bias estimates of the substitution rate, because many true non-CpG changes are misassigned as CpG. Paradoxically, this bias is most severe between closely related species, because in a two-branch tree a minimum of two substitutions is required to misassign a true ancestral CpG site as non-CpG, whereas only a single substitution is required to misassign a true ancestral non-CpG site as CpG. We also show that CpG misassignment bias differentially affects fourfold degenerate and noncoding sites because of differences in base composition, such that fourfold degenerate sites can appear to be evolving more slowly than noncoding sites. We demonstrate that the effects predicted by our simulations occur in a real evolutionary setting by comparing substitution rates estimated from human-chimp coding and intronic sequence using CpG/non-CpG assignment with estimates derived from a method that is largely free from bias. CONCLUSION: Our study demonstrates that a common method of assigning sites to CpG and non-CpG classes in pairwise alignments is seriously biased and recommends against the adoption of ad hoc methods of ancestral state assignment.
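The presence/absence rule described in this abstract can be sketched in a few lines. This is only an illustration of the assignment procedure and of the single-substitution misassignment the authors describe, not their simulation code; the function name and the toy sequences are hypothetical, and the sketch assumes an ungapped alignment of uppercase sequences.

```python
def cpg_assigned_positions(seq_a, seq_b):
    """Positions placed in the CpG class by the simple presence/absence rule.

    A position is flagged if it lies within a "CG" dinucleotide in either of
    the two aligned sequences. Assumes equal-length, ungapped, uppercase input.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    flagged = set()
    for seq in (seq_a, seq_b):
        for i in range(len(seq) - 1):
            if seq[i:i + 2] == "CG":
                flagged.update((i, i + 1))
    return flagged


# Toy example: the (hypothetical) ancestor has no CpG at all.
ancestor = "ATGA"
lineage_1 = "ACGA"  # a single T->C substitution creates a CpG
lineage_2 = "ATGA"  # unchanged

flagged = cpg_assigned_positions(lineage_1, lineage_2)
differences = {i for i, (a, b) in enumerate(zip(lineage_1, lineage_2)) if a != b}
print(flagged, differences & flagged)
```

In this toy case the one observed difference falls in the CpG class even though the ancestral site was non-CpG, which is the single-substitution misassignment the abstract refers to.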
Estimate of the Spontaneous Mutation Rate in Chlamydomonas reinhardtii
The nature of spontaneous mutations, including their rate, distribution across the genome, and fitness consequences, is of central importance to biology. However, the low rate of mutation has made it difficult to study spontaneous mutagenesis, and few studies have directly addressed these questions. Here, we present a direct estimate of the mutation rate and a description of the properties of new spontaneous mutations in the unicellular green alga Chlamydomonas reinhardtii. We conducted a mutation accumulation experiment for ∼350 generations followed by whole-genome resequencing of two replicate lines. Our analysis identified a total of 14 mutations, including 5 short indels and 9 single base mutations, and no evidence of larger structural mutations. From this, we estimate a total mutation rate of 3.23 × 10^−10/site/generation (95% C.I., 1.82 × 10^−10 to 5.23 × 10^−10) and a single base mutation rate of 2.08 × 10^−10/site/generation (95% C.I., 1.09 × 10^−10 to 3.74 × 10^−10). We observed no mutations from A/T → G/C, suggesting a strong mutational bias toward A/T, although paradoxically, the GC content of the C. reinhardtii genome is very high. Our estimate is only the second direct estimate of the mutation rate from plants and among the lowest spontaneous base-substitution rates known in eukaryotes.
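A mutation accumulation estimate of this kind is a count of mutations divided by the number of site-generations surveyed. The abstract does not give the number of callable sites, so the sketch below back-calculates the implied denominator from the reported total rate and then checks that the single-base rate follows from it. The function and variable names are hypothetical, and the implied denominator is a derived quantity, not a figure stated in the source.

```python
def rate_per_site_per_generation(n_mutations, site_generations):
    """Point estimate: mutations divided by (sites x generations x lines)."""
    return n_mutations / site_generations


# Values reported in the abstract.
n_total = 14           # all mutations (5 short indels + 9 single-base)
n_single_base = 9
total_rate = 3.23e-10  # per site per generation

# Implied denominator (callable sites x generations, summed over both lines).
# Assumption: the same denominator applies to both rate estimates.
implied_site_generations = n_total / total_rate

single_base_rate = rate_per_site_per_generation(n_single_base, implied_site_generations)
print(f"implied site-generations: {implied_site_generations:.3g}")
print(f"single-base rate: {single_base_rate:.3g}")
```

Under that assumption the single-base rate comes out close to the reported 2.08 × 10^−10, which is simply 9/14 of the total rate.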
MCALIGN2: Faster, accurate global pairwise alignment of non-coding DNA sequences based on explicit models of indel evolution
BACKGROUND: Non-coding DNA sequences comprise a very large proportion of the total genomic content of mammals, most other vertebrates, many invertebrates, and most plants. Unraveling the functional significance of non-coding DNA depends on how well we are able to align non-coding DNA sequences. However, the alignment of non-coding DNA sequences is more difficult than aligning protein-coding sequences. RESULTS: Here we present an improved method, based on a pair hidden Markov model (pair HMM), for performing global pairwise alignment of non-coding DNA sequences. The method uses an explicit, user-specifiable model of the indel length frequency distribution and allows any time-reversible model of nucleotide substitution. The method uses a deterministic global optimiser to find the alignment with the highest posterior probability. We test MCALIGN2 in simulations, and compare it to a previous Monte Carlo based method (MCALIGN), to the pair HMM method of Knudsen and Miyamoto, and to a heuristic method (AVID) that performed very well in a previous simulation study. We show that the pair HMM methods have excellent performance for all combinations of parameter values we have considered. MCALIGN2 is up to ten times faster than MCALIGN. Given an accurate explicit model, MCALIGN2 is more accurate in resolving indels than heuristic methods, but is computationally slower. CONCLUSION: MCALIGN2 produces better quality alignments by explicitly using biological knowledge about the indel length distribution and time-reversible models of nucleotide substitution. As a result, it can outperform other available sequence alignment methods in the cases we have considered for aligning non-coding DNA sequences.
Rates and Fitness Consequences of New Mutations in Humans
ABSTRACT The human mutation rate per nucleotide site per generation (μ) can be estimated from data on mutation rates at loci causing Mendelian genetic disease, by comparing putatively neutrally evolving nucleotide sequences between humans and chimpanzees, and by comparing the genome sequences of relatives. Direct estimates from genome sequencing of relatives suggest that μ is about 1.1 × 10^−8, which is about twofold lower than estimates based on the human-chimp divergence. This implies that an average of 70 new mutations arise in the human diploid genome per generation. Most of these mutations are paternal in origin, but the male:female mutation rate ratio is currently uncertain and might vary even among individuals within a population. On the basis of a method proposed by Kondrashov and Crow, the genome-wide deleterious mutation rate (U) can be estimated from the product of the number of nucleotide sites in the genome, μ, and the mean selective constraint per site. Although the presence of many weakly selected mutations in human noncoding DNA makes this approach somewhat problematic, estimates are U = 2.2 for the whole diploid genome per generation and 0.35 for mutations that change an amino acid of a protein-coding gene. A genome-wide deleterious mutation rate of 2.2 seems higher than humans could tolerate if natural selection is "hard," but could be tolerated if selection acts on relative fitness differences between individuals or if there is synergistic epistasis. I argue that in the foreseeable future, an accumulation of new deleterious mutations is unlikely to lead to a detectable decline in fitness of human populations. For some time, it has been thought that each newborn human has many tens of new mutations that appeared in its mother's or father's germline. This extraordinarily high genomic rate of mutation includes a small fraction of advantageous mutations that have fueled the evolution of our species and that are the basis of ongoing adaptive evolution.
The input of new variation brings along with it mutations that cause Mendelian genetic disease, kept at low frequencies by natural selection, and a burden of less harmful mutations that presumably maintain genetic variation in susceptibility to complex diseases. For anthropocentric reasons, we are fundamentally interested in the mutation rate in our own species, and a number of questions are unresolved or still only partially answered. A central parameter is the average number of new genetic variants each of us has that our parents did not possess. How many of these mutations came from our father and our mother, how much does the mutation rate change with parental age, and is there significant variation in the mutation rate in the population as a consequence of other environmental or genetic factors? A more difficult question to answer concerns the frequency of mildly deleterious mutations and the distribution of their fitness effects. Along with the nature of selection on fitness in human populations, these parameters hold the key to understanding how a high genomic rate o
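The arithmetic behind the figures in this abstract can be sketched as follows. The per-site rate and U = 2.2 are taken from the abstract; the haploid genome size of about 3.2 × 10^9 sites is an assumption that the source does not state, and the mean constraint printed at the end is only what those numbers jointly imply, not a value reported by the author. All names are hypothetical.

```python
# From the abstract: direct estimate of the per-site, per-generation rate.
MUTATION_RATE = 1.1e-8
# Assumption (not in the source): approximate haploid human genome size.
HAPLOID_SITES = 3.2e9

diploid_sites = 2 * HAPLOID_SITES
new_mutations_per_generation = diploid_sites * MUTATION_RATE


def deleterious_rate(n_sites, mutation_rate, mean_constraint):
    """Kondrashov-Crow style product: sites x per-site rate x mean constraint."""
    return n_sites * mutation_rate * mean_constraint


U_REPORTED = 2.2  # genome-wide deleterious rate quoted in the abstract
# Back-calculated mean constraint per site implied by U and the rate above.
implied_constraint = U_REPORTED / new_mutations_per_generation

print(f"new mutations per diploid genome: {new_mutations_per_generation:.1f}")
print(f"implied mean constraint per site: {implied_constraint:.4f}")
print(f"U check: {deleterious_rate(diploid_sites, MUTATION_RATE, implied_constraint):.2f}")
```

With the assumed genome size, the first figure lands near the roughly 70 new mutations per generation quoted in the abstract, and the implied mean constraint is a few percent of sites.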
Patterns of selective constraints in noncoding DNA of rice
BACKGROUND: Several studies have investigated the relationships between selective constraints in introns and their length, GC content and location within genes. To date, however, no such investigation has been done in plants. Studies of selective constraints in noncoding DNA have generally involved interspecific comparisons, under the assumption of the same selective pressures acting in each lineage. Such comparisons are limited to cases in which the noncoding sequences are not too strongly diverged, so that reliable sequence alignments can be obtained. Here, we investigate selective constraints in a recent segmental duplication that includes 605 paralogous intron pairs and that occurred about 7 million years ago in rice (O. sativa). RESULTS: Our principal findings are: (1) intronic divergence is negatively correlated with intron length, a pattern that has previously been described in Drosophila and mammals; (2) there is a signature of strong purifying selection at splice control sites; (3) first introns are significantly longer and have a higher GC content than other introns; (4) the divergences of first and non-first introns are not significantly different from one another, a pattern that differs from Drosophila and mammals; and (5) short introns are more diverged than four-fold degenerate sites, suggesting that selection reduces divergence at four-fold sites. CONCLUSION: Our observation of stronger selective constraints in long introns suggests that functional elements subject to purifying selection may be concentrated within long introns. Our results are consistent with the presence of strong purifying selection at splicing control sites. Selective constraints are not significantly stronger in first introns of rice, as they are in other species.
- …