49 research outputs found

    SNPs Occur in Regions with Less Genomic Sequence Conservation

    Rates of SNPs (single nucleotide polymorphisms) and cross-species genomic sequence conservation reflect intra- and inter-species variation, respectively. Here, I report SNP rates and genomic sequence conservation adjacent to mRNA processing regions and show that, as expected, more SNPs occur in less conserved regions and that functional regions have fewer SNPs. Results are confirmed using both mouse and human data. Regions include protein start codons, 3′ splice sites, 5′ splice sites, protein stop codons, predicted miRNA binding sites, and polyadenylation sites. Throughout, SNP rates are lower and conservation is higher at regulatory sites. Within coding regions, SNP rates are highest and conservation is lowest at codon position three, and the fewest SNPs are found at codon position two, reflecting codon degeneracy for amino acid encoding. Exon splice sites show high conservation and very low SNP rates, reflecting both splicing signals and protein coding. Relaxed constraint on the third codon position is dramatically evident when exonic SNP rates are separated by intron phase. At polyadenylation sites, a peak of conservation and a low SNP rate occur from 30 to 17 nt preceding the site. This region is highly enriched for the sequence AAUAAA, reflecting the location of the conserved polyA signal. miRNA 3′ UTR target sites are predicted by incorporating interspecies genomic sequence conservation; SNP rates are low in these sites, again showing fewer SNPs in conserved regions. Together, these results confirm that SNPs, reflecting recent genetic variation, occur more frequently in regions with less evolutionary conservation.
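
The per-codon-position comparison described in this abstract can be illustrated with a minimal sketch. It is not the author's pipeline: the function name `snp_rate_by_codon_position`, the input convention (0-based SNP offsets counted from the first nucleotide of the start codon), and the toy numbers are all assumptions made for illustration.

```python
from collections import Counter


def snp_rate_by_codon_position(cds_length, snp_offsets):
    """Return SNPs per site for codon positions 1, 2 and 3.

    cds_length  -- length of the coding sequence in nt (hypothetical input;
                   must be a positive multiple of 3)
    snp_offsets -- 0-based SNP offsets from the first nt of the start codon
    """
    if cds_length <= 0 or cds_length % 3 != 0:
        raise ValueError("cds_length must be a positive multiple of 3")
    if any(not 0 <= offset < cds_length for offset in snp_offsets):
        raise ValueError("SNP offset outside the coding sequence")

    # Every codon contributes exactly one site to each codon position.
    sites_per_position = cds_length // 3
    counts = Counter(offset % 3 + 1 for offset in snp_offsets)
    return {pos: counts[pos] / sites_per_position for pos in (1, 2, 3)}


# Toy data (invented): a 12-nt CDS with SNPs at offsets 0, 2, 5 and 8.
rates = snp_rate_by_codon_position(12, [0, 2, 5, 8])
print(rates)
```

In this toy input three of the four SNPs fall on the third codon position, which mirrors the direction of the reported trend but carries no evidential weight.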

    Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin

    One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.

    Differential Trends in the Codon Usage Patterns in HIV-1 Genes

    Host-pathogen interactions underlie one of the most complex evolutionary phenomena, resulting in continual adaptive genetic changes in which pathogens exploit the host's molecular resources for growth and survival while hosts try to eliminate the pathogen. Deciphering the molecular basis of host-pathogen interactions is useful in understanding the factors governing pathogen evolution and disease propagation. In the host-pathogen context, a balance between mutation, selection, and genetic drift is known to maintain codon bias in both organisms. Studies revealing the determinants of the bias and its dynamics are central to the understanding of host-pathogen evolution. We considered the Human Immunodeficiency Virus (HIV) type 1 and its human host to search for evolutionary signatures in the viral genome. Positive selection is known to dominate intra-host evolution of HIV-1, whereas high genetic variability underlies the belief that neutral processes drive inter-host differences. In this study, we analyze the codon usage patterns of HIV-1 genomes across all subtypes and clades sequenced over a period of 23 years. We show the presence of unique temporal correlations in the codon bias of three HIV-1 genes, illustrating differential adaptation of the HIV-1 genes towards the host-preferred codons. Our results point towards gene-specific translational selection as an important force driving the evolution of HIV-1 at the population level.

    A Universal Trend of Reduced mRNA Stability near the Translation-Initiation Site in Prokaryotes and Eukaryotes

    Recent studies have suggested that the thermodynamic stability of mRNA secondary structure near the start codon can regulate translation efficiency in Escherichia coli, and that translation is more efficient the less stable the secondary structure. We survey the complete genomes of 340 species for signals of reduced mRNA secondary structure near the start codon. Our analysis includes bacteria, archaea, fungi, plants, insects, fishes, birds, and mammals. We find that nearly all species show evidence for reduced mRNA stability near the start codon. The reduction in stability generally increases with increasing genomic GC content. In prokaryotes, the reduction also increases with decreasing optimal growth temperature. Within genomes, there is variation in the stability among genes, and this variation correlates with gene GC content, codon bias, and gene expression level. For birds and mammals, however, we do not find a genome-wide trend of reduced mRNA stability near the start codon. Yet the most GC-rich genes in these organisms do show such a signal. We conclude that reduced stability of the mRNA secondary structure near the start codon is a universal feature of all cellular life. We suggest that the origin of this reduction is selection for efficient recognition of the start codon by initiator-tRNA.

    Depletion of somatic mutations in splicing-associated sequences in cancer genomes

    Background: An important goal of cancer genomics is to systematically identify cancer-causing mutations. A common approach is to identify sites with high ratios of non-synonymous to synonymous mutations; however, if synonymous mutations are under purifying selection, this methodology leads to identification of false-positive mutations. Here, using synonymous somatic mutations (SSMs) identified in over 4000 tumours across 15 different cancer types, we sought to test this assumption by focusing on coding regions required for splicing. Results: Exon flanks, which are enriched for sequences required for splicing fidelity, have ~17% lower SSM density compared to exonic cores, even after excluding canonical splice sites. While it is impossible to eliminate a mutation bias of unknown cause, multiple lines of evidence support a purifying selection model above a mutational bias explanation. The flank/core difference is not explained by skewed nucleotide content, replication timing, nucleosome occupancy or deficiency in mismatch repair. The depletion is not seen in tumour suppressors, consistent with their role in positive tumour selection, but is otherwise observed in cancer-associated and non-cancer genes, both essential and non-essential. Consistent with a role in splicing modulation, exonic splice enhancers have a lower SSM density before and after controlling for nucleotide composition; moreover, flanks at the 5’ end of the exons have significantly lower SSM density than at the 3’ end. Conclusions: These results suggest that the observable mutational spectrum of cancer genomes is not simply a product of various mutational processes and positive selection, but might also be shaped by negative selection.

    p53 mutations in classic and pleomorphic invasive lobular carcinoma of the breast

    BACKGROUND: p53 is a tumor suppressor that is frequently mutated in human cancers. Although alterations in p53 are common in breast cancer, few studies have specifically investigated TP53 mutations in the breast cancer subtype invasive lobular carcinoma (ILC). Recently reported conditional mouse models have indicated that functional p53 inactivation may play a role in ILC development and progression. Since reports on the detection of TP53 mutations in the relatively favorable classic and more aggressive pleomorphic variants of ILC (PILC) are rare and ambiguous, we performed a comprehensive analysis to determine the mutation status of TP53 in these breast cancer subtypes. METHODS: To increase our understanding of p53-mediated pathways and the roles they may play in the etiology of classic ILC and PILC, we investigated TP53 mutations and p53 accumulation in a cohort of 22 cases of classic ILC and 19 cases of PILC by direct DNA sequencing and immunohistochemistry. RESULTS: We observed 11 potentially pathogenic TP53 mutations, of which 3 were detected in classic ILC (13.6%) and 8 in PILC (42.1%; p = 0.04). While p53 protein accumulation was not significantly different between classic and pleomorphic ILC, mutations that affected structure and protein function were significantly associated with p53 protein levels. CONCLUSION: TP53 mutations occur more frequently in PILC than in classic ILC.
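
The proportions reported in the RESULTS (3 of 22 classic ILC, 8 of 19 PILC) can be checked with a short worked example. The abstract does not say which test produced p = 0.04, so the sketch below is only an assumption: it computes a one-sided Fisher exact p-value and an uncorrected Pearson chi-square p-value from the 2×2 table, using only the Python standard library. The function names are illustrative.

```python
from math import comb, erfc, sqrt

# Counts taken from the abstract: mutated cases / total cases per subtype.
classic_mut, classic_n = 3, 22
pilc_mut, pilc_n = 8, 19


def fisher_one_sided(a, n1, c, n2):
    """P(at most `a` mutated cases in group 1), given fixed table margins."""
    total_mut = a + c
    denom = comb(n1 + n2, total_mut)
    return sum(comb(n1, k) * comb(n2, total_mut - k) for k in range(a + 1)) / denom


def chi_square_p(a, n1, c, n2):
    """Uncorrected Pearson chi-square p-value for a 2x2 table (1 d.f.)."""
    b, d = n1 - a, n2 - c
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n2 * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


pct_classic = round(100 * classic_mut / classic_n, 1)
pct_pilc = round(100 * pilc_mut / pilc_n, 1)
p_fisher = fisher_one_sided(classic_mut, classic_n, pilc_mut, pilc_n)
p_chi2 = chi_square_p(classic_mut, classic_n, pilc_mut, pilc_n)
print(pct_classic, pct_pilc, round(p_fisher, 3), round(p_chi2, 3))
```

Both p-values round to 0.04, so either test is compatible with the reported figure; the example does not establish which one the authors used.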

    Mutation analysis of the MDM4 gene in German breast cancer patients

    Background: MDM4 is a negative regulator of p53 and cooperates with MDM2 in the cellular response to DNA damage. It is unknown, however, whether MDM4 gene alterations play a role in the inherited component of breast cancer susceptibility. Methods: We sequenced the whole MDM4 coding region and flanking untranslated regions in genomic DNA samples obtained from 40 German patients with familial breast cancer. Selected variants were subsequently screened by RFLP-based assays in an extended set of breast cancer cases and controls. Results: Our resequencing study uncovered two MDM4 coding variants in 4/40 patients. Three patients carried a silent substitution at codon 74 that was linked with another rare variant in the 5'UTR. No association of this allele with breast cancer was found in a subsequent screening of 133 patients with bilateral breast cancer and 136 controls. The fourth patient was heterozygous for the missense substitution D153G, which is located in a less conserved region of the MDM4 protein but may affect a predicted phosphorylation site. The D153G substitution only partially segregated with breast cancer in the family and was not identified on an additional 680 chromosomes screened. Conclusion: This study did not reveal clearly pathogenic mutations, although it uncovered two new unclassified variants at a low frequency. We conclude that there is no evidence for a major role of MDM4 coding variants in the inherited susceptibility towards breast cancer in German patients.

    Late Replicating Domains Are Highly Recombining in Females but Have Low Male Recombination Rates: Implications for Isochore Evolution

    In mammals, sequences that are either late replicating or highly recombining have high rates of evolution at putatively neutral sites. As early replicating domains and highly recombining domains both tend to be GC rich, we a priori expect these two variables to covary. If so, the relative contribution of either of these variables to the local neutral substitution rate might have been wrongly estimated owing to covariance with the other. Against our expectations, we find that sex-averaged recombination rates show little or no correlation with replication timing, suggesting that they are independent determinants of substitution rates. However, this result masks significant sex-specific complexity: late replicating domains tend to have high recombination rates in females but low recombination rates in males. That these trends are antagonistic explains why sex-averaged recombination is not correlated with replication timing. This unexpected result has several important implications. First, although both male and female recombination rates covary significantly with intronic substitution rates, the magnitude of this correlation is moderately underestimated for male recombination and slightly overestimated for female recombination, owing to covariance with replication timing. Second, the result could explain why male recombination is strongly correlated with GC content but female recombination is not. If, to explain the correlation between GC content and replication timing, we suppose that late replication forces reduced GC content, then GC promotion by biased gene conversion during female recombination is partly countered by the antagonistic effect of later replicating sequence tending to increase AT content. Indeed, the strength of the correlation between female recombination rate and local GC content is more than doubled by control for replication timing. Our results underpin the need to consider sex-specific recombination rates and potential covariates in analysis of GC content and rates of evolution.

    The surprising negative correlation of gene length and optimal codon use - disentangling translational selection from GC-biased gene conversion in yeast

    Background: Surprisingly, in several multi-cellular eukaryotes optimal codon use correlates negatively with gene length. This contrasts with the expectation under selection for translational accuracy. While suggested explanations focus on variation in strength and efficiency of translational selection, it has rarely been noticed that the negative correlation is reported only in organisms whose optimal codons are biased towards codons that end with G or C (-GC). This raises the question whether forces that affect base composition, such as GC-biased gene conversion, contribute to the negative correlation between optimal codon use and gene length. Results: Yeast is a good organism to study this, as equal numbers of optimal codons end in -GC and -AT, and one may hence compare frequencies of optimal GC-ending with optimal AT-ending codons to disentangle the forces. Results of this study demonstrate that in yeast, frequencies of GC-ending (optimal and non-optimal) codons decrease with gene length and increase with recombination. A decrease of GC-ending codons along genes contributes to the negative correlation with gene length. Correlations with recombination and gene expression differentiate between GC-ending and optimal codons, and substitution patterns also support effects of GC-biased gene conversion. Conclusion: While the general effect of GC-biased gene conversion is well known, the negative correlation of optimal codon use with gene length has not been considered in this context before. Initiation of gene conversion events in promoter regions and the presence of a gene conversion gradient most likely explain the observed decrease of GC-ending codons with gene length and gene position.
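
The quantity at the centre of this abstract, the frequency of GC-ending codons in a gene, can be sketched in a few lines. This is an illustrative assumption about how such a frequency might be computed, not the authors' method: the function name `gc_ending_fraction` and the toy sequence are invented, and the sketch makes no attempt to exclude start or stop codons.

```python
def gc_ending_fraction(cds):
    """Fraction of codons in an in-frame coding sequence that end in G or C."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 nt")

    # Split into codons and count those whose third base is G or C.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    gc_ending = sum(1 for codon in codons if codon[2] in "GC")
    return gc_ending / len(codons)


# Toy sequence (invented): codons ATG, GCC, AAA, TTT, GGG.
fraction = gc_ending_fraction("ATGGCCAAATTTGGG")
print(fraction)
```

Computing this fraction per gene and correlating it with gene length or local recombination rate would correspond to the kind of comparison the abstract describes; the correlation step itself is omitted here because the abstract does not specify it.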