    A Simple Method for Analyzing Exome Sequencing Data Shows Distinct Levels of Nonsynonymous Variation for Human Immune and Nervous System Genes

    To measure the strength of natural selection that acts upon single nucleotide variants (SNVs) in a set of human genes, we calculate the ratio between nonsynonymous SNVs (nsSNVs) per nonsynonymous site and synonymous SNVs (sSNVs) per synonymous site. We transform this ratio with a factor f that corrects for the bias of synonymous sites towards transitions in the genetic code and for the different mutation rates of transitions and transversions. This method approximates the relative density of nsSNVs (rdnsv) in comparison with the neutral expectation as inferred from the density of sSNVs. Using SNVs from a diploid genome and 200 exomes, we apply our method to immune system genes (ISGs), nervous system genes (NSGs), randomly sampled genes (RSGs), and gene ontology annotated genes. The estimate of rdnsv in an individual exome is around 20% for NSGs and 30–40% for ISGs and RSGs. This smaller rdnsv of NSGs indicates overall stronger purifying selection. To quantify the relative shift of nsSNVs towards rare variants, we next fit a linear regression model to the estimates of rdnsv over different SNV allele frequency bins. The obtained regression models show a negative slope for NSGs, ISGs and RSGs, supporting an influence of purifying selection on the frequency spectrum of segregating nsSNVs. The y-intercept of the model predicts rdnsv for an allele frequency close to 0. This parameter can be interpreted as the proportion of nonsynonymous sites where mutations are tolerated to segregate with an allele frequency notably greater than 0 in the population, given the performed normalization of the observed nsSNV to sSNV ratio. A smaller y-intercept is displayed by NSGs, indicating more nonsynonymous sites under strong negative selection. This predicts more monogenically inherited or de novo mutation diseases that affect the nervous system.
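The ratio and the regression step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, applying f as a simple multiplicative factor is an assumption (the abstract does not say how the transformation is done), and the per-bin values in the usage note are invented placeholders.

```python
def rdnsv(n_ns, n_s, ns_sites, s_sites, f=1.0):
    """Relative density of nsSNVs.

    (nsSNVs per nonsynonymous site) / (sSNVs per synonymous site), scaled by f.
    ASSUMPTION: f is applied multiplicatively; the abstract does not specify this.
    """
    if n_s == 0 or ns_sites == 0 or s_sites == 0:
        raise ValueError("need non-zero sSNV count and site counts")
    return f * (n_ns / ns_sites) / (n_s / s_sites)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, with hypothetical allele frequency bin midpoints `[0.05, 0.25, 0.45]` and one `rdnsv` estimate per bin, `fit_line(midpoints, estimates)` returns the slope and the y-intercept; the intercept corresponds to the predicted rdnsv at an allele frequency close to 0.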

    Locus category based analysis of a large genome-wide association study of rheumatoid arthritis

    To pinpoint true positive single-nucleotide polymorphism (SNP) associations in a genome-wide association study (GWAS) of rheumatoid arthritis (RA), we categorize genetic loci by external knowledge. We test both the ‘enrichment of associated loci’ in a locus category and the ‘combined association’ of a locus category. The former is quantified by the odds ratio for the presence of SNP associations at the loci of a category, whereas the latter is quantified by the number of loci in a category that have SNP associations. These measures are compared with their expected values as obtained from the permutation of the affection status. To account for linkage disequilibrium (LD) among SNPs, we view each LD block as a genetic locus. Positional candidates were defined as loci implicated by earlier GWAS results, whereas functional candidates were defined by annotations regarding the molecular roles of genes, such as gene ontology categories. As expected, immune-related categories show the largest enrichment signal, although it is not very strong. The intersection of positional and functional candidate information predicts novel RA loci near the genes TEC/TXK, MBL2 and PIK3R1/CD180. Notably, a combined association signal is not only produced by immune-related categories, but also by most other categories and even randomly defined categories. The unspecific quality of these signals limits the possible conclusions from combined association tests. It also reduces the magnitude of enrichment test results. These unspecific signals might result from common variants of small effect that are hardly concentrated in candidate categories, or from an inflated size of associated regions due to weak LD with infrequent mutations.

    Exploring the relationship between polymorphic (TG/CA)n repeats in intron 1 regions and gene expression

    The putative role of (TG/CA)n repeats in the regulation of transcription has recently been reported for several cancer- and disease-related genes, including the genes encoding the epidermal growth factor receptor (EGFR), hydroxysteroid (11-beta) dehydrogenase 2 (HSD11B2) and interferon-gamma (IFNG). These studies indicated a correlation between gene expression levels and the presence or length of (TG/CA)n repeats in their intron 1 regions. A genome-wide search for genes with similar features may provide evidence of whether these dinucleotide repeats represent a class of universal regulators of gene expression, which has recently begun to be investigated as a quantitative complex phenotype. Using a public database of simple repeats, we identified 330 genes containing potentially polymorphic long (TG/CA)n repeats (n ≥ 12) in their intron 1 regions. One known physiological pathway, the calcium signalling pathway, was found to be enriched among the genes containing long repeats. In addition, certain biological processes, such as cation transport, signal transduction and ion transport, were found to be enriched in these genes. Genotyping of the long repeats showed that the majority of these dinucleotide repeats were polymorphic in the HapMap CEU (Caucasians from Utah, USA) samples of northern and western European ancestry. Evidence for a significant association between these repeats and gene expression was not observed in the genes selected based on their expression profiles in the HapMap CEU samples. Our current findings, therefore, do not support a role for these repeats as a class of universal gene expression regulators. A more comprehensive evaluation of the relationship between these repeats and gene expression, potentially in other tissues, may be necessary to illustrate their roles in gene regulation in the future.
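To make the "long repeat" criterion (n ≥ 12) concrete, the sketch below scans a DNA string for uninterrupted TG or CA runs of at least 12 units. It is only an illustration: the study drew its repeats from a public database of simple repeats rather than from a scan like this, and the function name and return format are hypothetical.

```python
import re


def find_long_repeats(seq, min_units=12):
    """Return (start, motif, units) for each uninterrupted TG or CA run
    of at least `min_units` repeat units in `seq` (0-based start).

    A (TG)n run on one strand reads as (CA)n on the opposite strand,
    so both motifs are searched on the given strand.
    """
    pattern = re.compile(
        f"(?:TG){{{min_units},}}|(?:CA){{{min_units},}}"
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        run = match.group(0)
        hits.append((match.start(), run[:2], len(run) // 2))
    return hits
```

Called on a hypothetical intron 1 sequence, the function reports only runs that reach the threshold; shorter runs, such as a (CA)5 stretch, are ignored.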