418 research outputs found

    Selection for Translation Efficiency on Synonymous Polymorphisms in Recent Human Evolution

    Synonymous mutations are considered to be “silent” as they do not affect protein sequence. However, different silent codons have different translation efficiency (TE), which raises the question to what extent such mutations are really neutral. We perform the first genome-wide study of natural selection operating on TE in recent human evolution, surveying 13,798 synonymous single nucleotide polymorphisms (SNPs) in 1,198 unrelated individuals from 11 populations. We find evidence for both negative and positive selection on TE, as measured based on differentiation in allele frequencies between populations. Notably, the likelihood of an SNP to be targeted by positive or negative selection is correlated with the magnitude of its effect on the TE of the corresponding protein. Furthermore, negative selection acting against changes in TE is more marked in highly expressed genes, highly interacting proteins, complex members, and regulatory genes. It is also more common in functional regions and in the initial segments of highly expressed genes. Positive selection targeting sites with a large effect on TE is stronger in lowly interacting proteins and in regulatory genes. Similarly, essential genes are enriched for negative TE selection while underrepresented for positive TE selection. Taken together, these results point to the significant role of TE as a selective force operating in humans and hence underscore the importance of considering silent SNPs in interpreting associations with complex human diseases. Testifying to this potential, we describe two synonymous SNPs that may have clinical implications in phenylketonuria and in Best's macular dystrophy due to TE differences between alleles

    A prospective study of serum insulin-like growth factor-I (IGF-I), IGF-II, IGF-binding protein-3 and breast cancer risk.

    The associations between serum concentrations of insulin-like growth factor-I (IGF-I), IGF-II and IGF-binding proteins (IGFBP)-3 and risk of breast cancer were investigated in a nested case-control study involving 117 cases (70 premenopausal and 47 postmenopausal at blood collection) and 350 matched controls within a cohort of women from the island of Guernsey, UK. Women using exogenous hormones at the time of blood collection were excluded. Premenopausal women in the top vs bottom third of serum IGF-I concentration had a nonsignificantly increased risk for breast cancer after adjustment for IGFBP-3 (odds ratio (OR) 1.71; 95% confidence interval (CI): 0.74-3.95; test for linear trend, P=0.21). Serum IGFBP-3 was associated with a reduction in risk in premenopausal women after adjustment for IGF-I (top third vs the bottom third: OR 0.49; 95% CI: 0.21-1.12, P for trend=0.07). Neither IGF-I nor IGFBP-3 was associated with risk in postmenopausal women and serum IGF-II concentration was not associated with risk in pre- or postmenopausal women. These data are compatible with the hypothesis that premenopausal women with a relatively high circulating concentration of IGF-I and low IGFBP-3 are at an increased risk of developing breast cancer

    Human Population Differentiation Is Strongly Correlated with Local Recombination Rate

    Allele frequency differences across populations can provide valuable information both for studying population structure and for identifying loci that have been targets of natural selection. Here, we examine the relationship between recombination rate and population differentiation in humans by analyzing two uniformly-ascertained, whole-genome data sets. We find that population differentiation as assessed by inter-continental FST shows a negative correlation with recombination rate, with FST reduced by 10% in the tenth of the genome with the highest recombination rate compared with the tenth of the genome with the lowest recombination rate (P ≪ 10^-12). This pattern cannot be explained by the mutagenic properties of recombination and instead must reflect the impact of selection in the last 100,000 years since human continental populations split. The correlation between recombination rate and FST has a qualitatively different relationship for FST between African and non-African populations and for FST between European and East Asian populations, suggesting varying levels or types of selection in different epochs of human history

    Composite likelihood estimation of demographic parameters

    Background: Most existing likelihood-based methods for fitting historical demographic models to DNA sequence polymorphism data do not scale feasibly up to the level of whole-genome data sets. Computational economies can be achieved by incorporating two forms of pseudo-likelihood: composite and approximate likelihood methods. Composite likelihood enables scaling up to large data sets because it takes the product of marginal likelihoods as an estimator of the likelihood of the complete data set. This approach is especially useful when a large number of genomic regions constitutes the data set. Additionally, approximate likelihood methods can reduce the dimensionality of the data by summarizing the information in the original data by either a sufficient statistic or a set of statistics. Both composite and approximate likelihood methods hold promise for analyzing large data sets or for use in situations where the underlying demographic model is complex and has many parameters. This paper considers a simple demographic model of allopatric divergence between two populations, in which one of the populations is hypothesized to have experienced a founder event, or population bottleneck. A large resequencing data set from human populations is summarized by the joint frequency spectrum, which is a matrix of the genomic frequency spectrum of derived base frequencies in two populations. A Bayesia

    Causal Measures of Structure and Plasticity in Simulated and Living Neural Networks

    A major goal of neuroscience is to understand the relationship between neural structures and their function. Recording of neural activity with arrays of electrodes is a primary tool employed toward this goal. However, the relationships among the neural activity recorded by these arrays are often highly complex, making it problematic to accurately quantify a network's structural information and then relate that structure to its function. Current statistical methods including cross correlation and coherence have achieved only modest success in characterizing the structural connectivity. Over the last decade an alternative technique known as Granger causality has been emerging within neuroscience. This technique, borrowed from the field of economics, provides a strong mathematical foundation based on linear auto-regression to detect and quantify “causal” relationships among different time series. This paper presents a combination of three Granger-based analytical methods that can quickly provide a relatively complete representation of the causal structure within a neural network. These are a simple pairwise Granger causality metric, a conditional metric, and a little-known, computationally inexpensive subtractive conditional method. Each causal metric is first described and evaluated in a series of biologically plausible neural simulations. We then demonstrate how Granger causality can detect and quantify changes in the strength of those relationships during plasticity using 60 channel spike train data from an in vitro cortical network measured on a microelectrode array. We show that these metrics not only detect the presence of causal relationships but also provide crucial information about the strength and direction of those relationships, particularly when they may be changing during plasticity. Although we focus on the analysis of multichannel spike train data, the metrics we describe are applicable to any stationary time series in which causal relationships among multiple measures are desired. These techniques can be especially useful when the interactions among those measures are highly complex, difficult to untangle, and may be changing over time
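As a rough illustration of the pairwise metric mentioned in the abstract above, the following minimal sketch fits two linear autoregressive models by least squares and compares their residual variances. The function names, the model order, and the synthetic continuous-valued series are assumptions made for illustration; this is not the authors' implementation, and it does not cover the conditional or subtractive variants or the preprocessing needed for spike trains.

```python
import numpy as np

def lagged_design(series_list, order):
    """Build a design matrix with an intercept and `order` lags of each series."""
    n = len(series_list[0])
    cols = [np.ones(n - order)]
    for s in series_list:
        for lag in range(1, order + 1):
            # Row t (t = order .. n-1) holds s[t - lag].
            cols.append(s[order - lag : n - lag])
    return np.column_stack(cols)

def residual_variance(target, predictors, order):
    """Residual variance of a least-squares fit of target on lagged predictors."""
    X = lagged_design(predictors, order)
    y = target[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ coef).var()

def pairwise_granger(x, y, order=2):
    """Log ratio of residual variances: how much the past of x helps predict y."""
    restricted = residual_variance(y, [y], order)   # y's own past only
    full = residual_variance(y, [y, x], order)      # y's past plus x's past
    return np.log(restricted / full)

# Synthetic example (hypothetical data): x drives y with a one-step delay.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + 0.5 * rng.standard_normal()

gc_x_to_y = pairwise_granger(x, y)
gc_y_to_x = pairwise_granger(y, x)
```

In this toy setup the x-to-y value should be clearly positive while the y-to-x value stays near zero, which is the asymmetry a pairwise Granger metric is meant to expose.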

    The Impact of Divergence Time on the Nature of Population Structure: An Example from Iceland

    The Icelandic population has been sampled in many disease association studies, providing a strong motivation to understand the structure of this population and its ramifications for disease gene mapping. Previous work using 40 microsatellites showed that the Icelandic population is relatively homogeneous, but exhibits subtle population structure that can bias disease association statistics. Here, we show that regional geographic ancestries of individuals from Iceland can be distinguished using 292,289 autosomal single-nucleotide polymorphisms (SNPs). We further show that subpopulation differences are due to genetic drift since the settlement of Iceland 1100 years ago, and not to varying contributions from different ancestral populations. A consequence of the recent origin of Icelandic population structure is that allele frequency differences follow a null distribution devoid of outliers, so that the risk of false positive associations due to stratification is minimal. Our results highlight an important distinction between population differences attributable to recent drift and those arising from more ancient divergence, which has implications both for association studies and for efforts to detect natural selection using population differentiation

    Estimation of allele frequency and association mapping using next-generation sequencing data

    Background: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium or low-coverage data (e.g., <15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates. Results: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. Conclusions: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
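The idea of integrating over genotype uncertainty instead of calling genotypes first can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the paper: it assumes Hardy-Weinberg genotype proportions, takes per-individual genotype likelihoods as given, and maximizes the resulting likelihood with a simple EM iteration. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def em_allele_frequency(genotype_likelihoods, tol=1e-8, max_iter=500):
    """Estimate an allele frequency from per-individual genotype likelihoods.

    genotype_likelihoods: array of shape (n_individuals, 3) holding
    P(reads | g) for g = 0, 1, 2 copies of the allele (assumed input).
    Assumes Hardy-Weinberg proportions as the genotype prior.
    """
    gl = np.asarray(genotype_likelihoods, dtype=float)
    copies = np.array([0.0, 1.0, 2.0])
    freq = 0.25  # arbitrary interior starting point
    for _ in range(max_iter):
        # E-step: posterior genotype probabilities for each individual.
        prior = np.array([(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2])
        posterior = gl * prior
        posterior /= posterior.sum(axis=1, keepdims=True)
        # M-step: expected allele count divided by the number of chromosomes.
        new_freq = (posterior @ copies).mean() / 2.0
        if abs(new_freq - freq) < tol:
            return new_freq
        freq = new_freq
    return freq

# Toy inputs: four individuals whose genotypes (0, 0, 1, 2) are known with certainty.
certain = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
freq_certain = em_allele_frequency(certain)

# Add one individual with no informative reads (flat likelihoods).
with_flat = np.vstack([certain, [1.0, 1.0, 1.0]])
freq_with_flat = em_allele_frequency(with_flat)
```

An individual with flat likelihoods contributes only the prior, so it leaves the estimate essentially unchanged; a hard genotype call for that individual would instead force it to add a fixed allele count.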

    Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution

    Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results: In this paper we introduce a bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (CASh for short), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for testing differential gene expression. Both lists of genes provided by CASh and t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than t-test for the detection of differential gene expression variability. Conclusion: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact in the regulation of complex pathways.
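For readers unfamiliar with the Shapley value used above as a relevance score, here is a minimal sketch that computes it exactly for a small coalitional game by averaging marginal contributions over all player orderings. The three-gene game and its value function are invented for illustration; they are not the microarray game defined in the paper, and the bootstrap comparison step of CASh is not shown.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}

def toy_game(coalition):
    """Hypothetical game: worth 1 if gene A is present with B or C, else 0."""
    return 1.0 if "A" in coalition and ({"B", "C"} & coalition) else 0.0

phi = shapley_values(["A", "B", "C"], toy_game)
```

Enumerating all orderings is only feasible for a handful of players; games defined on thousands of genes need a closed form or a sampling approximation.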