A high resolution genome-wide scan for significant selective sweeps: an application to pooled sequence data in laying chickens
In most studies aimed at localizing footprints of past selection, outliers at the tails of the empirical distribution of a given test statistic are assumed to reflect locus-specific selective forces. Significance cutoffs are subjectively determined, rather than being related to a clear set of hypotheses. Here, we define an empirical p-value for the summary statistic by means of a permutation method that uses the observed SNP structure in the real data. To illustrate the methodology, we applied our approach to a panel of 2.9 million autosomal SNPs identified from re-sequencing a pool of 15 individuals from a brown egg layer line. We scanned the genome for local reductions in heterozygosity, suggestive of selective sweeps. We also employed a modified sliding window approach that accounts for gaps in the sequence and increases scanning resolution by moving the overlapping windows by steps of one SNP only; we suggest calling this a "creeping window" strategy. The approach confirmed selective sweeps in the region of previously described candidate genes, i.e., TSHR, PRL, PRLHR, INSR, LEPR, IGF1, and NRAMP1, when used as positive controls. The genome scan revealed 82 distinct regions with strong evidence of selection (genome-wide p-value < 0.001), including genes known to be associated with eggshell structure and the immune system, such as CALB1 and the GAL cluster, respectively. A substantial proportion of signals was found in gene-poor regions, including the most extreme signal on chromosome 1. The observation of multiple signals in a highly selected layer line of chicken is consistent with the hypothesis that egg production is a complex trait controlled by many genes
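The creeping-window scan and the permutation-based empirical p-value described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the heterozygosity statistic, so a pooled heterozygosity of the form 2·n_major·n_minor / (n_major + n_minor)² is assumed; windows are assumed to hold a fixed number of SNPs; a window is assumed to span a sequence gap when its physical length exceeds a hypothetical `max_span`; and the permutation is assumed to shuffle allele counts across the fixed observed SNP positions. All function and parameter names are hypothetical.

```python
import random

def pooled_heterozygosity(counts):
    """Assumed statistic: 2 * n_major * n_minor / (n_major + n_minor)^2,
    with allele read counts summed over all SNPs in the window."""
    n_major = sum(max(a, b) for a, b in counts)
    n_minor = sum(min(a, b) for a, b in counts)
    total = n_major + n_minor
    return 2.0 * n_major * n_minor / (total * total)

def creeping_windows(positions, counts, n_snps, max_span):
    """Return (start, end, statistic) for windows advanced one SNP at a time.
    Windows longer than max_span base pairs are skipped, on the assumption
    that they straddle a gap in the sequence."""
    results = []
    for i in range(len(positions) - n_snps + 1):
        start, end = positions[i], positions[i + n_snps - 1]
        if end - start > max_span:
            continue
        stat = pooled_heterozygosity(counts[i:i + n_snps])
        results.append((start, end, stat))
    return results

def empirical_p_value(positions, counts, observed, n_snps, max_span,
                      n_perm=1000, seed=0):
    """Genome-wide empirical p-value for an observed window statistic.
    Each permutation shuffles allele counts over the observed SNP positions
    and records whether the lowest window statistic is <= the observed one."""
    rng = random.Random(seed)
    shuffled = list(counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        windows = creeping_windows(positions, shuffled, n_snps, max_span)
        if windows and min(w[2] for w in windows) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Comparing the observed statistic with the minimum over all windows in each permuted data set is one way to make the p-value genome-wide rather than per-window; whether the study used exactly this comparison is not stated in the abstract.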
Empirical Distributions of F-ST from Large-Scale Human Polymorphism Data
Studies of the apportionment of human genetic variation have long established that most human variation is within population groups and that the additional variation between population groups is small but greatest when comparing different continental populations. These studies often used Wright's FST, which apportions the standardized variance in allele frequencies within and between population groups. Because local adaptations increase population differentiation, high-FST may be found at closely linked loci under selection and used to identify genes undergoing directional or heterotic selection. We re-examined these processes using HapMap data. We analyzed 3 million SNPs on 602 samples from eight worldwide populations and a consensus subset of 1 million SNPs found in all populations. We identified four major features of the data: First, a hierarchical FST analysis showed that only a small fraction (12%) of the total genetic variation is distributed between continental populations and an even smaller fraction (1%) is found between intra-continental populations. Second, the global FST distribution closely follows an exponential distribution. Third, although the overall FST distribution is similarly shaped (inverse J), the FST distributions vary markedly by allele frequency when divided into non-overlapping groups by allele frequency range. Because the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. Finally, the change in mean-FST of these groups is linear in allele frequency. These results suggest that investigating the extremes of the FST distribution for each allele frequency group is more efficient for detecting selection. Consequently, we demonstrate that such extreme SNPs are more clustered along the chromosomes than expected from linkage disequilibrium for each allele frequency group. These genomic regions are therefore likely candidates for natural selection
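As a rough illustration of the quantities discussed in this abstract, the sketch below computes a per-SNP FST from population allele frequencies and then averages FST within non-overlapping allele frequency groups. It assumes the textbook heterozygosity-based form FST = (H_T − H_S) / H_T with equally weighted populations; the abstract does not say which estimator or weighting the study used, and the function names and bin count are hypothetical.

```python
def wright_fst(freqs):
    """FST for one biallelic SNP from per-population allele frequencies,
    assuming equally weighted populations: (H_T - H_S) / H_T."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # monomorphic across all populations
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total

def mean_fst_by_frequency_bin(snps, n_bins=5):
    """Mean FST in equal-width, non-overlapping bins of mean allele
    frequency. Bins with no SNPs are reported as None."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for freqs in snps:
        p_bar = sum(freqs) / len(freqs)
        idx = min(int(p_bar * n_bins), n_bins - 1)
        sums[idx] += wright_fst(freqs)
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

For two populations with frequencies 0.2 and 0.8, H_T is 0.5 and H_S is 0.32, so this form gives an FST of about 0.36.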
Evidence for Pervasive Adaptive Protein Evolution in Wild Mice
The relative contributions of neutral and adaptive substitutions to molecular evolution have been one of the most controversial issues in evolutionary biology for more than 40 years. The analysis of within-species nucleotide polymorphism and between-species divergence data supports a widespread role for adaptive protein evolution in certain taxa. For example, estimates of the proportion of adaptive amino acid substitutions (alpha) are 50% or more in enteric bacteria and Drosophila. In contrast, recent estimates of alpha for hominids have been at most 13%. Here, we estimate alpha for protein sequences of murid rodents based on nucleotide polymorphism data from multiple genes in a population of the house mouse subspecies Mus musculus castaneus, which inhabits the ancestral range of the Mus species complex, and nucleotide divergence between M. m. castaneus and M. famulus or the rat. We estimate that 57% of amino acid substitutions in murids have been driven by positive selection. Hominids, therefore, are exceptional in having low apparent levels of adaptive protein evolution. The high frequency of adaptive amino acid substitutions in wild mice is consistent with their large effective population size, leading to effective natural selection at the molecular level. Effective natural selection also manifests itself as a paucity of effectively neutral nonsynonymous mutations in M. m. castaneus compared to humans
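To make the quantity alpha concrete, the sketch below uses the simple McDonald-Kreitman-type estimator, alpha = 1 − (Ds·Pn) / (Dn·Ps), computed from counts of nonsynonymous and synonymous divergent sites (Dn, Ds) and polymorphic sites (Pn, Ps). This is an assumption made for illustration: the abstract does not state which estimator produced the 57% figure, and the function name and example counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Proportion of adaptive amino acid substitutions under the simple
    McDonald-Kreitman-type estimator: 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous and synonymous divergent sites between species
    pn, ps: nonsynonymous and synonymous polymorphic sites within species
    """
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero for this estimator")
    return 1.0 - (ds * pn) / (dn * ps)
```

With hypothetical counts Dn = 100, Ds = 200, Pn = 20, Ps = 100, the estimator gives about 0.6; when Dn/Ds equals Pn/Ps, as expected under strict neutrality, it gives 0.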
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects.
DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects.
METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data
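The nested cross-validation scheme mentioned in METHODS can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: it uses only a kNN classifier on binary 0/1 labels, tunes only the number of neighbours k in the inner loop (feature selection is omitted), and splits the data into interleaved folds. All names and default values are hypothetical.

```python
def k_folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (labels 0 or 1)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_x, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0

def accuracy(fit_idx, eval_idx, xs, ys, k):
    """Fraction of eval_idx samples classified correctly from fit_idx."""
    fit_x = [xs[i] for i in fit_idx]
    fit_y = [ys[i] for i in fit_idx]
    correct = sum(knn_predict(fit_x, fit_y, xs[i], k) == ys[i] for i in eval_idx)
    return correct / len(eval_idx)

def nested_cv(xs, ys, k_grid=(1, 3, 5), outer=5, inner=3):
    """Outer loop estimates performance; inner loop tunes k using only
    the outer training samples, so the outer test fold stays untouched."""
    n = len(xs)
    outer_scores = []
    for test_idx in k_folds(n, outer):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        best_k, best_score = None, -1.0
        for k in k_grid:
            scores = []
            for fold in k_folds(len(train_idx), inner):
                val_idx = [train_idx[j] for j in fold]
                val_set = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val_set]
                scores.append(accuracy(fit_idx, val_idx, xs, ys, k))
            mean_score = sum(scores) / len(scores)
            if mean_score > best_score:
                best_k, best_score = k, mean_score
        outer_scores.append(accuracy(train_idx, test_idx, xs, ys, best_k))
    return sum(outer_scores) / len(outer_scores)
```

The point of the nesting is that every tuning decision is made without access to the outer test fold. As the abstract argues, even a correctly nested estimate can still be biased when batch and group membership are confounded, which is why the comparison with independent data matters.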
New evidence for habitat specific selection in Wadden Sea Zostera marina populations revealed by genome scanning using SNP and microsatellite markers
Eelgrass Zostera marina is an ecosystem-engineering species of outstanding importance for coastal soft sediment habitats that lives in widely diverging habitats. Our first goal was to detect divergent selection and habitat adaptation at the molecular genetic level; hence, we compared three pairs of permanently submerged versus intertidal populations using genome scans, a genetic marker-based approach. Three different statistical approaches for outlier identification revealed divergent selection at 6 loci among 46 markers (6 SNPs, 29 EST microsatellites and 11 anonymous microsatellites). These outlier loci were repeatedly detected in parallel habitat comparisons, suggesting the influence of habitat-specific selection. A second goal was to test the consistency of the general genome scan approach by doubling the number of gene-linked microsatellites and adding single nucleotide polymorphism (SNP) loci, a novel marker type for seagrasses, compared to a previous study. Reassuringly, results with respect to selection were consistent among most marker loci. Functionally interesting marker loci were linked to genes involved in osmoregulation and water balance, suggesting different osmotic stress, and reproductive processes (seed maturation), pointing to different life history strategies. The identified outlier loci are valuable candidates for further investigation into the genetic basis of natural selection
Nucleocytoplasmic transport: a thermodynamic mechanism
The nuclear pore supports molecular communication between cytoplasm and
nucleus in eukaryotic cells. Selective transport of proteins is mediated by
soluble receptors, whose regulation by the small GTPase Ran leads to cargo
accumulation in, or depletion from the nucleus, i.e., nuclear import or nuclear
export. We consider the operation of this transport system by a combined
analytical and experimental approach. Provocative predictions of a simple model
were tested using cell-free nuclei reconstituted in Xenopus egg extract, a
system well suited to quantitative studies. We found that accumulation capacity
is limited, so that introduction of one import cargo leads to egress of
another. Clearly, the pore per se does not determine transport directionality.
Moreover, different cargo reach a similar ratio of nuclear to cytoplasmic
concentration in steady-state. The model shows that this ratio should in fact
be independent of the receptor-cargo affinity, though kinetics may be strongly
influenced. Numerical conservation of the system components highlights a
conflict between the observations and the popular concept of transport cycles.
We suggest that chemical partitioning provides a framework to understand the
capacity to generate concentration gradients by equilibration of the
receptor-cargo intermediary.
Comment: in press at HFSP Journal, vol 3, 16 text pages, 1 table, 4 figures,
plus Supplementary Material include
Analysis of genome-wide structure, diversity and fine mapping of Mendelian traits in traditional and village chickens
Extensive phenotypic variation is a common feature among village chickens found throughout much of the developing world, and in traditional chicken breeds that have been artificially selected for traits such as plumage variety. We present here an assessment of traditional and village chicken populations, for fine mapping of Mendelian traits using genome-wide single-nucleotide polymorphism (SNP) genotyping while providing information on their genetic structure and diversity. Bayesian clustering analysis reveals two main genetic backgrounds in traditional breeds, Kenyan, Ethiopian and Chilean village chickens. Analysis of linkage disequilibrium (LD) reveals useful LD (r² ≥ 0.3) in both traditional and village chickens at pairwise marker distances of ∼10 Kb, while haplotype block analysis indicates a median block size of 11–12 Kb. Association mapping yielded refined mapping intervals for duplex comb (Gga 2:38.55–38.89 Mb) and rose comb (Gga 7:18.41–22.09 Mb) phenotypes in traditional breeds. Combined mapping information from traditional breeds and Chilean village chickens allows the oocyan phenotype to be fine mapped to two small regions (Gga 1:67.25–67.28 Mb, Gga 1:67.28–67.32 Mb) totalling ∼75 Kb. Mapping the unmapped earlobe pigmentation phenotype supports previous findings that the trait is sex-linked and polygenic. A critical assessment of the number of SNPs required to map simple traits indicates that between 90 and 110K SNPs are required for full genome-wide analysis of haplotype block structure/ancestry, and for association mapping in both traditional and village chickens. Our results demonstrate the importance and uniqueness of phenotypic diversity and genetic structure of traditional chicken breeds for fine-scale mapping of Mendelian traits in the species, with village chicken populations providing further opportunities to enhance mapping resolutions
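The LD measure r² used in this abstract can be illustrated with a short sketch that computes it for one pair of biallelic markers. It assumes phased haplotypes coded as 0/1 pairs, which is a simplification: genotype data like that in the study would normally require phasing or a genotype-based estimator first. The function name is hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic markers from phased haplotypes.

    haplotypes: list of (allele_at_A, allele_at_B) pairs coded 0 or 1.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), where D = pAB - pA * pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom
```

A marker pair would count as being in "useful LD" in the abstract's sense when this value is at least 0.3.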
Functional impact and evolution of a novel human polymorphic inversion that disrupts a gene and creates a fusion transcript
Since the discovery of chromosomal inversions almost 100 years ago, how they are maintained in natural populations has been a highly debated issue. One of the hypotheses is that inversion breakpoints could affect genes and modify gene expression levels, although evidence of this came only from laboratory mutants. In humans, a few inversions have been shown to associate with expression differences, but in all cases the molecular causes have remained elusive. Here, we have carried out a complete characterization of a new human polymorphic inversion and determined that it is specific to East Asian populations. In addition, we demonstrate that it disrupts the ZNF257 gene and, through the translocation of the first exon and regulatory sequences, creates a previously nonexistent fusion transcript, which together are associated with expression changes in several other genes. Finally, we investigate the potential evolutionary and phenotypic consequences of the inversion, and suggest that it is probably deleterious. This is therefore the first example of a natural polymorphic inversion that has position effects and creates a new chimeric gene, helping to answer an old question in evolutionary biology
Association of melanocortin 1 receptor gene (MC1R) polymorphisms with skin reflectance and freckles in Japanese.
Most studies on the genetic basis of human skin pigmentation have focused on people of European ancestry and only a few studies have focused on Asian populations. We investigated the association of skin reflectance and freckling with genetic variants of melanocortin 1 receptor (MC1R) gene in Japanese. DNA samples were obtained from a total of 653 Japanese individuals (ages 19-40 years) residing in Okinawa; skin reflectance was measured using a spectrophotometer and freckling status was determined for each individual. Lightness index (L*) and freckling status were not correlated with age, body mass index or ancestry (Ryukyuan or Main Islanders of Japan). Among the 10 nonsynonymous variants that were identified by direct sequencing of the coding region of MC1R, two variants--R163Q and V92M--with the derived allele frequencies of 78.6 and 5.5%, respectively, were most common. Multiple regression analysis showed that the 163Q allele and the presence of nonsynonymous rare variants (allele frequencies <5%) were significantly associated with an increase in sex-standardized skin lightness (L* of CIELAB (CIE 1976 (L*a*b*) color space)) of the inner upper arm. Relative to the 92V allele, the 92M allele was significantly associated with increased odds of freckling. This is the first study to show an association between the 163Q allele and skin reflectance values; this association indicated that light-toned skin may have been subjected to positive selection in East Asian people
Melanocortin-1 Receptor, Skin Cancer and Phenotypic Characteristics (M-SKIP) Project: Study Design and Methods for Pooling Results of Genetic Epidemiological Studies
Background: For complex diseases like cancer, pooled-analysis of individual data represents a powerful tool to investigate the joint contribution of genetic, phenotypic and environmental factors to the development of a disease. Pooled-analysis of epidemiological studies has many advantages over meta-analysis, and preliminary results may be obtained faster and with lower costs than with prospective consortia. Design and methods: Based on our experience with the study design of the Melanocortin-1 receptor (MC1R) gene, SKin cancer and Phenotypic characteristics (M-SKIP) project, we describe the most important steps in planning and conducting a pooled-analysis of genetic epidemiological studies. We then present the statistical analysis plan that we are going to apply, giving particular attention to methods of analysis recently proposed to account for between-study heterogeneity and to explore the joint contribution of genetic, phenotypic and environmental factors in the development of a disease. Within the M-SKIP project, data on 10,959 skin cancer cases and 14,785 controls from 31 international investigators were checked for quality and recoded for standardization. We first proposed to fit the aggregated data with random-effects logistic regression models. However, for the M-SKIP project, a two-stage analysis will be preferred to overcome the problem regarding the availability of different study covariates. The joint contribution of MC1R variants and phenotypic characteristics to skin cancer development will be studied via logic regression modeling. Discussion: Methodological guidelines to correctly design and conduct pooled-analyses are needed to facilitate application of such methods, thus providing a better summary of the actual findings on specific fields
