Estimating Effects and Making Predictions from Genome-Wide Marker Data
In genome-wide association studies (GWAS), hundreds of thousands of genetic
markers (SNPs) are tested for association with a trait or phenotype. Reported
effects tend to be larger in magnitude than the true effects of these markers,
the so-called "winner's curse." We argue that the classical definition of
unbiasedness is not useful in this context and propose to use a different
definition of unbiasedness that is a property of the estimator we advocate. We
suggest an integrated approach to the estimation of the SNP effects and to the
prediction of trait values, treating SNP effects as random instead of fixed
effects. Statistical methods traditionally used in the prediction of trait
values in the genetics of livestock, which predate the availability of SNP
data, can be applied to analysis of GWAS, giving better estimates of the SNP
effects and predictions of phenotypic and genetic values in individuals.
Comment: Published at http://dx.doi.org/10.1214/09-STS306 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
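A minimal simulation sketch of the winner's curse described in this abstract, and of how treating marker effects as random shrinks the selected estimates. The marker count, the standard deviations of true effects and sampling error, and the choice of the top 100 markers are illustrative assumptions, not values from the paper. The shrinkage factor is the standard normal-normal posterior mean, used here as a stand-in for the estimator the authors advocate.

```python
import math
import random

random.seed(1)

n_markers = 20000   # assumed number of tested markers
sd_true = 0.02      # assumed SD of true marker effects
sd_err = 0.05       # assumed SD of sampling error in each estimate

# True effects are drawn from a normal distribution (random-effects view);
# each estimate is the true effect plus independent sampling error.
true = [random.gauss(0.0, sd_true) for _ in range(n_markers)]
est = [b + random.gauss(0.0, sd_err) for b in true]

# "Winners": the 100 markers with the largest absolute estimated effect.
top = sorted(range(n_markers), key=lambda i: abs(est[i]), reverse=True)[:100]

# Posterior-mean shrinkage factor when effects and errors are both normal.
shrink = sd_true ** 2 / (sd_true ** 2 + sd_err ** 2)

# Average effect among winners, measured in the direction of the estimate.
reported = sum(abs(est[i]) for i in top) / len(top)
actual = sum(true[i] * math.copysign(1.0, est[i]) for i in top) / len(top)
shrunk = shrink * reported

print(f"mean reported effect: {reported:.4f}")
print(f"mean true effect:     {actual:.4f}")
print(f"mean shrunken effect: {shrunk:.4f}")
```

Under these assumptions the reported effects of the selected markers greatly overstate their true effects, while the shrunken estimates land much closer to them on average.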
Large autosomal copy-number differences within unselected monozygotic twin pairs are rare
Monozygotic (MZ) twins form an important system for the study of biological plasticity in humans. While MZ twins are generally considered to be genetically identical, several studies have demonstrated copy-number differences within a twin pair, particularly in those discordant for disease. The rate of autosomal copy-number variation (CNV) discordance within MZ twin pairs was investigated using a population sample of 376 twin pairs genotyped on Illumina Human610-Quad arrays. After CNV calling using both QuantiSNP and PennCNV followed by manual annotation, only a single CNV difference was observed within the MZ twin pairs, being a 130 kb duplication of chromosome 5. Five other potential discordant CNVs were called by the software, but excluded based on manual annotation of the regions. It is concluded that large CNV discordance is rare within MZ twin pairs, indicating that any CNV difference found within phenotypically discordant MZ twin pairs has a high probability of containing the causal gene(s) involved.
Prediction of individual genetic risk to disease from genome-wide association studies
Empirical studies suggest that the effect sizes of individual causal risk alleles underlying complex genetic diseases are small, with most genotype relative risks in the range of 1.1-2.0. Although the increased risk of disease for a carrier is small for any single locus, knowledge of multiple risk alleles throughout the genome could allow the identification of individuals who are at high risk. In this study, we investigate the number and effect size of risk loci that underlie complex disease constrained by the disease parameters of prevalence and heritability. Then we quantify the value of prediction of genetic risk to disease using a range of realistic combinations of the number, size, and distribution of risk effects that underlie complex diseases. We propose an approach to assess the genetic risk of a disease in healthy individuals, based on dense genome-wide SNP panels. We test this approach using simulation. When the number of loci contributing to the disease is >50, a large case-control study is needed to identify a set of risk loci for use in predicting the disease risk of healthy people not included in the case-control study. For diseases controlled by 1000 loci of mean relative risk of only 1.04, a case-control study with 10,000 cases and controls can lead to selection of ∼75 loci that explain >50% of the genetic variance. The 5% of people with the highest predicted risk are three to seven times more likely to suffer the disease than the population average, depending on heritability and disease prevalence. Whether an individual with known genetic risk develops the disease depends on known and unknown environmental factors.
Genetic architecture of body size in mammals
Much of the heritability for human stature is caused by mutations of small-to-medium effect. This is because detrimental pleiotropy restricts large-effect mutations to very low frequencies.
Explaining additional genetic variation in complex traits
Genome-wide association studies (GWAS) have provided valuable insights into the genetic basis of complex traits, discovering >6000 variants associated with >500 quantitative traits and common complex diseases in humans. The associations identified so far represent only a fraction of those that influence phenotype, because there are likely to be many variants across the entire frequency spectrum, each of which influences multiple traits, with only a small average contribution to the phenotypic variance. This presents a considerable challenge to further dissection of the remaining unexplained genetic variance within populations, which limits our ability to predict disease risk, identify new drug targets, improve and maintain food sources, and understand natural diversity. This challenge will be met within the current framework through larger sample size, better phenotyping, including recording of nongenetic risk factors, focused study designs, and an integration of multiple sources of phenotypic and genetic information. The current evidence supports the application of quantitative genetic approaches, and we argue that one should retain simpler theories until simplicity can be traded for greater explanatory power.
The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis
It was shown recently using experimental data that it is possible under certain conditions to determine whether a person with known genotypes at a number of markers was part of a sample from which only allele frequencies are known. Using population genetic and statistical theory, we show that the power of such identification is, approximately, proportional to the number of independent SNPs divided by the size of the sample from which the allele frequencies are available. We quantify the limits of identification and propose likelihood and regression analysis methods for the analysis of data. We show that these methods have similar statistical properties and more desirable properties, in terms of type-I error rate and statistical power, than test statistics suggested in the literature.
Large-scale genomics unveils the genetic architecture of psychiatric disorders
Family study results are consistent with genetic effects making substantial contributions to risk of psychiatric disorders such as schizophrenia, yet robust identification of specific genetic variants that explain variation in population risk had been disappointing until the advent of technologies that assay the entire genome in large samples. We highlight recent progress that has led to a better understanding of the number of risk variants in the population and the interaction of allele frequency and effect size. The emerging genetic architecture implies a large number of contributing loci (that is, a high genome-wide mutational target) and suggests that genetic risk of psychiatric disorders involves the combined effects of many common variants of small effect, as well as rare and de novo variants of large effect. The capture of a substantial proportion of genetic risk facilitates new study designs to investigate the combined effects of genes and the environment.
The epigenetic clock is correlated with physical and cognitive fitness in the Lothian Birth Cohort 1936
Background: The DNA methylation-based 'epigenetic clock' correlates strongly with chronological age, but it is currently unclear what drives individual differences. We examine cross-sectional and longitudinal associations between the epigenetic clock and four mortality-linked markers of physical and mental fitness: lung function, walking speed, grip strength and cognitive ability. Methods: DNA methylation-based age acceleration (residuals of the epigenetic clock estimate regressed on chronological age) was estimated in the Lothian Birth Cohort 1936 at ages 70 (n=920), 73 (n=299) and 76 (n=273) years. General cognitive ability, walking speed, lung function and grip strength were measured concurrently. Cross-sectional correlations between age acceleration and the fitness variables were calculated. Longitudinal change in the epigenetic clock estimates and the fitness variables was assessed via linear mixed models and latent growth curves. Epigenetic age acceleration at age 70 was used as a predictor of longitudinal change in fitness. Epigenome-wide association studies (EWASs) were conducted on the four fitness measures. Results: Cross-sectional correlations were significant between greater age acceleration and poorer performance on the lung function, cognition and grip strength measures (r range: -0.07 to -0.05, P range: 9.7 x 10 to 0.024). All of the fitness variables declined over time but age acceleration did not correlate with subsequent change over 6 years. There were no EWAS hits for the fitness traits. Conclusions: Markers of physical and mental fitness are associated with the epigenetic clock (lower abilities associated with age acceleration). However, age acceleration does not associate with decline in these measures, at least over a relatively short follow-up.
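The Methods above define age acceleration as the residuals of the epigenetic clock estimate regressed on chronological age. A minimal sketch of that calculation follows, assuming a simple one-predictor least-squares fit. The function name `age_acceleration` and the input values are hypothetical illustrations, not cohort data, and the study's actual model may include further terms.

```python
from statistics import mean


def age_acceleration(epigenetic_age, chronological_age):
    """Return residuals from an ordinary least-squares regression of
    epigenetic age on chronological age (one residual per person)."""
    x_bar = mean(chronological_age)
    y_bar = mean(epigenetic_age)
    sxx = sum((x - x_bar) ** 2 for x in chronological_age)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(chronological_age, epigenetic_age))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # A positive residual means the person looks epigenetically "older"
    # than predicted from chronological age alone.
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_age, epigenetic_age)]


# Made-up illustrative values (years), not data from the study.
chron = [69.5, 70.0, 70.2, 69.8, 70.5]
epi = [68.0, 74.1, 71.3, 66.9, 73.0]
accel = age_acceleration(epi, chron)
print([round(a, 2) for a in accel])
```

By construction the residuals average to zero and are uncorrelated with chronological age, so any association with a fitness measure reflects deviation from the age-predicted clock value and not age itself.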
