13 research outputs found
biMM: efficient estimation of genetic variances and covariances for cohorts with high-dimensional phenotype measurements
Genetic research utilizes a decomposition of trait variances and covariances into genetic and environmental parts. Our software package biMM is a computationally efficient implementation of a bivariate linear mixed model for settings where hundreds of traits have been measured on partially overlapping sets of individuals. Peer reviewed.
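The decomposition mentioned in this abstract can be written out for two traits. The following is a textbook-style sketch, not necessarily the parametrization used by biMM; the symbols (y_1, y_2 for the traits, subscript g for genetic and e for environmental components, r_g for the genetic correlation) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Var}(y_1) &= \sigma^2_{g_1} + \sigma^2_{e_1}, \\
\operatorname{Var}(y_2) &= \sigma^2_{g_2} + \sigma^2_{e_2}, \\
\operatorname{Cov}(y_1, y_2) &= \sigma_{g_{12}} + \sigma_{e_{12}}, \\
r_g &= \frac{\sigma_{g_{12}}}{\sqrt{\sigma^2_{g_1}\,\sigma^2_{g_2}}}.
\end{aligned}
```

Under this notation, the heritability of trait 1 would be the ratio of its genetic variance to its total variance, and r_g is the kind of genetic correlation that bivariate analyses report.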
Next generation analytic tools for large scale genetic epidemiology studies of complex diseases
Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled "Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases" on September 15-16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized. Genet. Epidemiol. 36:22-35, 2012. © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/93578/1/gepi20652.pd
Assessing the genetic overlap between BMI and cognitive function
Obesity and low cognitive function are associated with multiple adverse health outcomes across the life course. They have a small phenotypic correlation (r=-0.11; high body mass index (BMI)-low cognitive function), but whether they have a shared genetic aetiology is unknown. We investigated the phenotypic and genetic correlations between the traits using data from 6815 unrelated, genotyped members of Generation Scotland, an ethnically homogeneous cohort from five sites across Scotland. Genetic correlations were estimated using the following: same-sample bivariate genome-wide complex trait analysis (GCTA)-GREML; independent samples bivariate GCTA-GREML using Generation Scotland for cognitive data and four other samples (n=20 806) for BMI; and bivariate LDSC analysis using the largest genome-wide association study (GWAS) summary data on cognitive function (n=48 462) and BMI (n=339 224) to date. The GWAS summary data were also used to create polygenic scores for the two traits, with within- and cross-trait prediction taking place in the independent Generation Scotland cohort. A large genetic correlation of -0.51 (s.e. 0.15) was observed using the same-sample GCTA-GREML approach compared with -0.10 (s.e. 0.08) from the independent-samples GCTA-GREML approach and -0.22 (s.e. 0.03) from the bivariate LDSC analysis. A genetic profile score using cognition-specific genetic variants accounts for 0.08% (P=0.020) of the variance in BMI and a genetic profile score using BMI-specific variants accounts for 0.42% (P=1.9 × 10^-7) of the variance in cognitive function. Seven common genetic variants are significantly associated with both traits at
REHH 2.0: a reimplementation of the R package REHH to detect positive selection from haplotype structure
Identifying genomic regions with unusually high local haplotype homozygosity represents a powerful strategy to characterize candidate genes responding to natural or artificial positive selection. To that end, statistics measuring the extent of haplotype homozygosity within (e.g. EHH, iHS) and between (Rsb or XP-EHH) populations have been proposed in the literature. The REHH package for R was previously developed to facilitate genome-wide scans of selection, based on the analysis of long-range haplotypes. However, its performance was not sufficient to cope with the growing size of available data sets. Here, we propose a major upgrade of the REHH package, which includes improved processing of the input files, a faster algorithm to enumerate haplotypes, and multithreading. As illustrated with the analysis of large human haplotype data sets, these improvements decrease the computation time by more than one order of magnitude. This new version of REHH will thus allow performing iHS-, Rsb- or XP-EHH-based scans on large data sets. The package REHH 2.0 is available from the CRAN repository (http://cran.r-project.org/web/packages/rehh/index.html) together with help files and a detailed manual.
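To make the EHH statistic named in this abstract concrete, here is a minimal Python sketch of extended haplotype homozygosity around a core allele. It is not the REHH implementation or its API. The function name `ehh`, the string encoding of haplotypes, and the toy data are all assumptions made for illustration. EHH at a marker is taken as the probability that two randomly chosen carriers of the core allele are identical over the whole interval from the core to that marker.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele):
    """Return {marker index: EHH} for haplotypes carrying `allele` at `core`.

    `haplotypes` is a list of equal-length strings, one character per marker
    (a hypothetical encoding chosen for this sketch).
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return {}  # homozygosity is undefined for fewer than two carriers

    total_pairs = comb(n, 2)
    result = {}
    for pos in range(len(carriers[0])):
        lo, hi = min(core, pos), max(core, pos)
        # Group carriers by their alleles over the interval core..pos.
        groups = Counter(h[lo:hi + 1] for h in carriers)
        # Count pairs of carriers that are identical over the interval.
        identical_pairs = sum(comb(c, 2) for c in groups.values())
        result[pos] = identical_pairs / total_pairs
    return result


# Toy data (hypothetical): four haplotypes, core marker at index 1.
toy = ["0101", "0100", "0111", "1100"]
toy_ehh = ehh(toy, core=1, allele="1")
```

EHH equals 1 at the core itself and decays with distance as recombination and mutation break up the shared haplotype. Statistics such as iHS integrate this decay curve, which this sketch does not attempt.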
Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets
Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will allow
Overview of the human genome
The human genome is composed of deoxyribonucleic acid (DNA) organized into 23 pairs of chromosomes in the nucleus of human cells, as well as the small DNA molecule found inside individual mitochondria. Complete sequencing of the 3 billion base pairs that make up the human genome has made available a deluge of information that has enhanced our understanding of evolution, physiology, causality of disease, and the association between heredity and environment in humans. This chapter discusses discoveries in genetics that spawned the field of human genomics. It further highlights the role of the human genome in disease susceptibility, as well as its prospects for the future of healthcare.