109 research outputs found

    Identifying cryptic population structure in multigenerational pedigrees in a Mexican American sample

    Cryptic population structure can increase both type I and type II errors. This is particularly problematic in case-control association studies of unrelated individuals. Some researchers believe that these problems are obviated in families. We argue here that this may not be the case, especially if families are drawn from a known admixed population such as Mexican Americans. We use a principal component approach to evaluate and visualize the results of three different approaches to searching for cryptic structure in the 20 multigenerational families of the Genetic Analysis Workshop 18 (GAW18). Approach 1 uses all family members in the sample to identify what might be considered "outlier" kindreds. Because families are likely to differ in size (in the GAW18 families, there is about a 4-fold difference in the number of typed individuals), approach 2 uses a weighting system that equalizes pedigree size. Approach 3 concentrates on the founders and the "marry-ins" because, in principle, the entire pedigree can be reconstructed with knowledge of the sequence of these unrelated individuals and genome-wide association study (GWAS) data on everyone else (to identify the position of recombinations). We demonstrate that these three approaches can yield very different insights about cryptic structure in a sample of families
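The abstract does not spell out the weighting system used in approach 2. As a minimal sketch only, one way to equalize pedigree size is to give each typed individual a weight of 1 divided by the size of their pedigree before running the principal component analysis. The function name `pedigree_weighted_pca`, the 0/1/2 genotype coding, and the weighting rule itself are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def pedigree_weighted_pca(genotypes, family_ids, n_components=2):
    """PCA in which every pedigree contributes the same total weight.

    genotypes  : (n_individuals, n_snps) array of 0/1/2 allele counts (assumed coding)
    family_ids : length n_individuals sequence of pedigree labels
    Returns (scores, weights); scores has shape (n_individuals, n_components).
    """
    G = np.asarray(genotypes, dtype=float)
    fam = np.asarray(family_ids)

    # Hypothetical weighting rule: each person gets 1 / (size of their pedigree),
    # so the weights inside any one pedigree sum to the same value.
    _, inverse, counts = np.unique(fam, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    w = w / w.sum()  # normalize so all weights sum to 1

    # Center each SNP column on its weighted mean.
    X = G - w @ G

    # Principal axes of the weighted covariance, via SVD of the sqrt(w)-scaled data.
    Xw = X * np.sqrt(w)[:, None]
    _, _, vt = np.linalg.svd(Xw, full_matrices=False)
    axes = vt[:n_components]

    # Project every individual (unweighted rows) onto the weighted axes.
    scores = X @ axes.T
    return scores, w
```

Plotting the first two columns of `scores`, colored by pedigree, gives the kind of visualization the abstract describes; with this weighting, a large kindred no longer dominates the leading components simply because it has more typed members.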

    Linkage and association analyses of principal components in expression data

    Performing linkage and association analyses on a large set of correlated data presents an interesting set of problems. In the current setting, we have 3554 expression levels from lymphoblastoid cell lines in 194 individuals from 14 three-generation Utah CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees. We formed multivariate expression phenotypes from six sets of genes. These consisted of a set of genes identified by the data providers as showing common linkage to a region of chromosome 14, as well as five other sets suggested by ontological evidence. Using principal-component analyses, we generated seven quantitative phenotypes for expression levels from these six sets of genes. We performed quantitative genome linkage screens on these traits using the expression traits from the third generation of each pedigree. As expected, the strongest linkage signal was achieved when the trait under analysis was the composite of the expressions of genes previously showing linkage to chromosome 14. In particular, this trait produced a LOD score of 5.2 on chromosome 14. The trait also produced LOD scores over 3.5 on chromosomes 1, 7, 9, and 11; this suggests that these genes may be controlled by additional genetic factors on the genome. Subsequent association analyses on the first two generations of these pedigrees identified two polymorphisms on chromosome 11 as significant after correcting for multiple tests. These results suggest that principal-component analyses are useful for the analysis of pleiotropic loci. Furthermore, we have identified two single-nucleotide polymorphisms that may influence the expression of multiple genes linked to chromosome 14

    Detecting population stratification using related individuals

    Although identification of cryptic population stratification is necessary for case/control association analyses, it is also vital for linkage analyses and family-based association tests when founder genotypes are missing. However, including related individuals in an analysis such as EIGENSTRAT can result in bias; using only founders or one individual per pedigree results in loss of data and inaccurate estimates of stratification. We examine a generalization of principal-component analyses to allow for the inclusion of related individuals by down-weighting the significance of individual comparisons

    Stratify or adjust? Dealing with multiple populations when evaluating rare variants

    The unrelated individuals sample from Genetic Analysis Workshop 17 consists of a small number of subjects from eight population samples and genetic data composed mostly of rare variants. We compare two simple approaches to collapsing rare variants within genes for their utility in identifying genes that affect phenotype. We also compare results from stratified analyses to those from a pooled analysis that uses ethnicity as a covariate. We found that the two collapsing approaches were similarly effective in identifying genes that contain causative variants in these data. However, including population as a covariate was not an effective substitute for analyzing the subpopulations separately when only one subpopulation contained a rare variant linked to the phenotype
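The abstract does not name the two collapsing approaches it compares. For illustration only, the sketch below shows two commonly used ways to collapse rare variants within a gene: a carrier indicator (does the person carry any rare allele?) and a burden count (how many rare alleles do they carry?). The function name, the frequency threshold, and the assumption that genotypes count copies of the minor allele are all hypothetical choices.

```python
import numpy as np

def collapse_rare_variants(genotypes, maf_threshold=0.01):
    """Collapse the rare variants of one gene into per-person summaries.

    genotypes : (n_individuals, n_variants) array of 0/1/2 minor-allele counts
                (assumed coding) for the variants in a single gene
    Returns (indicator, count):
      indicator[i] = 1 if person i carries at least one rare allele, else 0
      count[i]     = total number of rare alleles carried by person i
    """
    G = np.asarray(genotypes)

    # Sample frequency of the counted allele at each variant.
    freq = G.mean(axis=0) / 2.0

    # Keep only variants below the (hypothetical) rarity threshold.
    rare = freq < maf_threshold

    count = G[:, rare].sum(axis=1)
    indicator = (count > 0).astype(int)
    return indicator, count
```

Either summary can then be used as a single predictor per gene, either within each population sample separately (the stratified analysis) or in a pooled model with population as a covariate.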

    Power and false-positive rates for the restricted partition method (RPM) in a large candidate gene data set

    Many phenotypes of public health importance (e.g., diabetes, coronary artery disease, major depression, obesity, and addictions to alcohol and nicotine) involve complex pathways of action. Interactions between genetic variants or between genetic variants and environmental factors likely play important roles in the functioning of these pathways. Unfortunately, complex interacting systems are likely to have important interacting factors that may not readily reveal themselves to univariate analyses. Instead, detecting the role of some of these factors may require analyses that are sensitive to interaction effects. In this study, we evaluate the sensitivity and specificity of the restricted partition method (RPM) to detect signals related to coronary artery disease in the Genetic Analysis Workshop 16 Problem 3 data using the 50 k candidate gene single-nucleotide polymorphism set. Power and false-positive rates were evaluated using the first 100 replicate datasets. This included an exploration of the utility of using all genotyped family members compared with selecting one member per family

    Microsatellites versus single-nucleotide polymorphisms in linkage analysis for quantitative and qualitative measures

    BACKGROUND: Genetic maps based on single-nucleotide polymorphisms (SNP) are increasingly being used as an alternative to microsatellite maps. This study compares linkage results for both types of maps for a neurophysiology phenotype and for an alcohol dependence phenotype. Our analysis used two SNP maps on the Illumina and Affymetrix platforms. We also considered the effect of high linkage disequilibrium (LD) in regions near the linkage peaks by analysing a "sparse" SNP map obtained by dropping some markers in high LD with other markers in those regions. RESULTS: The neurophysiology phenotype at the main linkage peak near 130 MB gave LOD scores of 2.76, 2.53, 3.22, and 2.68 for the microsatellite, Affymetrix, Illumina, and Illumina-sparse maps, respectively. The alcohol dependence phenotype at the main linkage peak near 101 MB gave LOD scores of 3.09, 3.69, 4.08, and 4.11 for the microsatellite, Affymetrix, Illumina, and Illumina-sparse maps, respectively. CONCLUSION: The linkage results were stronger overall for SNPs than for microsatellites for both phenotypes. However, LOD scores may be artificially elevated in regions of high LD. Our analysis indicates that appropriately thinning a SNP map in regions of high LD should give more accurate LOD scores. These results suggest that SNPs can be an efficient substitute for microsatellites for linkage analysis of both quantitative and qualitative phenotypes
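The abstract does not say how the "sparse" map was built beyond dropping some markers in high LD with others. A minimal sketch of one greedy thinning rule follows, under stated assumptions: LD is approximated by the squared correlation of 0/1/2 genotype counts (not haplotype-based r²), and the threshold `r2_max` and function name `thin_by_ld` are hypothetical.

```python
import numpy as np

def thin_by_ld(genotypes, r2_max=0.5):
    """Greedy LD thinning: walk the map in order and keep a marker only if
    its squared genotype correlation with every kept marker is <= r2_max.

    genotypes : (n_individuals, n_markers) array of 0/1/2 allele counts,
                with columns in map order (assumed)
    Returns the list of column indices that are kept.
    """
    G = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(G.shape[1]):
        col = G[:, j]
        if col.std() == 0:
            continue  # skip monomorphic markers; correlation is undefined
        in_high_ld = False
        for k in kept:
            r = np.corrcoef(col, G[:, k])[0, 1]
            if r * r > r2_max:
                in_high_ld = True
                break
        if not in_high_ld:
            kept.append(j)
    return kept
```

In practice the comparison would usually be restricted to markers within a physical or genetic window rather than all previously kept markers; the all-pairs version is used here only to keep the sketch short.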

    Multipoint identity-by-descent computations for single-point polymorphism and microsatellite maps

    We used the LOKI software to generate multipoint identity-by-descent matrices for a microsatellite map (with 31 markers) and two single-nucleotide polymorphism (SNP) maps to examine information content across chromosome 7 in the Collaborative Study on the Genetics of Alcoholism dataset. Despite the lower information provided by a single SNP, SNP maps overall had higher and more uniform information content across the chromosome. The Affymetrix map (578 SNPs) and the Illumina map (271 SNPs) provided almost identical information. However, increased information has a computational cost: SNP maps require 100 times as many iterations as microsatellites to produce stable estimates