335 research outputs found

    Normalization of microarray expression data using within-pedigree pool and its effect on linkage analysis

    "Genetical genomics", the study of natural genetic variation combining data from genetic marker-based studies with gene expression analyses, has exploded with the recent development of advanced microarray technologies. To account for systematic variation known to exist in microarray data, it is critical to properly normalize gene expression traits before performing genetic linkage analyses. However, imposing equal means and variances across pedigrees can over-correct for the true biological variation by ignoring familial correlations in expression values. We applied the robust multiarray average (RMA) method to gene expression trait data from 14 Centre d'Etude du Polymorphisme Humain (CEPH) Utah pedigrees provided by GAW15 (Genetic Analysis Workshop 15). We compared the RMA normalization method using within-pedigree pools to RMA normalization using all individuals in a single pool, which ignores pedigree membership, and investigated the effects of these different methods on 18 gene expression traits previously found to be linked to regions containing the corresponding structural locus. Familial correlation coefficients of the expressed traits were stronger when traits were normalized within pedigrees. Surprisingly, the linkage plots for these traits were similar, suggesting that although heritability increases when traits are normalized within pedigrees, the strength of linkage evidence does not necessarily change substantially

    Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies

    BACKGROUND: By assaying hundreds of thousands of single nucleotide polymorphisms, genome wide association studies (GWAS) allow for a powerful, unbiased review of the entire genome to localize common genetic variants that influence health and disease. Although it is widely recognized that some correction for multiple testing is necessary in order to control the family-wide Type 1 error in genetic association studies, it is not clear which method to utilize. One simple approach is to perform a Bonferroni correction using all n single nucleotide polymorphisms (SNPs) across the genome; however, this approach is highly conservative and would "overcorrect" for SNPs that are not truly independent. Many SNPs fall within regions of strong linkage disequilibrium (LD) ("blocks") and should not be considered "independent". RESULTS: We proposed to approximate the number of "independent" SNPs by counting 1 SNP per LD block, plus all SNPs outside of blocks (interblock SNPs). We examined the effective number of independent SNPs for GWAS panels. In the CEPH Utah (CEU) population, by considering the interdependence of SNPs, we could reduce the total number of effective tests within the Affymetrix and Illumina SNP panels from 500,000 and 317,000 to 67,000 and 82,000 "independent" SNPs, respectively. For the Affymetrix 500 K and Illumina 317 K GWAS SNP panels we recommend using 10⁻⁵, 10⁻⁷ and 10⁻⁸, and for the Phase II HapMap CEPH Utah and Yoruba populations we recommend using 10⁻⁶, 10⁻⁷ and 10⁻⁹, as "suggestive", "significant" and "highly significant" p-value thresholds to properly control the family-wide Type 1 error. CONCLUSION: By approximating the effective number of independent SNPs across the genome we are able to "correct" for a more accurate number of tests and therefore develop "LD adjusted" Bonferroni corrected p-value thresholds that account for the interdependence of SNPs on well-utilized commercially available SNP "chips". These thresholds will serve as guides to researchers trying to decide which regions of the genome should be studied further.
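The counting rule described in this abstract (one test per LD block plus one per interblock SNP) can be illustrated with a minimal Python sketch. The function names and the nominal level of 0.05 are assumptions made for illustration; the abstract does not state the nominal level or how its rounded thresholds were derived, so the numbers printed here are not a reproduction of the paper's recommendations.

```python
def effective_tests(n_ld_blocks, n_interblock_snps):
    """Approximate number of independent tests:
    one per LD block plus one per SNP lying outside any block."""
    return n_ld_blocks + n_interblock_snps


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wide error near alpha."""
    return alpha / n_tests


ALPHA = 0.05  # assumed nominal level, not taken from the abstract

# (total SNPs on the panel, effective "independent" SNPs reported for CEU)
panels = {
    "Affymetrix 500K": (500_000, 67_000),
    "Illumina 317K": (317_000, 82_000),
}

for name, (n_total, n_effective) in panels.items():
    naive = bonferroni_threshold(ALPHA, n_total)
    adjusted = bonferroni_threshold(ALPHA, n_effective)
    print(f"{name}: naive {naive:.2e}, LD-adjusted {adjusted:.2e}")
```

Because the effective number of tests is smaller than the panel size, the LD-adjusted threshold is always less stringent than the naive Bonferroni threshold.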

    Allele frequency misspecification: effect on power and Type I error of model-dependent linkage analysis of quantitative traits under random ascertainment

    BACKGROUND: Studies of model-based linkage analysis show that trait or marker model misspecification leads to decreasing power or increasing Type I error rate. An increase in Type I error rate is seen when marker related parameters (e.g., allele frequencies) are misspecified and ascertainment is through the trait, but lod-score methods are expected to be robust when ascertainment is random (as is often the case in linkage studies of quantitative traits). In previous studies, the power of lod-score linkage analysis using the "correct" generating model for the trait was found to increase when the marker allele frequencies were misspecified and parental data were missing. An investigation of Type I error rates, conducted in the absence of parental genotype data and with misspecification of marker allele frequencies, showed that an inflation in Type I error rate was the cause of at least part of this apparent increased power. To investigate whether the observed inflation in Type I error rate in model-based LOD score linkage was due to sampling variation, the trait model was estimated from each sample using REGCHUNT, an automated segregation analysis program used to fit models by maximum likelihood using many different sets of initial parameter estimates. RESULTS: The Type I error rates observed using the trait models generated by REGCHUNT were usually closer to the nominal levels than those obtained when assuming the generating trait model. CONCLUSION: This suggests that the observed inflation of Type I error upon misspecification of marker allele frequencies is at least partially due to sampling variation. Thus, with missing parental genotype data, lod-score linkage is not as robust to misspecification of marker allele frequencies as has been commonly thought

    Application of the propensity score in a covariate-based linkage analysis of the Collaborative Study on the Genetics of Alcoholism

    BACKGROUND: Covariate-based linkage analyses using a conditional logistic model as implemented in LODPAL can increase the power to detect linkage by minimizing disease heterogeneity. However, each additional covariate analyzed will increase the degrees of freedom for the linkage test, and therefore can also increase the type I error rate. Use of a propensity score (PS) has been shown to consistently improve the statistical power to detect linkage in simulation studies. Defined as the conditional probability of being affected given the observed covariate data, the PS collapses multiple covariates into a single variable. This study evaluates the performance of the PS to detect linkage evidence in a genome-wide linkage analysis of microsatellite marker data from the Collaborative Study on the Genetics of Alcoholism. Analytical methods included nonparametric linkage analysis without covariates, with one covariate at a time including multiple PS definitions, and with multiple covariates simultaneously that corresponded to the PS definitions. Several definitions of the PS were calculated, each with an increasing number of covariates up to a maximum of five. To account for the potential inflation in the type I error rates, permutation-based p-values were calculated. RESULTS: Results suggest that the use of individual covariates may not necessarily increase the power to detect linkage. However, the use of a PS can lead to an increase when compared to using all covariates simultaneously. Specifically, PS3, which combines age at interview, sex, and smoking status, resulted in the greatest number of significant markers identified. All methods consistently identified several chromosomal regions as significant, including loci on chromosomes 2, 6, 7, and 12. CONCLUSION: These results suggest that the use of a propensity score can increase the power to detect linkage for a complex disease such as alcoholism, especially when multiple important covariates can be used to predict risk and thereby minimize linkage heterogeneity. However, because the PS is calculated as a conditional probability of being affected, it does require the presence of observed covariate data on both affected and unaffected individuals, which may not always be available in real data sets.
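As a rough illustration of how a propensity score collapses several covariates into one variable, the sketch below fits a plain logistic regression by gradient ascent and returns the fitted probability of being affected for each individual. The abstract does not say how the PS was estimated, so logistic regression is an assumption here, and the ten individuals (standardized age, sex, smoking status) are made-up toy data rather than COGA records.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w, x):
    """w[0] is the intercept; w[1:] pairs with the covariates in x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit a logistic model by batch gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_predictor(w, xi))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity_scores(X, w):
    """Estimated P(affected | covariates) for each individual."""
    return [sigmoid(linear_predictor(w, xi)) for xi in X]


# Toy data (hypothetical): [standardized age, sex (0/1), smoker (0/1)]
X = [[-1.2, 0, 0], [-0.5, 1, 0], [0.1, 0, 1], [0.8, 1, 1], [1.5, 1, 1],
     [-0.9, 1, 1], [0.4, 0, 0], [0.4, 0, 0], [1.1, 0, 1], [1.3, 1, 0]]
y = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]  # 1 = affected, 0 = unaffected

weights = fit_logistic(X, y)
scores = propensity_scores(X, weights)
for xi, yi, s in zip(X, y, scores):
    print(xi, yi, round(s, 3))
```

The single score per individual could then enter a covariate-based linkage test in place of the three separate covariates, which is the degrees-of-freedom saving the abstract describes. Fitting requires covariate data on both affected and unaffected individuals, as the conclusion notes.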

    Developmental expression of tyrosyl kinase activity in human serum.

    Tyrosine protein kinases, in addition to their roles as viral transforming proteins and growth factor receptors, have been suggested to have specialized functions in tissue-specific processes and in differentiation. High levels of soluble tyrosine kinases have been found in human serum and plasma. To determine whether the level of tyrosine kinase activity is developmentally expressed in human serum, we assayed sera from 214 individuals ranging in age from newborns to 90 years. We found that serum tyrosine kinase levels are high in newborns and that the levels closely parallel skeletal growth until late adolescence. The serum tyrosine kinase levels increase again in the second and third decades and decline by the fourth decade of life. These studies show that tyrosine kinase levels are developmentally expressed in human serum and delineate the stages in postnatal development when changes in expression occur.

    Haplotypic structure of the X chromosome in the COGA population sample and the quality of its reconstruction by extant software packages

    BACKGROUND: The haplotypes of the X chromosome are accessible to direct count in males, whereas the diplotypes of the females may be inferred knowing the haplotype of their sons or fathers. Here, we investigated: 1) the possible large-scale haplotypic structure of the X chromosome in a Caucasian population sample, given the single-nucleotide polymorphism (SNP) maps and genotypes provided by Illumina and Affymetrix for Genetic Analysis Workshop 14, and 2) the performance of widely used programs in reconstructing haplotypes from population genotypic data, given their known distribution in a sample of unrelated individuals. RESULTS: All possible unrelated mother-son pairs of Caucasian ancestry (N = 104) were selected from the 143 families of the Collaborative Study on the Genetics of Alcoholism pedigree files, and the diplotypes of the mothers were inferred from the X chromosomes of their sons. The marker set included 313 SNPs at an average density of 0.47 Mb. Linkage disequilibrium between pairs of markers was computed by the parameter D', whereas for measuring multilocus disequilibrium, we developed here an index called D*, and applied it to all possible sliding windows of 5 markers each. Results showed a complex pattern of haplotypic structure, with regions of low linkage disequilibrium separated by regions of high values of D*. The following programs were evaluated for their accuracy in inferring population haplotype frequencies: 1) ARLEQUIN 2.001; 2) PHASE 2.1.1; 3) SNPHAP 1.1; 4) HAPLOBLOCK 1.2; 5) HAPLOTYPER 1.0. Performance was evaluated by the Pearson correlation coefficient (r) between the true and the inferred distribution of haplotype frequencies. CONCLUSION: The SNP haplotypic structure of the X chromosome is complex, with regions of high haplotype conservation interspersed among regions of higher haplotype diversity. All the tested programs were accurate (r = 1) in reconstructing the distribution of haplotype frequencies in case of high D* values. However, only the program PHASE achieved a high correlation coefficient (r > 0.7) in conditions of low linkage disequilibrium.
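The pairwise measure D' mentioned in this abstract can be computed directly when haplotypes are known by count, as they are for male X chromosomes. The following is a minimal sketch of the standard two-marker |D'| for biallelic SNPs coded 0/1; the function name and the toy haplotypes are illustrative assumptions, and the authors' multilocus index D* is not reproduced because the abstract does not define it.

```python
def abs_d_prime(haplotypes):
    """Return |D'| for two biallelic markers.

    haplotypes: list of (a, b) tuples, one per phase-known chromosome,
    with each allele coded 0 or 1. Returns None when either marker is
    monomorphic, because D' is undefined in that case.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at marker A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at marker B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return None
    return abs(d) / d_max


# Toy haplotypes (hypothetical): complete LD versus no LD
complete_ld = [(1, 1)] * 5 + [(0, 0)] * 5
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(abs_d_prime(complete_ld))
print(abs_d_prime(no_ld))
```

Values near 1 indicate that at most three of the four possible two-marker haplotypes are observed, while values near 0 indicate that alleles at the two markers occur together about as often as expected by chance.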

    A genome-wide association study identifies a susceptibility locus for biliary atresia on 2p16.1 within the gene EFEMP1

    Biliary atresia (BA) is a rare pediatric cholangiopathy characterized by fibrosclerosing obliteration of the extrahepatic bile ducts, leading to cholestasis, fibrosis, cirrhosis, and eventual liver failure. The etiology of BA remains unknown, although environmental, inflammatory, infectious, and genetic risk factors have been proposed. We performed a genome-wide association study (GWAS) in a European-American cohort of 343 isolated BA patients and 1716 controls to identify genetic loci associated with BA. A second GWAS was performed in an independent European-American cohort of 156 patients with BA and other extrahepatic anomalies and 212 controls to confirm the identified candidate BA-associated SNPs. Meta-analysis revealed three genome-wide significant BA-associated SNPs on 2p16.1 (rs10865291, rs6761893, and rs727878; P < 5 × 10⁻⁸), located within the fifth intron of the EFEMP1 gene, which encodes a secreted extracellular protein implicated in extracellular matrix remodeling, cell proliferation, and organogenesis. RNA expression analysis showed an increase in EFEMP1 transcripts from human liver specimens isolated from patients with either BA or other cholestatic diseases when compared to normal control liver samples. Immunohistochemistry demonstrated that EFEMP1 is expressed in cholangiocytes and vascular smooth muscle cells in liver specimens from patients with BA and other cholestatic diseases, but it is absent from cholangiocytes in normal control liver samples. Efemp1 transcripts had higher expression in cholangiocytes and portal fibroblasts as compared with other cell types in normal rat liver. The identification of a novel BA-associated locus, and implication of EFEMP1 as a new BA candidate susceptibility gene, could provide new insights to understanding the mechanisms underlying this severe pediatric disorder.