
    Power and type I error rate of false discovery rate approaches in genome-wide association studies

    In genome-wide genetic studies with a large number of markers, balancing the type I error rate against power is a challenging problem. Recently proposed false discovery rate (FDR) approaches are promising solutions. Using the 100 simulated datasets from Genetic Analysis Workshop 14 (a genome-wide marker map spaced about 3 cM apart, plus phenotypes), we studied the type I error rate and power of Storey's FDR approach and compared it with the traditional Bonferroni procedure. We confirmed that Storey's FDR approach provided strong control of the FDR, but found that it provided only weak control of the family-wise error rate (FWER). For these simulated datasets, Storey's FDR approach had only slightly higher power than the Bonferroni procedure. In conclusion, Storey's FDR approach is more powerful than the Bonferroni procedure if strong control of the FDR or weak control of the FWER is desired, but it has little power advantage over the Bonferroni procedure when linkage disequilibrium among the markers is low. Further evaluation of the type I error rate and power of FDR approaches under higher linkage disequilibrium and for haplotype analyses is warranted.
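    The contrast between the two procedures can be illustrated with a minimal Python sketch on a vector of p-values. It assumes Storey's q-value with a single fixed tuning parameter lam = 0.5 for estimating the proportion of true null hypotheses (pi0); the study's software and tuning choices are not stated, and the function names and p-values are hypothetical.

```python
from typing import List


def bonferroni_reject(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0_i when p_i <= alpha / m; this controls the FWER."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def storey_pi0(pvals: List[float], lam: float = 0.5) -> float:
    """Estimate the proportion of true nulls from the p-values above lam.

    Null p-values are uniform on [0, 1], so roughly pi0 * m * (1 - lam)
    of them should exceed lam. The estimate is capped at 1.
    """
    m = len(pvals)
    pi0 = sum(p > lam for p in pvals) / (m * (1.0 - lam))
    return min(pi0, 1.0)


def storey_qvalues(pvals: List[float], lam: float = 0.5) -> List[float]:
    """Storey q-values, returned in the original order of pvals."""
    m = len(pvals)
    pi0 = storey_pi0(pvals, lam)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[idx] / rank)
        q[idx] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical p-values, not data from the study.
    p = [0.0001, 0.004, 0.019, 0.03, 0.04, 0.1, 0.2, 0.3, 0.7, 0.9]
    print("pi0 estimate:", storey_pi0(p))
    print("Bonferroni rejections:", sum(bonferroni_reject(p)))
    print("Storey rejections (q <= 0.05):", sum(q <= 0.05 for q in storey_qvalues(p)))
```

    With these made-up p-values the sketch estimates pi0 = 0.4; Bonferroni rejects 2 hypotheses, whereas thresholding the q-values at 0.05 rejects 5. The extra power comes from pi0 < 1: when nearly all markers are null, pi0 approaches 1 and the q-values reduce to Benjamini-Hochberg adjusted p-values.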

    Disparities in allele frequencies and population differentiation for 101 disease-associated single nucleotide polymorphisms between Puerto Ricans and Non-Hispanic Whites

    BACKGROUND. Variations in allele frequencies can contribute to differences in the prevalence of some common complex diseases among populations. Natural selection modulates the balance of allele frequencies across populations, and population differentiation (FST) can provide evidence of environmental selection pressures. Such genetic information is limited for Puerto Ricans, the second-largest Hispanic ethnic group in the US and a group with a high prevalence of chronic disease. We determined allele frequencies and population differentiation for 101 single nucleotide polymorphisms (SNPs) in 30 genes involved in major metabolic and disease-relevant pathways in Puerto Ricans (n = 969, ages 45–75 years) and compared them with similarly aged non-Hispanic whites (NHW) (n = 597). RESULTS. Minor allele frequency (MAF) distributions for 45.5% of the SNPs assessed in Puerto Ricans differed significantly from those of NHW. Puerto Ricans carried risk alleles at higher frequency and protective alleles at lower frequency than NHW. Puerto Ricans had SNPs with exceptional FST values in intronic, non-synonymous, and promoter regions, whereas NHW had exceptional FST values only in intronic and promoter-region SNPs. CONCLUSION. These observations may help explain, and broaden, studies of the impact of gene polymorphisms on chronic diseases affecting Puerto Ricans.

    Funding: National Institutes of Health, National Institute on Aging (P01AG02394, P01AG023394-SI); National Institutes of Health (53-K06-5-10); US Department of Agriculture, Agricultural Research Service (58-1950-9-001, 58-1950-7-707); National Institutes of Health, National Heart, Lung, and Blood Institute (U 01 HL72524, Genetic and Environmental Determinants of Triglycerides, HL54776).
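    The two per-SNP quantities compared between groups can be sketched in Python for a single biallelic SNP. This assumes allele counting from genotype counts and a simple Nei-style estimator, FST = (HT - HS) / HT, with the two populations weighted equally; the abstract does not say which FST estimator was used, and the function names and genotype counts are hypothetical.

```python
def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele A from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of the less common allele."""
    p = allele_frequency(n_aa, n_ab, n_bb)
    return min(p, 1.0 - p)


def fst_two_pops(p1: float, p2: float) -> float:
    """Nei-style FST for one biallelic locus and two equally weighted populations.

    HS is the mean within-population expected heterozygosity and HT the
    expected heterozygosity at the pooled allele frequency.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # both populations fixed for the same allele
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical genotype counts for one SNP in two groups.
    group1 = (40, 160, 200)
    group2 = (90, 210, 100)
    print("MAF group 1:", minor_allele_frequency(*group1))
    print("MAF group 2:", minor_allele_frequency(*group2))
    print("FST:", fst_two_pops(allele_frequency(*group1), allele_frequency(*group2)))
```

    Whether an FST value is "exceptional" would then be judged against the distribution of FST across all SNPs studied, a step not shown here.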

    Genetic analyses of longitudinal phenotype data: a comparison of univariate methods and a multivariate approach

    BACKGROUND: We explored three approaches to heritability and linkage analyses of longitudinal total cholesterol levels (CHOL) in the Genetic Analysis Workshop 13 simulated data, without knowing the answers. The first two were univariate approaches that used 1) the baseline measure at exam 1 or 2) summary measures, such as the mean and slope, from multiple exams. The third was a multivariate approach that directly models multiple measurements on a subject. A variance components model (SOLAR) was used in the univariate approaches; a mixed regression model with polynomials was used in the multivariate approach and implemented in SAS/IML. RESULTS: Using the baseline measure at exam 1, we detected (LOD > 3) all baseline or slope genes contributing a substantial amount (0.08) of variance. Compared with the baseline measure, the mean measure yielded slightly higher LODs at the slope genes and lower LODs at the baseline genes. The slope measure produced a somewhat lower LOD for the slope gene than the mean measure did. The third approach estimated descriptive information on how gene effects changed with age for three linked loci. CONCLUSION: We found that simple univariate methods may be effective for detecting genes that affect longitudinal phenotypes but may not fully reveal temporal trends in gene effects. The relative efficiency of the univariate methods for detecting genes depends heavily on the underlying model. Compared with the univariate approaches, the multivariate approach provided more information on temporal trends in gene effects, at the cost of more complicated modelling and more intensive computation.
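    The univariate summary measures can be illustrated with a short Python sketch that reduces one subject's repeated CHOL measurements to a baseline value, a mean, and an ordinary least-squares slope. Using age as the time axis and OLS for the slope are assumptions, and the function name and values are hypothetical; the per-subject values would then serve as phenotypes in the variance components analysis, which is not shown.

```python
from statistics import mean
from typing import List, Tuple


def summary_measures(ages: List[float], values: List[float]) -> Tuple[float, float, float]:
    """Return (baseline, mean, OLS slope per year) for one subject's exams.

    ages and values must be listed in exam order; the baseline is exam 1.
    """
    if len(ages) != len(values) or len(ages) < 2:
        raise ValueError("need at least two paired measurements")
    a_bar, y_bar = mean(ages), mean(values)
    s_xx = sum((a - a_bar) ** 2 for a in ages)
    if s_xx == 0:
        raise ValueError("ages must vary to estimate a slope")
    s_xy = sum((a - a_bar) * (y - y_bar) for a, y in zip(ages, values))
    return values[0], y_bar, s_xy / s_xx


if __name__ == "__main__":
    # Hypothetical subject: CHOL (mg/dL) at three exams, four years apart.
    print(summary_measures([40, 44, 48], [200, 210, 220]))
```

    Collapsing each subject to one number per measure keeps the genetic model univariate, which is also why, as the abstract notes, these measures cannot fully describe how gene effects change over time.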

    Large meta-analysis of genome-wide association studies identifies five loci for lean body mass

    Lean body mass, consisting mostly of skeletal muscle, is important for healthy aging. We performed a genome-wide association study for whole body (20 cohorts of European ancestry with n = 38,292) and appendicular (arms and legs) lean body mass (n = 28,330) measured using dual energy X-ray absorptiometry or bioelectrical impedance analysis, adjusted for sex, age, height, and fat mass. Twenty-one single-nucleotide polymorphisms were significantly associated with lean body mass either genome wide (p < 5 × 10^-8) or suggestively genome wide (p < 2.3 × 10^-6). Replication in 63,475 (47,227 of European ancestry) individuals from 33 cohorts for whole body lean body mass and in 45,090 (42,360 of European ancestry) subjects from 25 cohorts for appendicular lean body mass was successful for five single-nucleotide polymorphisms in/near HSD17B11, VCAN, ADAMTSL3, IRS1, and FTO for total lean body mass and for three single-nucleotide polymorphisms in/near VCAN, ADAMTSL3, and IRS1 for appendicular lean body mass. Our findings provide new insight into the genetics of lean body mass.
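    The two discovery-stage thresholds can be expressed as a small Python helper. The cut-offs are the ones quoted in the abstract; the constant and function names are hypothetical, and the replication stage is not modelled.

```python
GENOME_WIDE = 5e-8   # genome-wide significance threshold quoted in the abstract
SUGGESTIVE = 2.3e-6  # suggestive genome-wide threshold quoted in the abstract


def classify_association(p: float) -> str:
    """Label a SNP's discovery-stage p-value using the two thresholds above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


if __name__ == "__main__":
    for p in (1e-9, 1e-6, 1e-3):
        print(p, classify_association(p))
```

    Both comparisons are strict, matching the "p <" wording of the abstract, so a p-value of exactly 5 × 10^-8 is labelled suggestive rather than genome-wide.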

    Dietary Intake of n-6 Fatty Acids Modulates Effect of Apolipoprotein A5 Gene on Plasma Fasting Triglycerides, Remnant Lipoprotein Concentrations, and Lipoprotein Particle Size

    Background: Apolipoprotein A5 gene (APOA5) variation is associated with plasma triglycerides (TGs). However, little is known about whether dietary fat modulates this association. Methods and Results: We investigated the interaction between APOA5 gene variation and dietary fat in determining plasma fasting TGs, remnant-like particle (RLP) concentrations, and lipoprotein particle size in 1001 men and 1147 women who were Framingham Heart Study participants. Polymorphisms –1131T>C and 56C>G, representing 2 independent haplotypes, were analyzed. Significant gene–diet interactions between the –1131T>C polymorphism and polyunsaturated fatty acid (PUFA) intake were found (P […] G polymorphism. The –1131C allele was associated with higher fasting TGs and RLP concentrations (P […] 6% of total energy). No heterogeneity by sex was found. These interactions showed a dose-response effect when PUFA intake was considered as a continuous variable (P<0.01). Similar interactions were found for the sizes of VLDL and LDL particles: only in carriers of the –1131C allele did VLDL particle size increase and LDL particle size decrease as PUFA intake increased (P<0.01). We further analyzed the effects of n-6 and n-3 fatty acids and found that the PUFA–APOA5 interactions were specific to dietary n-6 fatty acids. Conclusions: Higher n-6 (but not n-3) PUFA intake increased fasting TGs, RLP concentrations, and VLDL size and decreased LDL size in APOA5 –1131C carriers, suggesting that n-6 PUFA–rich diets are related to a more atherogenic lipid profile in these subjects.

    Corella Piquer, Maria Dolores
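    A gene–diet interaction of this kind is commonly tested with a regression that includes a genotype × intake product term. As a minimal sketch, assume a dominant coding of the –1131C allele (G = 1 for carriers, 0 otherwise) and PUFA intake as a continuous variable. In the model TG = b0 + b1·G + b2·PUFA + b3·G·PUFA, the OLS interaction estimate b3 then equals the carrier slope of TG on PUFA minus the non-carrier slope, because the model gives each group its own intercept and slope. The function names and data are hypothetical, and the study's covariate adjustment and significance tests are omitted.

```python
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]  # (PUFA intake, fasting TG)


def ols_slope(points: List[Point]) -> float:
    """OLS slope of y on x for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / s_xx


def interaction_estimate(carriers: List[Point], non_carriers: List[Point]) -> float:
    """b3 in TG ~ G + PUFA + G*PUFA with G = 1 for carriers.

    With a binary G the fully interacted model fits a separate intercept and
    slope per group, so b3 is the carrier slope minus the non-carrier slope.
    """
    return ols_slope(carriers) - ols_slope(non_carriers)


if __name__ == "__main__":
    # Hypothetical (PUFA % of energy, fasting TG mg/dL) pairs.
    non_carriers = [(4.0, 100.0), (6.0, 101.0), (8.0, 99.0)]
    carriers = [(4.0, 120.0), (6.0, 130.0), (8.0, 140.0)]
    print("interaction b3:", interaction_estimate(carriers, non_carriers))
```

    A positive b3 corresponds to the pattern described for TGs: the association with PUFA intake is steeper in carriers than in non-carriers.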

    Data abstractions for decision tree induction

    When descriptions of data values in a database are too concrete or too detailed, the computational cost of discovering useful knowledge from the database generally increases, and the discovered knowledge tends to become complicated. Data abstraction seems useful for resolving this kind of problem: after abstraction we obtain a smaller and more general database, from which we can quickly extract more abstract knowledge that is expected to be easier to understand. In general, however, several abstractions are possible, so we must carefully select the one by which the original database is generalized; an inadequate selection would degrade the accuracy of the extracted knowledge. From this point of view, we propose in this paper a method of selecting an appropriate abstraction from the possible ones, assuming that our task is to construct a decision tree from a relational database. Suppose that, for each attribute in a relational database, we have a class of possible abstractions for the attribute values. As an appropriate abstraction for each attribute, we prefer one such that, even after the abstraction, the distribution of target classes needed for the classification task is preserved within an acceptable error range given by the user. With the selected abstractions, the original database can be transformed into a small generalized database written in abstract values. We can therefore expect to construct, from the generalized database, a decision tree much smaller than one constructed from the original database. Furthermore, such a size reduction can be justified under some theoretical assumptions. The appropriateness of an abstraction is precisely defined in terms of standard information theory; we therefore call our abstraction framework Information Theoretical Abstraction. We show some experimental results obtained by ITA, a system that implements our abstraction method. These results verify that our method is very effective in reducing the size of the induced decision tree without substantially worsening classification errors.
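    The abstract does not give the exact criterion, so the following Python sketch assumes one plausible information-theoretic reading: an abstraction (a mapping from concrete to abstract attribute values) is acceptable if the mutual information between the attribute and the target class drops by no more than a user-given epsilon. The function names and toy data are hypothetical and are not taken from ITA.

```python
from collections import Counter, defaultdict
from math import log2
from typing import Dict, Hashable, List, Tuple


def entropy(labels: List[Hashable]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(values: List[Hashable], classes: List[Hashable]) -> float:
    """I(attribute; class) = H(class) - H(class | attribute), in bits."""
    n = len(values)
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for v, c in zip(values, classes):
        groups[v].append(c)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(classes) - h_cond


def is_acceptable(values: List[Hashable], classes: List[Hashable],
                  abstraction: Dict[Hashable, Hashable],
                  epsilon: float) -> Tuple[bool, float]:
    """Accept the abstraction if the information lost about the class is <= epsilon."""
    abstracted = [abstraction[v] for v in values]
    loss = mutual_information(values, classes) - mutual_information(abstracted, classes)
    return loss <= epsilon, loss


if __name__ == "__main__":
    values = ["A", "B", "C", "D"]
    classes = ["+", "+", "-", "-"]
    print(is_acceptable(values, classes, {"A": "AB", "B": "AB", "C": "CD", "D": "CD"}, 0.01))
    print(is_acceptable(values, classes, {"A": "AC", "C": "AC", "B": "BD", "D": "BD"}, 0.01))
```

    Merging values that share the same class distribution loses no information about the class and is accepted, while merging values with opposite classes destroys all of it and is rejected.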

    A genome-wide association study for blood lipid phenotypes in the Framingham Heart Study

    Background: Blood lipid levels, including low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG), are highly heritable. Genome-wide association is a promising approach to map genetic loci related to these heritable phenotypes. Methods: In 1087 Framingham Heart Study Offspring cohort participants (mean age 47 years, 52% women), we conducted genome-wide analyses (Affymetrix 100K GeneChip) of fasting blood lipid traits. Total cholesterol, HDL-C, and TG were measured by standard enzymatic methods, and LDL-C was calculated using the Friedewald formula. The primary phenotypes were the long-term averages of up to seven measurements of LDL-C, HDL-C, and TG over a ~30-year span. We used generalized estimating equations (GEE), family-based association tests (FBAT), and variance components linkage to investigate the relationships between SNPs (on autosomes, with minor allele frequency ≥10%, genotypic call rate ≥80%, and Hardy-Weinberg equilibrium p ≥ 0.001) and multivariable-adjusted residuals. We pursued a three-stage replication strategy for the GEE association results: 287 SNPs (P < 0.001 in Stage I) were tested in Stage II (n ~1450 individuals), and 40 SNPs (P < 0.001 in the joint analysis of Stages I and II) were tested in Stage III (n ~6650 individuals). Results: Long-term averages of LDL-C, HDL-C, and TG were highly heritable (h^2 = 0.66, 0.69, and 0.58, respectively; each P < 0.0001). Of 70,987 tests for each phenotype, two SNPs had p < 10^-5 in the GEE results for LDL-C, four for HDL-C, and one for TG. For each multivariable-adjusted phenotype, the number of SNPs with association p < 10^-4 ranged from 13 to 18, and the number with p < 10^-3 ranged from 94 to 149. Some results confirmed previously reported associations with candidate genes, including variation in the lipoprotein lipase gene (LPL) and HDL-C and TG (rs7007797; P = 0.0005 for HDL-C and 0.002 for TG). The full set of GEE, FBAT, and linkage results is posted at the database of Genotypes and Phenotypes (dbGaP). After three stages of replication, there was no convincing statistical evidence for association (i.e., combined P < 10^-5 across all three stages) between any of the tested SNPs and lipid phenotypes. Conclusion: Using a 100K genome-wide scan, we have generated a set of putative associations for common sequence variants and lipid phenotypes. Validation of selected hypotheses in additional samples did not identify any new loci underlying variability in blood lipids. Lack of replication may be due to inadequate statistical power to detect modest quantitative trait locus effects (i.e., <1% of trait variance explained) or to reduced genomic coverage of the 100K array. GWAS in FHS using a denser genome-wide genotyping platform and a better-powered replication strategy may identify novel loci underlying blood lipids.
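    Two calculations named in the Methods can be sketched in Python: LDL-C from the Friedewald formula, LDL-C = TC - HDL-C - TG/5 (all in mg/dL, and generally not used when TG ≥ 400 mg/dL), and the SNP filters (MAF ≥ 10%, call rate ≥ 80%, Hardy-Weinberg p ≥ 0.001). The Hardy-Weinberg test here is a one-degree-of-freedom chi-square goodness-of-fit test, which is an assumption since the abstract does not say which test was used; the function names are hypothetical.

```python
from math import erfc, sqrt
from typing import Optional


def friedewald_ldl(total_chol: float, hdl: float, tg: float) -> Optional[float]:
    """LDL-C (mg/dL) = TC - HDL-C - TG/5; returns None when TG >= 400 mg/dL."""
    if tg >= 400:
        return None  # formula not applicable at high triglyceride levels
    return total_chol - hdl - tg / 5.0


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square goodness-of-fit p-value for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    if p == 0.0 or p == 1.0:
        return 1.0  # monomorphic SNP: expected counts of zero, no test possible
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2.0))


def passes_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
              min_maf: float = 0.10, min_call_rate: float = 0.80,
              min_hwe_p: float = 0.001) -> bool:
    """Apply the MAF, call-rate and HWE filters to one autosomal SNP."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0 or n_called / (n_called + n_missing) < min_call_rate:
        return False
    p = (2 * n_aa + n_ab) / (2 * n_called)
    if min(p, 1.0 - p) < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p


if __name__ == "__main__":
    print("LDL-C:", friedewald_ldl(200.0, 50.0, 150.0))
    print("QC pass (25/50/25, none missing):", passes_qc(25, 50, 25, 0))
```

    Applying the MAF filter before the Hardy-Weinberg test means rare and monomorphic SNPs are dropped before any expected genotype count can be close to zero.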