Influence of control selection in genome-wide association studies: the example of diabetes in the Framingham Heart Study
Epidemiologic study designs represent a major challenge for genome-wide association studies. Most such studies to date have selected controls from the pool of participants without the disease of interest at the end of the study period. These choices can lead to biased estimates of exposure effects. Using data from the Framingham Heart Study (Genetic Analysis Workshop 16 Problem 2), we compare genetic association estimates from designs with control selection based on status at the end of a study (case exclusion (CE) sampling) with those from designs using incidence density (ID) sampling, in which controls are selected from the pool of participants who are disease-free at the time a case is diagnosed. Cases are defined as those diagnosed with type 2 diabetes (T2D). We estimated odds ratios for 18 previously confirmed T2D variants using 189 cases selected by ID sampling and using 231 cases selected by CE sampling. We found none of these single-nucleotide polymorphisms to be significantly associated with T2D using either design. Because these empirical analyses were based on a small number of cases and on single-nucleotide polymorphisms with likely small effect sizes, we supplemented this work with simulated data sets of 500 cases from each strategy across a variety of allele frequencies and effect sizes. In our simulated data sets, we show ID sampling to be less biased than CE, although CE shows apparent increased power due to the upward bias of point estimates. We conclude that ID sampling is an appropriate option for genome-wide association studies
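The difference between the two control-selection schemes described above can be illustrated with a minimal sketch. It is not the authors' code: the record fields (`id`, `dx_time`, `exit_time`), the function names, and the four-person toy cohort are all hypothetical, and matching factors such as age or sex are ignored.

```python
import random

def case_exclusion_controls(cohort, n_controls, rng):
    """CE sampling: draw controls from participants never diagnosed during follow-up."""
    pool = [p for p in cohort if p["dx_time"] is None]
    return rng.sample(pool, min(n_controls, len(pool)))

def incidence_density_controls(cohort, per_case, rng):
    """ID sampling: for each case, draw controls from the risk set at the case's
    diagnosis time, i.e. participants still under follow-up and disease-free then."""
    matched = {}
    for case in cohort:
        t = case["dx_time"]
        if t is None:
            continue  # not a case
        risk_set = [
            p for p in cohort
            if p["id"] != case["id"]
            and p["exit_time"] >= t
            and (p["dx_time"] is None or p["dx_time"] > t)
        ]
        matched[case["id"]] = rng.sample(risk_set, min(per_case, len(risk_set)))
    return matched

# Hypothetical toy cohort: times are years since study entry.
cohort = [
    {"id": 1, "dx_time": 5.0, "exit_time": 5.0},
    {"id": 2, "dx_time": 12.0, "exit_time": 12.0},
    {"id": 3, "dx_time": None, "exit_time": 20.0},
    {"id": 4, "dx_time": None, "exit_time": 3.0},
]

rng = random.Random(0)
id_controls = incidence_density_controls(cohort, per_case=2, rng=rng)
ce_controls = case_exclusion_controls(cohort, n_controls=2, rng=rng)
```

In this toy cohort, participant 2 is eligible as a control for participant 1 under ID sampling because they are still disease-free at year 5, even though they later become a case. Under CE sampling they can never be a control.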
Use of susceptibility scoring in conjunction with the genotypic transmission disequilibrium test
We explored the utility of selecting a genetically predisposed subgroup to improve detection of a genetic signal in the Genetic Analysis Workshop 14 Collaborative Study on the Genetics of Alcoholism dataset. A subgroup of affected probands with low environmental risk exposures was defined using a susceptibility score calculated from an environmental risk model. Thirty-nine probands with highly positive scores were selected, along with their parents, for use in a genotypic transmission disequilibrium test (TDT). We compared the results of the genotypic TDT in this subgroup to the TDT results using all probands and their parents. For some markers, the susceptibility scoring approach resulted in smaller p-values, while for other markers, evidence for a genetic signal weakened. Further explorations into genetic and environmental population characteristics that benefit from this approach are warranted
TRIO LOGIC REGRESSION - DETECTION OF SNP-SNP INTERACTIONS IN CASE-PARENT TRIOS
Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the linkage disequilibrium (LD) structure in the genotype data, and accommodates missing genotypes via haplotype-based imputation. We also derive an efficient algorithm to simulate case-parent trios where genetic risk is determined via epistatic interactions
Comparison of type I error for multiple test corrections in large single-nucleotide polymorphism studies using principal components versus haplotype blocking algorithms
Although permutation testing has been the gold standard for assessing significance levels in studies using multiple markers, it is time-consuming. A Bonferroni correction to the nominal p-value that uses the underlying pair-wise linkage disequilibrium (LD) structure among the markers to determine the number of effectively independent tests has recently been proposed. We propose using the number of independent LD blocks plus the number of independent single-nucleotide polymorphisms for correction. Using the Collaborative Study on the Genetics of Alcoholism LD data for chromosome 21, we simulated 1,000 replicates of parent-child trio data under the null hypothesis with two levels of LD: moderate and high. Assuming haplotype blocks were independent, we calculated the number of independent statistical tests using three haplotype blocking algorithms. We then compared the type I error rates using a principal components-based method, the three blocking methods, a traditional Bonferroni correction, and the unadjusted p-values obtained from FBAT. Under high LD conditions, the principal components-based method and one of the blocking methods were slightly conservative, whereas the other two blocking methods exceeded the target type I error rate. Under conditions of moderate LD, we show that the blocking algorithm corrections are closest to the desired type I error, although still slightly conservative, with the principal components-based method being almost as conservative as the traditional Bonferroni correction
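The proposed correction (number of LD blocks plus number of independent SNPs) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: block membership is taken as a given input (the abstract does not name the blocking algorithms), "independent SNPs" is read here as SNPs that fall outside every block, and the function names and marker labels are hypothetical.

```python
def effective_tests(block_of_snp):
    """Count effectively independent tests as (number of distinct LD blocks)
    plus (number of SNPs not assigned to any block).

    block_of_snp: one entry per SNP, a block label or None for an unblocked SNP.
    """
    blocks = {b for b in block_of_snp if b is not None}
    singletons = sum(1 for b in block_of_snp if b is None)
    return len(blocks) + singletons

def corrected_threshold(alpha, block_of_snp):
    """Bonferroni-style per-test threshold using the effective number of tests."""
    return alpha / effective_tests(block_of_snp)

# Hypothetical example: 8 SNPs, two blocks ("A", "B") and three unblocked SNPs.
labels = ["A", "A", "A", None, "B", "B", None, None]
m_eff = effective_tests(labels)                    # 2 blocks + 3 singletons
block_alpha = corrected_threshold(0.05, labels)    # 0.05 / m_eff
naive_alpha = 0.05 / len(labels)                   # traditional Bonferroni
```

With these made-up labels the block-based threshold is less stringent than the traditional Bonferroni threshold, which treats all eight SNPs as independent tests.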
Use of longitudinal data in genetic studies in the genome-wide association studies era: summary of Group 14
Participants analyzed actual and simulated longitudinal data from the Framingham Heart Study for various metabolic and cardiovascular traits. The genetic information incorporated into these investigations ranged from selected single-nucleotide polymorphisms to genome-wide association arrays. Genotypes were incorporated using a broad range of methodological approaches including conditional logistic regression, linear mixed models, generalized estimating equations, linear growth curve estimation, growth modeling, growth mixture modeling, population attributable risk fraction based on survival functions under the proportional hazards models, and multivariate adaptive splines for the analysis of longitudinal data. The specific scientific questions addressed by these different approaches also varied, ranging from a more precise definition of the phenotype, bias reduction in control selection, estimation of effect sizes and genotype associated risk, to direct incorporation of genetic data into longitudinal modeling approaches and the exploration of population heterogeneity with regard to longitudinal trajectories. The group reached several overall conclusions: 1) The additional information provided by longitudinal data may be useful in genetic analyses. 2) The precision of the phenotype definition as well as control selection in nested designs may be improved, especially if traits demonstrate a trend over time or have strong age-of-onset effects. 3) Analyzing genetic data stratified for high-risk subgroups defined by a unique development over time could be useful for the detection of rare mutations in common multi-factorial diseases. 4) Estimation of the population impact of genomic risk variants could be more precise. The challenges and computational complexity demanded by genome-wide single-nucleotide polymorphism data were also discussed
“Gap hunting” to characterize clustered probe signals in Illumina methylation array data
Additional file 6: Figures S26–S31. All remaining SBE site scenarios. Each additional scenario of a SBE site-mapping SNP delimited in Fig. 4 not including the scenario shown in Fig. 5. Each of these figures contains 4 plots, showing every combination of CpG site interrogations on the forward and reverse strand as well as which nucleotide is the reference nucleotide
ASSOCIATION TESTS THAT ACCOMMODATE GENOTYPING ERRORS
High-throughput SNP arrays provide estimates of genotypes for up to one million loci, often used in genome-wide association studies. While these estimates are typically very accurate, genotyping errors do occur, which can influence in particular the most extreme test statistics and p-values. Estimates for the genotype uncertainties are also available, although typically ignored. In this manuscript, we develop a framework to incorporate these genotype uncertainties in case-control studies for any genetic model. We verify that using the assumption of a “local alternative” in the score test is very reasonable for effect sizes typically seen in SNP association studies, and show that the power of the score test is simply a function of the correlation of the genotype probabilities with the true genotypes. We demonstrate that the power to detect a true association can be substantially increased for difficult to call genotypes, resulting in improved inference in association studies
Parental exposures to occupational asthmagens and risk of autism spectrum disorder in a Danish population-based case-control study
Background: Environmental exposures and immune conditions during pregnancy could influence development of autism spectrum disorder (ASD) in offspring. However, few studies have examined immune-triggering exposures in relation to ASD. We evaluated the association between parental workplace exposures to risk factors for asthma (“asthmagens”) and ASD. Methods: We conducted a population-based case-control study in the Danish population using register linkage. Our study population consisted of 11,869 ASD cases and 48,046 controls born from 1993 through 2007. Cases were identified by ICD-10 codes in the Danish Psychiatric Central Register. ASD cases and controls were linked to parental Danish International Standard Classification of Occupations (DISCO-88) job codes. Parental occupational asthmagen exposure was estimated by linking DISCO-88 codes to an asthma-specific job-exposure matrix. Results: Our maternal analyses included 6,706 case mothers and 29,359 control mothers employed during the pregnancy period. We found a weak inverse association between ASD and any maternal occupational asthmagen exposure, adjusting for sociodemographic covariates (adjusted OR: 0.92, 95% CI: 0.86–0.99). In adjusted analyses, including 7,647 cases and 31,947 controls with employed fathers, paternal occupational asthmagen exposure was not associated with ASD (adjusted OR: 0.98, 95% CI: 0.92–1.05). Conclusions: We found a weak inverse association between maternal occupational asthmagen exposure and ASD, and a null association between paternal occupational exposure and ASD. We suggest that unmeasured confounding negatively biased the estimate, but that this unmeasured confounding is likely not strong enough to bring the effect above the null. Overall, our results were consistent with no positive association between parental asthmagen exposure and ASD in the children
Maternal Exposure to Occupational Asthmagens During Pregnancy and Autism Spectrum Disorder in the Study to Explore Early Development
Maternal immune activity has been linked to children with autism spectrum disorder (ASD). We examined maternal occupational exposure to asthma-causing agents during pregnancy in relation to ASD risk. Our sample included 463 ASD cases and 710 general population controls from the Study to Explore Early Development whose mothers reported at least one job during pregnancy. Asthmagen exposure was estimated from a published job-exposure matrix. The adjusted odds ratio for ASD comparing asthmagen-exposed to unexposed was 1.39 (95% CI 0.96–2.02). Maternal workplace asthmagen exposure was not associated with ASD risk in this study, but this result does not exclude some involvement of maternal exposure to asthma-causing agents in ASD