GWAS Meta-Analysis: Methodology and Application to Human Meiotic Recombination
Human meiotic recombination is critical to successful human reproduction and to maintaining genetic diversity. Recombination anomalies are associated with aberrant meiotic outcomes with significant consequences. One important method for studying recombination is genome-wide association studies (GWAS) of recombination phenotypes. Because such studies require nuclear or three-generation family samples that have been genotyped on GWAS chips, the number of suitable datasets is limited. The goal of this dissertation is to develop methods for increasing the available sample sizes for GWAS of recombination phenotypes.
We developed two different approaches for increasing sample size. First, we made it possible to include additional family types in the analysis. We developed methods for scoring recombination for half-sibling pedigrees and three-generation pedigrees with ungenotyped individuals. Second, we developed a regionally smoothed meta-analysis method for GWAS data, which will allow the combination of datasets that have been genotyped on different chips. This method will help increase available sample sizes for recombination studies, but is also applicable to GWAS in general.
The public health significance of this work is that our developments will allow us to find new genes that control recombination and to learn more about already-known genes. This information can be used for improved treatment and prevention of the consequences of aberrant recombination, including infertility and births with significant chromosomal anomalies.
Hemizygous Deletion on Chromosome 3p26.1 Is Associated with Heavy Smoking among African American Subjects in the COPDGene Study
Many well-powered genome-wide association studies have identified genetic determinants of self-reported smoking behaviors and measures of nicotine dependence, but most have not considered the role of structural variants, such as copy number variants (CNVs), in influencing these phenotypes. Here, we included 2,889 African American and 6,187 non-Hispanic White subjects from the COPDGene cohort (http://www.copdgene.org) to carefully investigate the role of polymorphic CNVs across the genome on various measures of smoking behavior. We identified a CNV component (a hemizygous deletion) on chromosome 3p26.1 associated with two quantitative phenotypes related to smoking behavior among African Americans. This polymorphic hemizygous deletion is significantly associated with pack-years and cigarettes smoked per day among African American subjects in the COPDGene study. We sought evidence of replication in African Americans from the population-based Atherosclerosis Risk in Communities (ARIC) study. While we observed similar CNV counts, the extent of exposure to cigarette smoking among ARIC subjects was quite different and the smaller sample size of heavy smokers in ARIC severely limited statistical power, so we were unable to replicate our findings from the COPDGene cohort. However, meta-analyses of COPDGene and ARIC study subjects strengthened our association signal. In addition, a few linkage studies have reported suggestive linkage to the 3p26.1 region, and a few genome-wide association studies (GWAS) have reported that markers in the gene (GRM7) nearest to this 3p26.1 area of polymorphic deletions are associated with measures of nicotine dependence among subjects of European ancestry.
Genome-wide association scan of dental caries in the permanent dentition
Background: Over 90% of adults aged 20 years or older with permanent teeth have suffered from dental caries leading to pain, infection, or even tooth loss. Although caries prevalence has decreased over the past decade, about 23% of dentate adults in the US still have untreated carious lesions. Dental caries is a complex disorder affected by both individual susceptibility and environmental factors. Approximately 35-55% of caries phenotypic variation in the permanent dentition is attributable to genes, though few specific caries genes have been identified. Therefore, we conducted the first genome-wide association study (GWAS) to identify genes affecting susceptibility to caries in adults. Methods: Five independent cohorts were included in this study, totaling more than 7000 participants. For each participant, dental caries was assessed and genetic markers (single nucleotide polymorphisms, SNPs) were genotyped or imputed across the entire genome. Due to the heterogeneity among the five cohorts regarding age, genotyping platform, quality of dental caries assessment, and study design, we first conducted genome-wide association (GWA) analyses on each of the five independent cohorts separately. We then performed three meta-analyses to combine results for: (i) the comparatively younger, Appalachian cohorts (N = 1483) with well-assessed caries phenotype, (ii) the comparatively older, non-Appalachian cohorts (N = 5960) with inferior caries phenotypes, and (iii) all five cohorts (N = 7443). Top ranking genetic loci within and across meta-analyses were scrutinized for biologically plausible roles in caries. Results: Different sets of genes were nominated across the three meta-analyses, especially between the younger and older age cohorts.
In general, we identified several suggestive loci (P-value ≤ 10E-05) within or near genes with plausible biological roles for dental caries, including RPS6KA2 and PTK2B, involved in p38-dependent MAPK signaling, and RHOU and FZD1, involved in the Wnt signaling cascade. Both of these pathways have been implicated in dental caries. ADAMTS3 and ISL1 are involved in tooth development, and TLR2 is involved in immune response to oral pathogens. Conclusions: As the first GWAS for dental caries in adults, this study nominated several novel caries genes for future study, which may lead to better understanding of cariogenesis, and ultimately, to improved disease prediction, prevention, and/or treatment.
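The caries abstract above does not say which statistic was used to combine the per-cohort results. When cohorts differ in genotyping platform and phenotype quality, one common choice is a sample-size-weighted Z-score (Stouffer) combination of per-cohort p-values and effect directions. The following is a minimal sketch of that technique, assuming two-sided p-values; the function name, inputs, and numbers are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def stouffer_weighted_z(p_values, directions, sample_sizes):
    """Sample-size-weighted Z-score meta-analysis for one SNP.

    p_values: two-sided per-cohort p-values, each in (0, 1].
    directions: +1 or -1, the sign of each cohort's effect estimate.
    sample_sizes: per-cohort sample sizes, used as squared weights.
    """
    norm = NormalDist()
    numerator = 0.0
    total_n = 0.0
    for p, sign, n in zip(p_values, directions, sample_sizes):
        # Convert the two-sided p-value back to a signed Z-score.
        z = sign * norm.inv_cdf(1.0 - p / 2.0)
        numerator += math.sqrt(n) * z
        total_n += n
    z_meta = numerator / math.sqrt(total_n)
    # Two-sided p-value of the combined Z-score.
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2.0))
    return z_meta, p_meta


# Hypothetical inputs for illustration only (not results from the study).
z_same, p_same = stouffer_weighted_z([0.05, 0.05], [1, 1], [100, 100])
z_opposite, p_opposite = stouffer_weighted_z([0.05, 0.05], [1, -1], [100, 100])
```

Two cohorts with concordant effects reinforce each other and yield a smaller combined p-value, while equally sized cohorts with opposite effect directions cancel out.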
Comprehensive literature review and statistical considerations for GWAS meta-analysis
Over the last decade, genome-wide association studies (GWAS) have become the standard tool for gene discovery in human disease research. While debate continues about how to get the most out of these studies and on occasion about how much value these studies really provide, it is clear that many of the strongest results have come from large-scale mega-consortia and/or meta-analyses that combine data from up to dozens of studies and tens of thousands of subjects. While such analyses are becoming more and more common, statistical methods have lagged somewhat behind. There are good meta-analysis methods available, but even when they are carefully and optimally applied there remain some unresolved statistical issues. This article systematically reviews the GWAS meta-analysis literature, highlighting methodology and software options and reviewing methods that have been used in real studies. We illustrate differences among methods using a case study. We also discuss some of the unresolved issues and potential future directions.
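As a concrete instance of the kind of method such a review covers, the sketch below implements a standard fixed-effect, inverse-variance-weighted meta-analysis for a single SNP, together with Cochran's Q as a heterogeneity statistic. It is an illustration under stated assumptions and not the article's case study: the function name and the per-study estimates are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates with inverse-variance weights.

    betas: per-study effect estimates for one SNP.
    ses: the matching standard errors (all positive).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q}


# Hypothetical per-study estimates (illustrative numbers only).
result = fixed_effect_meta(betas=[0.10, 0.20], ses=[0.05, 0.05])
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with (number of studies minus 1) degrees of freedom. A large Q suggests that a random-effects model may be more appropriate than the fixed-effect model sketched here.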
Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD
Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis from two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). pQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Causal relations among pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) were explored using conditional independence tests. We identified 527 highly significant (p [...] 10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10^−392) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explanation of variance (R^2) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema.
In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTLs, and the differences observed between pQTL and eQTL SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group.
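The "measured variation explained" by a single SNP in the COPD biomarker abstract above can be illustrated as the R^2 of a simple linear regression of the protein level on genotype dosage, which equals the squared Pearson correlation between the two. The sketch below assumes an additive 0/1/2 genotype coding and no covariates. The data are made up for illustration, and the study's actual models may have differed.

```python
def variance_explained(dosages, levels):
    """R^2 of a simple linear regression of levels on genotype dosages.

    dosages: allele counts per subject (assumed additive coding 0, 1, 2).
    levels: the matching biomarker measurements.
    """
    n = len(dosages)
    if n != len(levels) or n < 2:
        raise ValueError("need two equally long sequences with n >= 2")
    mean_g = sum(dosages) / n
    mean_y = sum(levels) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, levels))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((y - mean_y) ** 2 for y in levels)
    if sxx == 0 or syy == 0:
        raise ValueError("dosages and levels must both vary")
    # Squared Pearson correlation equals R^2 for one predictor.
    return (sxy * sxy) / (sxx * syy)


# Made-up data: six subjects, two per genotype class.
dosages = [0, 0, 1, 1, 2, 2]
levels = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
r2 = variance_explained(dosages, levels)
```

With clinical covariates in the model, the analogous quantity is the increase in R^2 when the SNP is added to a covariate-only regression, which requires a multiple-regression fit rather than this one-predictor shortcut.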
Gene expression analysis of microarray data: a case study of papillary thyroid carcinoma data
Microarray technology allows researchers to monitor the mRNA transcription levels of thousands of genes in parallel, which opens the door to more advanced cancer research. This thesis focuses on a case study of papillary thyroid carcinoma data. Fourteen publicly available Affymetrix microarray data sets were used, of which seven samples were collected from normal thyroid tissue and the remaining seven from papillary thyroid carcinoma tissue. The present study compared the results obtained from three different normalization processes (MAS5.0, RMA, and GCRMA) in detecting differentially expressed genes under two conditions. Internal consistency within each method as well as the results across the three methods were compared. The statistical packages R 2.5.1 and Bioconductor 2.08 were used to perform the data analysis. Each step of normalization with MAS5.0 and RMA is described. The statistical package Limma was used to fit a linear model. Finally, an empirical Bayes method was used to detect the significantly differentially expressed genes. First, considering all genes, a comparison was made among the three normalization methods, in which RMA and GCRMA showed the maximum agreement in detecting differentially expressed genes. Then, using an unspecified filtering process, a set of genes was selected and the whole process was repeated, where the top fifty differentially expressed genes did not show any overlap with each other.
Department of Mathematical Sciences
Thesis (M.A.
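One simple way to quantify the agreement between normalization methods described in the thesis abstract above is to count the genes shared by the top-k lists of each pair of methods. The sketch below assumes each method yields a list of gene identifiers ranked from most to least significant. The gene names and rankings are hypothetical, and the thesis may have measured agreement differently.

```python
from itertools import combinations


def top_k_overlap(rankings, k):
    """Count shared genes among the top k for every pair of methods.

    rankings: dict mapping a method name to its ranked gene list.
    Returns a dict keyed by (method_a, method_b) in alphabetical order.
    """
    tops = {name: set(genes[:k]) for name, genes in rankings.items()}
    return {
        (a, b): len(tops[a] & tops[b])
        for a, b in combinations(sorted(tops), 2)
    }


# Hypothetical ranked gene lists (placeholders, not thesis results).
rankings = {
    "MAS5.0": ["g1", "g2", "g3", "g4"],
    "RMA": ["g2", "g5", "g1", "g6"],
    "GCRMA": ["g2", "g1", "g5", "g7"],
}
overlap = top_k_overlap(rankings, k=3)
```

In this toy input the RMA and GCRMA lists share all three of their top genes, while each shares only two with MAS5.0.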