
The application and development of methods to combine and infer information from genetic epidemiological studies of cardiovascular and other complex traits

By Nicholas G.D. Masca

Abstract

This thesis investigates methods to combine and infer information from genetic epidemiological studies. Three issues are explored, each in a distinct and self-contained chapter.

Chapter 1 investigates how best to incorporate treatment information in genetic analyses of blood pressure. Different approaches to adjusting for treatment are compared in a number of simulated scenarios, and the approaches that utilise all the observed data are generally shown to perform best. One particular condition, however, causes these approaches to suffer bias. This is where a genetic variant (or some other factor) interacts with treatment. This chapter therefore urges caution in the interpretation of results from these studies, and suggests some possible approaches to identifying existing interactions with treatment.

Chapter 2 concerns participant privacy in genome-wide association studies (GWAS). Recent methods claim to be able to infer whether an individual participated in a study, using only aggregate statistics from the study such as allele frequencies. In the past, these statistics have been freely published online. This chapter explores the full implications of these methods, by investigating their true capabilities and limitations. In addition, some modifications are proposed to one particular method, to demonstrate how it can be adapted for use in practice. This work finds that participant identification is possible in ideal conditions, but common characteristics of real studies may prevent any reliable application of these methods in practice.

Chapter 3 proposes a new approach to synthesising data between studies. This approach, named “DataSHIELD”, guarantees identical results to an individual-level meta-analysis, while offering greater flexibility than the study-level meta-analysis. DataSHIELD also potentially circumvents some of the laws that restrict data use, because it does not involve sharing any individual-level data between studies. This chapter outlines the principles underpinning DataSHIELD, and demonstrates its use in a simulated data example.

Publisher: University of Leicester
Year: 2011
OAI identifier: oai:lra.le.ac.uk:2381/10053

