11 research outputs found
Recommended from our members
Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies
Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs
Association of Interleukin 6 Receptor Variant With Cardiovascular Disease Effects of Interleukin 6 Receptor Blocking Therapy: A Phenome-Wide Association Study
This phenome-wide association study assesses clinical associations between interleukin 6 receptor (IL6R) single-nucleotide polymorphisms and known IL6R drug effects and whether large biobanks and genetics can be used to assess potential beneficial and adverse effects of therapeutic agents with known pathways and related genes
Effects of Genetic Variants Associated with Familial Hypercholesterolemia on Low-Density Lipoprotein-Cholesterol Levels and Cardiovascular Outcomes in the Million Veteran Program
Familial hypercholesterolemia (FH) is characterized by inherited high levels of low-density lipoprotein cholesterol (LDL-C) and premature coronary heart disease (CHD). Over a thousand low-frequency variants in
and
have been implicated in FH but few have been examined at the population level. We aim to estimate the phenotypic effects of a subset of FH variants on LDL-C and clinical outcomes among 331,107 multi-ethnic participants.
We examined the individual and collective association between putatively pathogenic FH variants included on the MVP biobank array and the maximum LDL-C level over an interval of 15 years (maxLDL). We assessed the collective effect on clinical outcomes by leveraging data from 61.7 million clinical encounters.
We found 8 out of 16 putatively pathogenic FH variants with ≥30 observed carriers to be significantly associated with elevated maxLDL (9.4-80.2 mg/dL). Phenotypic effects were similar for European and African Americans despite substantial differences in carrier frequencies. Based on observed effects on maxLDL, we identified a total of 748 carriers (1:443) who had elevated maxLDL (36.5±1.4 mg/dL, p=1.2×10
), and higher prevalence of clinical diagnoses related to hypercholesterolemia and CHD in a phenome-wide scan. Adjusted for maxLDL, FH variants collectively associated with higher prevalence of CHD (odds ratio, 1.59 [95% CI 1.36-1.86], p=1.1×10
) but not peripheral artery disease.
The distribution and phenotypic effects of putatively pathogenic FH variants were heterogeneous within and across variants. More robust evidence of genotype-phenotype associations of FH variants in multi-ethnic populations is needed to accurately infer at-risk individuals from genetic screening
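The carrier frequency quoted in this abstract (748 carriers, 1:443) can be checked with a short calculation. This is a minimal sketch that assumes the denominator is the 331,107 participants mentioned in the abstract, which the text does not state explicitly; the function name `one_in_n` is illustrative only.

```python
def one_in_n(carriers: int, participants: int) -> int:
    """Return N such that the carrier frequency is roughly 1 in N."""
    if carriers <= 0:
        raise ValueError("carriers must be positive")
    return round(participants / carriers)


# Assumption: all 331,107 participants form the denominator.
carriers = 748
participants = 331_107
print(f"1:{one_in_n(carriers, participants)}")  # prints 1:443
```

Under that assumption the ratio is about 442.7, which rounds to the 1:443 given in the abstract.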
High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)
© 2019, The Author(s), under exclusive licence to Springer Nature Limited. Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1–2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no)
Genome-wide Association Study of Maximum Habitual Alcohol Intake in >140,000 U.S. European and African American Veterans Yields Novel Risk Loci
Background: Habitual alcohol use can be an indicator of alcohol dependence, which is associated with a wide range of serious health problems. Methods: We completed a genome-wide association study in 126,936 European American and 17,029 African American subjects in the Veterans Affairs Million Veteran Program for a quantitative phenotype based on maximum habitual alcohol consumption. Results: ADH1B, on chromosome 4, was the lead locus for both populations: for the European American sample, rs1229984 (p = 4.9 × 10⁻⁴⁷); for African American, rs2066702 (p = 2.3 × 10⁻¹²). In the European American sample, we identified three additional genome-wide-significant maximum habitual alcohol consumption loci: on chromosome 17, rs77804065 (p = 1.5 × 10⁻¹²), at CRHR1 (corticotropin-releasing hormone receptor 1); the protein product of this gene is involved in stress and immune responses; and on chromosomes 8 and 10. European American and African American samples were then meta-analyzed; the associated region at CRHR1 increased in significance to 1.02 × 10⁻¹³, and we identified two additional genome-wide significant loci, FGF14 (p = 9.86 × 10⁻⁹) (chromosome 13) and a locus on chromosome 11. Besides ADH1B, none of the five loci have prior genome-wide significant support. Post-genome-wide association study analysis identified genetic correlation to other alcohol-related traits, smoking-related traits, and many others. Replications were observed in UK Biobank data. Genetic correlation between maximum habitual alcohol consumption and alcohol dependence was 0.87 (p = 4.78 × 10⁻⁹). Enrichment for cell types included dopaminergic and gamma-aminobutyric acidergic neurons in midbrain, and pancreatic delta cells. Conclusions: The present study supports five novel alcohol-use risk loci, with particularly strong statistical support for CRHR1. Additionally, we provide novel insight regarding the biology of harmful alcohol use
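To make "genome-wide significant" concrete, the sketch below compares the p-values reported in this abstract against the conventional threshold of 5 × 10⁻⁸. The threshold is an assumption: the abstract does not say which cutoff the authors applied. The dictionary labels and the function name are illustrative.

```python
# Conventional genome-wide significance threshold.
# Assumption: the abstract does not state the cutoff the authors used.
GENOME_WIDE_ALPHA = 5e-8

# p-values as reported in the abstract.
reported = {
    "ADH1B rs1229984 (European American)": 4.9e-47,
    "ADH1B rs2066702 (African American)": 2.3e-12,
    "CRHR1 rs77804065 (European American)": 1.5e-12,
    "CRHR1 region (meta-analysis)": 1.02e-13,
    "FGF14 (meta-analysis)": 9.86e-9,
}


def is_genome_wide_significant(p: float, alpha: float = GENOME_WIDE_ALPHA) -> bool:
    """Return True when p falls below the chosen threshold."""
    return p < alpha


for label, p in reported.items():
    verdict = is_genome_wide_significant(p)
    print(f"{label}: p = {p:.2e}, significant = {verdict}")
```

All five reported values fall below 5 × 10⁻⁸, so each would count as significant under this assumed cutoff.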
Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program.
The Million Veteran Program (MVP) was established in 2011 as a national research initiative to determine how genetic variation influences the health of US military veterans. Here we genotyped 312,571 MVP participants using a custom biobank array and linked the genetic data to laboratory and clinical phenotypes extracted from electronic health records covering a median of 10.0 years of follow-up. Among 297,626 veterans with at least one blood lipid measurement, including 57,332 black and 24,743 Hispanic participants, we tested up to around 32 million variants for association with lipid levels and identified 118 novel genome-wide significant loci after meta-analysis with data from the Global Lipids Genetics Consortium (total n > 600,000). Through a focus on mutations predicted to result in a loss of gene function and a phenome-wide association study, we propose novel indications for pharmaceutical inhibitors targeting PCSK9 (abdominal aortic aneurysm), ANGPTL4 (type 2 diabetes) and PDE3B (triglycerides and coronary disease)