234 research outputs found
Rare protection against type 1 diabetes
Next-generation DNA sequencing reveals rare alleles protective from type 1 diabetes
Genetic Risk Score Predicting Risk of Rheumatoid Arthritis Phenotypes and Age of Symptom Onset
Cumulative genetic profiles can help identify individuals at high risk for developing RA. We examined the impact of 39 validated genetic risk alleles on the risk of RA phenotypes characterized by serologic and erosive status. We evaluated single nucleotide polymorphisms at 31 validated RA risk loci and 8 Human Leukocyte Antigen alleles among 542 Caucasian RA cases and 551 Caucasian controls from Nurses' Health Study and Nurses' Health Study II. We created a weighted genetic risk score (GRS) and evaluated it as 7 ordinal groups using logistic regression (adjusting for age and smoking) to assess the relationship between GRS group and odds of developing seronegative (RF- and CCP-), seropositive (RF+ or CCP+), erosive, and seropositive, erosive RA phenotypes. In separate case-only analyses, we assessed the relationships between GRS and age of symptom onset. In 542 RA cases, 317 (58%) were seropositive, 163 (30%) had erosions and 105 (19%) were seropositive with erosions. Comparing the highest GRS risk group to the median group, we found an OR of 1.2 (95% CI = 0.8-2.1) for seronegative RA, 3.0 (95% CI = 1.9-4.7) for seropositive RA, 3.2 (95% CI = 1.8-5.6) for erosive RA, and 7.6 (95% CI = 3.6-16.3) for seropositive, erosive RA. No significant relationship was seen between GRS and age of onset. Results suggest that seronegative and seropositive/erosive RA have different genetic architecture and support the importance of considering RA phenotypes in RA genetic studies.
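The abstract above does not say how the GRS weights were derived. As a minimal sketch, assuming the common convention of weighting each risk-allele count by the log of a previously reported odds ratio, the score and an unadjusted odds ratio with a Wald confidence interval could be computed as follows. The function names and all input numbers are hypothetical, and the study itself used logistic regression adjusted for age and smoking, which this sketch does not reproduce.

```python
import math


def weighted_grs(allele_counts, odds_ratios):
    """Weighted genetic risk score for one individual.

    allele_counts: risk-allele count per locus (0, 1, or 2).
    odds_ratios:   assumed per-allele odds ratio for each locus
                   (illustrative values, not taken from the study).
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("need one odds ratio per locus")
    # Each locus contributes its allele count times log(OR).
    return sum(c * math.log(o) for c, o in zip(allele_counts, odds_ratios))


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald interval from a 2x2 table.

    a: cases in the high-GRS group      b: controls in the high-GRS group
    c: cases in the reference group     d: controls in the reference group
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


if __name__ == "__main__":
    # Hypothetical individual typed at three loci.
    print(weighted_grs([2, 0, 1], [1.5, 1.2, 2.0]))
    # Hypothetical 2x2 counts, not data from the abstract.
    print(odds_ratio_with_ci(30, 10, 20, 20))
```

Scores computed this way would then be binned into ordinal groups (seven in the abstract) before being compared against a reference group.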
Identification of the NF-κB activating protein-like locus as a risk locus for rheumatoid arthritis
Objective: To fine-map the NF-κB activating protein-like (NKAPL) locus identified in a prior genome-wide study as a possible rheumatoid arthritis (RA) risk locus and thereby delineate additional variants with stronger and/or independent disease association. Methods: Genotypes for 101 SNPs across the NKAPL locus on chromosome 6p22.1 were obtained on 1368 Canadian RA cases and 1471 controls. Single marker associations were examined using logistic regression and the most strongly associated NKAPL locus SNPs then typed in another Canadian and a US-based RA case/control cohort. Results: Fine-mapping analyses identified six NKAPL locus variants in a single haplotype block showing association with p≤5.6×10⁻⁸ in the combined Canadian cohort. Among these SNPs, rs35656932 in the zinc finger 193 gene and rs13208096 in the NKAPL gene remained significant after conditional logistic regression, contributed independently to risk for disease, and were replicated in the US cohort (Pcomb=4.24×10⁻¹⁰ and 2.44×10⁻⁹, respectively). These associations remained significant after conditioning on SNPs tagging the HLA-shared epitope (SE) DRB1*0401 allele and were significantly stronger in the HLA-SE negative versus positive subgroup, with a significant negative interaction apparent between HLA-DRB1 SE and NKAPL risk alleles. Conclusions: By illuminating additional NKAPL variants with highly significant effects on risk that are distinct from, but interactive with those arising from the HLA-DRB1 locus, our data conclusively identify NKAPL as an RA susceptibility locus.
Data for Genetic Analysis Workshop 16 Problem 1, Association Analysis of Rheumatoid Arthritis Data
For Genetic Analysis Workshop 16 Problem 1, we provided data for genome-wide association analysis of rheumatoid arthritis. Single-nucleotide polymorphism (SNP) genotype data were provided for 868 cases and 1194 controls that had been assayed using an Illumina 550 k platform. In addition, phenotypic data were provided from genotyping DRB1 alleles, which were classified according to the rheumatoid arthritis shared epitope, levels of anti-cyclic citrullinated peptide, and levels of rheumatoid factor IgM. Several questions could be addressed using the data, including analysis of genetic associations using single SNPs or haplotypes, as well as gene-gene and genetic analysis of SNPs for qualitative and quantitative factors
Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records
Objective: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. Materials and Methods: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. Results: Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers. Conclusion: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.
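The AUC reported above can be read as the probability that a randomly chosen positive note receives a higher classifier score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. The function name and the toy labels and scores are hypothetical and are not taken from the study's pipeline.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from the pairwise-ranking definition.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier outputs, where higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The double loop is quadratic in the number of notes. For test sets of the size described above, a rank-based formulation would be the more practical choice.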
The Influence of Polygenic Risk Scores on Heritability of Anti-CCP Level in RA
Objective: To study genetic factors that influence quantitative anti-cyclic citrullinated peptide (anti-CCP) antibody levels in RA patients. Methods: We carried out a genome-wide association study (GWAS) meta-analysis using 1,975 anti-CCP+ RA patients from 3 large cohorts, the Brigham Rheumatoid Arthritis Sequential Study (BRASS), North American Rheumatoid Arthritis Consortium (NARAC), and the Epidemiological Investigation of RA (EIRA). We also carried out a genome-wide complex trait analysis (GCTA) to estimate the heritability of anti-CCP levels. Results: GWAS meta-analysis showed that anti-CCP levels were most strongly associated with the human leukocyte antigen (HLA) region with a p-value of 2×10⁻¹¹ for rs1980493. There were 112 SNPs in this region that exceeded the genome-wide significance threshold of 5×10⁻⁸, and all were in linkage disequilibrium (LD) with the HLA-DRB1*03 allele with LD r² in the range of 0.25-0.88. Suggestive novel associations outside of the HLA region were also observed for rs8063248 (near the GP2 gene) with a p-value of 3×10⁻⁷. None of the known RA risk alleles (~52 loci) were associated with anti-CCP level. Heritability analysis estimated that 44% of anti-CCP variation was attributable to genetic factors captured by GWAS variants. Conclusions: Anti-CCP level is a heritable trait. HLA-DR3 and GP2 are associated with lower anti-CCP levels.
Analysis and Application of European Genetic Substructure Using 300 K SNP Information
European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry as well as distinguishing among northern European populations. In separate analyses of northern European participants other substructure relationships were discerned showing a west to east gradient. Application of this substructure information was critical in examining a real dataset in whole genome association (WGA) analyses for rheumatoid arthritis in European Americans to reduce false positive signals. In addition, two sets of European substructure ancestry informative markers (ESAIMs) were identified that provide substantial substructure information. The results provide further insight into European population genetic substructure and show that this information can be used for improving error rates in association testing of candidate genes and in replication studies of WGA scans
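To illustrate how a principal component can separate ancestry groups, the following sketch centers a small genotype matrix and finds its leading component by power iteration. It is a simplified illustration under stated assumptions, not the pipeline used in the study: the function name and toy genotypes are hypothetical, and the per-SNP variance scaling that production tools commonly apply is omitted.

```python
import math


def top_principal_component(genotypes, iterations=200):
    """Unit vector proportional to each individual's score on the first PC.

    genotypes: one row per individual, one allele count (0, 1, 2) per SNP.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Center every SNP column on its mean allele count.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Individual-by-individual Gram matrix G = X X^T.
    gram = [
        [sum(a * b for a, b in zip(centered[i], centered[k])) for k in range(n)]
        for i in range(n)
    ]
    # Power iteration converges to the leading eigenvector of G.
    # A constant start vector would fail: column centering makes G map it to zero.
    vec = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            raise ValueError("no variance reachable from the start vector")
        vec = [v / norm for v in nxt]
    return vec


if __name__ == "__main__":
    # Two hypothetical groups with opposite allele patterns at four SNPs.
    toy = [[2, 2, 0, 0], [2, 1, 0, 0], [0, 0, 2, 2], [0, 0, 1, 2]]
    print(top_principal_component(toy))
```

In the toy data the first two individuals receive scores of one sign and the last two receive scores of the opposite sign. Which group is positive is arbitrary, since an eigenvector is only defined up to sign.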
Improving Case Definition of Crohn's Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing
Available in PMC 2014 June 01.
Background:
Previous studies identifying patients with inflammatory bowel disease using administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record–based model for classification of inflammatory bowel disease leveraging the combination of codified data and information from clinical text notes using natural language processing.
Methods:
Using the electronic medical records of 2 large academic centers, we created data marts for Crohn’s disease (CD) and ulcerative colitis (UC) comprising patients with ≥1 International Classification of Diseases, 9th edition, code for each disease. We used codified (i.e., International Classification of Diseases, 9th edition codes, electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation was performed in a training set of 600 randomly selected patients for each disease with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.
Results:
We confirmed 399 CD cases (67%) in the CD training set and 378 UC cases (63%) in the UC training set. For both, a combined model including narrative and codified data had better accuracy (area under the curve for CD 0.95; UC 0.94) than models using only disease International Classification of Diseases, 9th edition codes (area under the curve 0.89 for CD; 0.86 for UC). Addition of natural language processing narrative terms to our final model resulted in classification of 6% to 12% more subjects with the same accuracy.
Conclusions:
Inclusion of narrative concepts identified using natural language processing improves the accuracy of electronic medical records case definition for CD and UC while simultaneously identifying more subjects compared with models using codified data alone.
Funding:
National Institutes of Health (U.S.) (NIH U54-LM008748); American Gastroenterological Association; National Institutes of Health (U.S.) (NIH K08 AR060257); Beth Israel Deaconess Medical Center (Katherine Swan Ginsburg Fund); National Institutes of Health (U.S.) (NIH R01-AR056768); Burroughs Wellcome Fund (Career Award for Medical Scientists); National Institutes of Health (U.S.) (NIH U01-GM092691); National Institutes of Health (U.S.) (NIH R01-AR059648)
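For readers unfamiliar with the variable selection step named in the Methods above, the adaptive LASSO for logistic regression is usually written as the penalized likelihood below. The abstract gives no tuning details, so the symbols are assumptions for illustration: $\tilde{\beta}$ is an initial coefficient estimate, and $\lambda$ and $\gamma$ are tuning parameters.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  -\sum_{i=1}^{n} \Bigl[ y_i\, x_i^{\top}\beta - \log\bigl(1 + e^{x_i^{\top}\beta}\bigr) \Bigr]
  + \lambda \sum_{j=1}^{p} \hat{w}_j\, \lvert \beta_j \rvert,
\qquad
\hat{w}_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}}, \quad \gamma > 0 .
```

Variables with a small initial estimate receive a large weight and are more readily shrunk to exactly zero, which is how the penalty selects informative variables.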