Extraction of clinical phenotypes for Alzheimer's disease dementia from clinical notes using natural language processing
OBJECTIVES: There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR.
MATERIALS AND METHODS: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings.
RESULTS: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype.
DISCUSSION: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success.
CONCLUSION: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
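The two metrics reported in this abstract, Cohen's kappa for interannotator agreement and the F1-score for extraction performance, can be illustrated with a minimal sketch. The label lists below are invented for illustration and are not data from the study; the function names are likewise hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive=1):
    """F1 of predicted labels against gold labels for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented example: 1 = phenotype documented in the note, 0 = not documented.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
pipeline_out = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]

kappa = cohen_kappa(annotator_a, annotator_b)
f1 = f1_score(annotator_a, pipeline_out)
print(f"kappa = {kappa:.2f}, F1 = {f1:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over simple percent agreement when label frequencies are unbalanced.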
Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: A performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods
OBJECTIVE: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and 2 rule-based and machine learning-based methods, namely, scispaCy and medspaCy.
MATERIALS AND METHODS: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13 646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores.
RESULTS: GPT-4 achieved higher F1 score, precision, and recall compared to Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy's models. GPT-3.5-turbo performed similarly to GPT-4. GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition. spaCy models relied on predefined patterns, leading to their suboptimal performance.
DISCUSSION AND CONCLUSION: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text and robust clinical phenotype extraction.
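The abstract above compares models by precision, recall, and micro-F1. As a minimal sketch of how micro-averaging works, the example below pools true positives, false positives, and false negatives across phenotype categories before computing the scores. The per-category counts are invented for illustration and are not results from the study; `micro_prf` is a hypothetical helper name.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall, and F1.

    `counts` maps each category to a dict with "tp", "fp", and "fn" counts.
    Counts are summed over all categories first, then the metrics are
    computed once on the pooled totals.
    """
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts for three phenotype categories (not study data).
example_counts = {
    "initial_stage": {"tp": 8, "fp": 2, "fn": 0},
    "initial_treatment": {"tp": 6, "fp": 0, "fn": 4},
    "recurrence": {"tp": 6, "fp": 3, "fn": 1},
}

precision, recall, f1 = micro_prf(example_counts)
print(f"precision = {precision:.2f}, recall = {recall:.2f}, micro-F1 = {f1:.2f}")
```

Because the counts are pooled, categories with more instances weigh more heavily in the micro-averaged score than rare ones; macro-averaging would instead weight every category equally.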
Pathogenic variants in CRX have distinct cis-regulatory effects on enhancers and silencers in photoreceptors
Dozens of variants in the gene for the homeodomain transcription factor (TF) cone-rod homeobox
Age at first birth in women is genetically associated with increased risk of schizophrenia
Prof. Paunio is a member of the PGC. Previous studies have shown an increased risk for mental health problems in children born to both younger and older parents compared to children of average-aged parents. We previously used a novel design to reveal a latent mechanism of genetic association between schizophrenia and age at first birth in women (AFB). Here, we use independent data from the UK Biobank (N = 38,892) to replicate the finding of an association between predicted genetic risk of schizophrenia and AFB in women, and to estimate the genetic correlation between schizophrenia and AFB in women stratified into younger and older groups. We find evidence for an association between predicted genetic risk of schizophrenia and AFB in women (P-value = 1.12E-05), and we show genetic heterogeneity between younger and older AFB groups (P-value = 3.45E-03). The genetic correlation between schizophrenia and AFB in the younger AFB group is -0.16 (SE = 0.04) while that between schizophrenia and AFB in the older AFB group is 0.14 (SE = 0.08). Our results suggest that early, and perhaps also late, age at first birth in women is associated with increased genetic risk for schizophrenia in the UK Biobank sample. These findings contribute new insights into factors contributing to the complex bio-social risk architecture underpinning the association between parental age and offspring mental health. Peer reviewed.
Measurement of the t-channel single-top-quark production cross section and of the $|V_{tb}|$ CKM matrix element in pp collisions at $\sqrt{s} = 8$ TeV
Measurements are presented of the t-channel single-top-quark production cross section in proton-proton collisions at $\sqrt{s} = 8$ TeV. The results are based on a data sample corresponding to an integrated luminosity of 19.7 fb$^{-1}$ recorded with the CMS detector at the LHC. The cross section is measured inclusively, as well as separately for top ($t$) and antitop ($\bar{t}$), in final states with a muon or an electron. The measured inclusive t-channel cross section is $\sigma_{t\text{-ch.}} = 83.6 \pm 2.3\,\text{(stat.)} \pm 7.4\,\text{(syst.)}$ pb. The single $t$ and $\bar{t}$ cross sections are measured to be $\sigma_{t\text{-ch.}}(t) = 53.8 \pm 1.5\,\text{(stat.)} \pm 4.4\,\text{(syst.)}$ pb and $\sigma_{t\text{-ch.}}(\bar{t}) = 27.6 \pm 1.3\,\text{(stat.)} \pm 3.7\,\text{(syst.)}$ pb, respectively. The measured ratio of cross sections is $R_{t\text{-ch.}} = \sigma_{t\text{-ch.}}(t)/\sigma_{t\text{-ch.}}(\bar{t}) = 1.95 \pm 0.10\,\text{(stat.)} \pm 0.19\,\text{(syst.)}$, in agreement with the standard model prediction. The modulus of the Cabibbo-Kobayashi-Maskawa matrix element $V_{tb}$ is extracted and, in combination with a previous CMS result at $\sqrt{s} = 7$ TeV, a value $|V_{tb}| = 0.998 \pm 0.038\,\text{(exp.)} \pm 0.016\,\text{(theo.)}$ is obtained.
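As a quick arithmetic check on the figures quoted in this abstract, the central value of the top-to-antitop cross-section ratio can be reproduced from the two separately measured cross sections. The sketch below also computes a naive statistical uncertainty under the assumption that the two statistical uncertainties are uncorrelated. That assumption is ours, made only for illustration, and need not match how the published uncertainty was derived.

```python
import math

# Central values and statistical uncertainties quoted in the abstract (pb).
sigma_top, stat_top = 53.8, 1.5
sigma_antitop, stat_antitop = 27.6, 1.3

# Central value of the ratio of top to antitop cross sections.
ratio = sigma_top / sigma_antitop

# Naive propagation for a quotient, ASSUMING uncorrelated uncertainties:
# relative uncertainties add in quadrature.
naive_stat = ratio * math.sqrt(
    (stat_top / sigma_top) ** 2 + (stat_antitop / sigma_antitop) ** 2
)

print(f"ratio = {ratio:.2f}")
print(f"naive statistical uncertainty = {naive_stat:.2f}")
```

The central value rounds to the quoted 1.95. The naive uncertainty is of the same order as the quoted statistical uncertainty, but it is not expected to agree exactly, since the full analysis accounts for correlations that this sketch ignores.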
Gene expression imputation across multiple brain regions provides insights into schizophrenia risk
Transcriptomic imputation approaches combine eQTL reference panels with large-scale genotype data in order to test associations between disease and gene expression. These genic associations could elucidate signals in complex genome-wide association study (GWAS) loci and may disentangle the role of different tissues in disease development. We used the largest eQTL reference panel for the dorso-lateral prefrontal cortex (DLPFC) to create a set of gene expression predictors and demonstrate their utility. We applied DLPFC and 12 GTEx-brain predictors to 40,299 schizophrenia cases and 65,264 matched controls for a large transcriptomic imputation study of schizophrenia. We identified 413 genic associations across 13 brain regions. Stepwise conditioning identified 67 non-MHC genes, of which 14 did not fall within previous GWAS loci. We identified 36 significantly enriched pathways, including hexosaminidase-A deficiency, and multiple porphyric disorder pathways. We investigated developmental expression patterns among the 67 non-MHC genes and identified specific groups of pre- and postnatal expression.
Genetic correlation between amyotrophic lateral sclerosis and schizophrenia
A. Palotie is a member of the working group Schizophrenia Working Grp Psychiat. We have previously shown higher-than-expected rates of schizophrenia in relatives of patients with amyotrophic lateral sclerosis (ALS), suggesting an aetiological relationship between the diseases. Here, we investigate the genetic relationship between ALS and schizophrenia using genome-wide association study data from over 100,000 unique individuals. Using linkage disequilibrium score regression, we estimate the genetic correlation between ALS and schizophrenia to be 14.3% (7.05-21.6; $P = 1 \times 10^{-4}$) with schizophrenia polygenic risk scores explaining up to 0.12% of the variance in ALS ($P = 8.4 \times 10^{-7}$). A modest increase in comorbidity of ALS and schizophrenia is expected given these findings (odds ratio 1.08-1.26) but this would require very large studies to observe epidemiologically. We identify five potential novel ALS-associated loci using conditional false discovery rate analysis. It is likely that shared neurobiological mechanisms between these two disorders will engender novel hypotheses in future preclinical and clinical studies. Peer reviewed.
Interaction Testing and Polygenic Risk Scoring to Estimate the Association of Common Genetic Variants with Treatment Resistance in Schizophrenia
Importance: About 20% to 30% of people with schizophrenia have psychotic symptoms that do not respond adequately to first-line antipsychotic treatment. This clinical presentation, chronic and highly disabling, is known as treatment-resistant schizophrenia (TRS). The causes of treatment resistance and their relationships with causes underlying schizophrenia are largely unknown. Adequately powered genetic studies of TRS are scarce because of the difficulty in collecting data from well-characterized TRS cohorts. Objective: To examine the genetic architecture of TRS through the reassessment of genetic data from schizophrenia studies and its validation in carefully ascertained clinical samples. Design, Setting, and Participants: Two case-control genome-wide association studies (GWASs) of schizophrenia were performed in which the case samples were defined as individuals with TRS (n = 10501) and individuals with non-TRS (n = 20325). The differences in effect sizes for allelic associations were then determined between both studies, the reasoning being such differences reflect treatment resistance instead of schizophrenia. Genotype data were retrieved from the CLOZUK and Psychiatric Genomics Consortium (PGC) schizophrenia studies. The output was validated using polygenic risk score (PRS) profiling of 2 independent schizophrenia cohorts with TRS and non-TRS: a prevalence sample with 817 individuals (Cardiff Cognition in Schizophrenia [CardiffCOGS]) and an incidence sample with 563 individuals (Genetics Workstream of the Schizophrenia Treatment Resistance and Therapeutic Advances [STRATA-G]). Main Outcomes and Measures: GWAS of treatment resistance in schizophrenia. The results of the GWAS were compared with complex polygenic traits through a genetic correlation approach and were used for PRS analysis on the independent validation cohorts using the same TRS definition. 
Results: The study included a total of 85490 participants (48635 [56.9%] male) in its GWAS stage and 1380 participants (859 [62.2%] male) in its PRS validation stage. Treatment resistance in schizophrenia emerged as a polygenic trait with detectable heritability (1% to 4%), and several traits related to intelligence and cognition were found to be genetically correlated with it (genetic correlation, 0.41-0.69). PRS analysis in the CardiffCOGS prevalence sample showed a positive association between TRS and a history of taking clozapine (r² = 2.03%; P = .001), which was replicated in the STRATA-G incidence sample (r² = 1.09%; P = .04). Conclusions and Relevance: In this GWAS, common genetic variants were differentially associated with TRS, and these associations may have been obscured through the amalgamation of large GWAS samples in previous studies of broadly defined schizophrenia. Findings of this study suggest the validity of meta-analytic approaches for studies on patient outcomes, including treatment resistance.
Schizophrenia-associated somatic copy-number variants from 12,834 cases reveal recurrent NRXN1 and ABCB11 disruptions
While germline copy-number variants (CNVs) contribute to schizophrenia (SCZ) risk, the contribution of somatic CNVs (sCNVs)—present in some but not all cells—remains unknown. We identified sCNVs using blood-derived genotype arrays from 12,834 SCZ cases and 11,648 controls, filtering sCNVs at loci recurrently mutated in clonal blood disorders. Likely early-developmental sCNVs were more common in cases (0.91%) than controls (0.51%, p = 2.68e−4), with recurrent somatic deletions of exons 1–5 of the NRXN1 gene in five SCZ cases. Hi-C maps revealed ectopic, allele-specific loops forming between a potential cryptic promoter and non-coding cis-regulatory elements upon 5′ deletions in NRXN1. We also observed recurrent intragenic deletions of ABCB11, encoding a transporter implicated in anti-psychotic response, in five treatment-resistant SCZ cases and showed that ABCB11 is specifically enriched in neurons forming mesocortical and mesolimbic dopaminergic projections. Our results indicate potential roles of sCNVs in SCZ risk.