3 research outputs found
Supplemental materials for preprint: A Case for Developing Domain-Specific Vocabularies for Extracting Suicide Factors from Healthcare Notes
The onset and persistence of life events (LE) such as housing instability, job instability, and reduced social connection have been shown to increase the risk of suicide. Predictive models for suicide risk have low sensitivity to many of these factors due to under-reporting in structured electronic health record (EHR) data. In this study, we show how natural language processing (NLP) can identify LE in clinical notes at higher rates than reported medical codes. We compare domain-specific lexicons formulated from Unified Medical Language System (UMLS) selection, content analysis by subject matter experts (SME), and the Gravity Project against data-driven expansion through word embeddings trained with Word2Vec. Our analysis covers EHR data from the VA Corporate Data Warehouse (CDW) and measures the prevalence of LE over time for patients with a known underlying cause of death in the National Death Index (NDI). We found that NLP methods had higher sensitivity for detecting LE than structured EHR variables. On average, suicide cases had higher rates of LE over time than patients who died of non-suicide-related causes and had no prior diagnosis of mental illness. When used to discriminate these outcomes, the inclusion of NLP-derived variables increased the concentration of LE within the top 0.1%, 0.5%, and 1% of predicted risk. LE were less informative when discriminating suicide death from non-suicide-related death for patients with diagnosed mental illness.
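The data-driven lexicon expansion described above can be illustrated with a minimal sketch: starting from a seed list of life-event terms, candidate terms are added when their embedding vectors lie close to a seed term. The toy 3-dimensional vectors and vocabulary below are hypothetical stand-ins; in the study, embeddings would come from a Word2Vec model trained on clinical notes, and this is not the authors' actual pipeline.

```python
# Illustrative sketch (not the authors' code): expanding a seed lexicon of
# life-event terms by nearest-neighbor search in a word-embedding space.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_lexicon(seed_terms, embeddings, k=2, threshold=0.7):
    """Return seed terms plus each seed's top-k neighbors above a cutoff."""
    expanded = set(seed_terms)
    for term in seed_terms:
        if term not in embeddings:
            continue
        scores = [
            (cand, cosine(embeddings[term], vec))
            for cand, vec in embeddings.items()
            if cand not in expanded
        ]
        scores.sort(key=lambda item: item[1], reverse=True)
        expanded.update(cand for cand, s in scores[:k] if s >= threshold)
    return expanded

# Hypothetical toy embeddings for illustration only.
embeddings = {
    "homeless":    [0.90, 0.10, 0.00],
    "eviction":    [0.80, 0.20, 0.10],
    "unsheltered": [0.85, 0.15, 0.05],
    "aspirin":     [0.00, 0.10, 0.90],
}

print(expand_lexicon({"homeless"}, embeddings))
# Adds "eviction" and "unsheltered" but not the unrelated "aspirin".
```

In practice a trained model's neighbor query (e.g. gensim's `KeyedVectors.most_similar`) would replace the brute-force loop, and SME review would filter the candidates before they enter the final lexicon.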
A Case for Developing Domain-Specific Vocabularies for Extracting Suicide Factors from Healthcare Notes
Artificial intelligence to unlock real-world evidence in clinical oncology: A primer on recent advances
Purpose: Real-world evidence is crucial to understanding the diffusion of new oncologic therapies, monitoring cancer outcomes, and detecting unexpected toxicities. In practice, real-world evidence is challenging to collect rapidly and comprehensively, often requiring expensive and time-consuming manual case-finding and annotation of clinical text. In this review, we summarise recent developments in the use of artificial intelligence to collect and analyze real-world evidence in oncology. Methods: We performed a narrative review of the major current trends and recent literature on artificial intelligence applications in oncology. Results: Artificial intelligence (AI) approaches are increasingly used to efficiently phenotype patients and tumors at large scale. These tools may also provide novel biological insights and improve risk prediction through multimodal integration of radiographic, pathological, and genomic datasets. Custom language-processing pipelines and large language models hold great promise for clinical prediction and phenotyping. Conclusions: Despite rapid advances, continued progress in computation, generalizability, interpretability, and reliability, as well as prospective validation, is needed to integrate AI approaches into routine clinical care and real-time monitoring of novel therapies.