23 research outputs found
Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach
BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first.
OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation, that is, creating lay definitions for these terms.
METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data.
RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P < .001 for all measures and all conditions). Using a rich set of learning features contributed to ADS's performance substantially.
CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
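The abstract above names feature space augmentation and average precision but does not spell either out. The following is a minimal sketch, assuming the commonly used formulation of feature augmentation in which every feature vector is copied into a shared block plus one domain-specific block, and the standard definition of average precision over a ranked list. The function names and the toy inputs are hypothetical and are not taken from the ADS system.

```python
from typing import List, Sequence


def augment(features: Sequence[float], domain: str) -> List[float]:
    """Return [shared copy, source-only copy, target-only copy].

    Assumption: 'source' is the distantly supervised data and 'target'
    is the expert-annotated EHR data; the unused block is zero-filled.
    """
    x = list(features)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError("domain must be 'source' or 'target'")


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision of a ranked list of binary labels (1 = important)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0
```

A classifier trained on the augmented vectors can learn weights that are shared across domains as well as weights that apply to only one domain, which is why a small amount of target-domain data may still help.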
Automatic extraction of informal topics from online suicidal ideation
Abstract
Background
Suicide is an alarming public health problem accounting for a considerable number of deaths each year worldwide. Many more individuals contemplate suicide. Understanding the attributes, characteristics, and exposures correlated with suicide remains an urgent and significant problem. As social networking sites have become more common, users have adopted these sites to talk about intensely personal topics, among them their thoughts about suicide. Such data has previously been evaluated by analyzing the language features of social media posts and using factors derived by domain experts to identify at-risk users.
Results
In this work, we automatically extract informal latent recurring topics of suicidal ideation found in social media posts. Our evaluation demonstrates that we are able to automatically reproduce many of the expertly determined risk factors for suicide. Moreover, we identify many informal latent topics related to suicide ideation such as concerns over health, work, self-image, and financial issues.
Conclusions
These informal topics can be more specific or more general. Some of our topics express meaningful ideas not contained in the risk factors, and some risk factors do not have complementary latent topics. In short, our analysis of the latent topics extracted from social media containing suicidal ideations suggests that users of these systems express ideas that are complementary to the topics defined by experts but differ in their scope, focus, and precision of language.
https://deepblue.lib.umich.edu/bitstream/2027.42/144214/1/12859_2018_Article_2197.pd
Frequent Closed Itemset Mining Using Prefix Graphs with an Efficient Flow-Based Pruning Strategy
This paper presents PGMiner, a novel graph-based algorithm for mining frequent closed itemsets. Our approach consists of constructing a prefix graph structure and decomposing the database into variable-length bit vectors, which are assigned to nodes of the graph. The main advantage of this representation is that the bit vectors at each node are shorter than those produced by existing vertical mining methods. This facilitates fast frequency counting of itemsets via intersection operations. We also devise several inter-node and intra-node pruning strategies to substantially reduce the combinatorial search space. Unlike other existing approaches, we do not need to store in memory the entire set of closed itemsets that have been mined so far in order to check whether a candidate itemset is closed. This dramatically reduces the memory usage of our algorithm, especially for low support thresholds. Our experiments using synthetic and real-world data sets show that PGMiner outperforms existing mining algorithms by as much as an order of magnitude and is scalable to very large databases.
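To illustrate frequency counting via bit-vector intersection and what it means for an itemset to be closed, here is a minimal sketch of the general vertical representation. It is not PGMiner: it uses one full-length bit vector per item rather than a prefix graph with shorter per-node vectors, and it applies no pruning. All names and the toy database are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_bit_vectors(transactions: List[Iterable[str]]) -> Dict[str, int]:
    """Map each item to an integer whose bit t is set if transaction t contains it."""
    vectors: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def cover(itemset: Set[str], vectors: Dict[str, int], n: int) -> int:
    """Bit vector of the transactions containing every item in the itemset."""
    mask = (1 << n) - 1  # start with all n transactions
    for item in itemset:
        mask &= vectors.get(item, 0)  # intersection via bitwise AND
    return mask


def support(itemset: Set[str], vectors: Dict[str, int], n: int) -> int:
    """Number of transactions containing the itemset (population count)."""
    return bin(cover(itemset, vectors, n)).count("1")


def is_closed(itemset: Set[str], vectors: Dict[str, int], n: int) -> bool:
    """Closed means no extra item occurs in every transaction of the cover."""
    c = cover(itemset, vectors, n)
    for item, vec in vectors.items():
        if item not in itemset and (c & vec) == c:
            return False  # adding this item would not lower the support
    return True
```

For the toy database `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]`, the itemset `{"c"}` is not closed because every transaction containing `c` also contains `a`.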
Unlocking echocardiogram measurements for heart disease research through natural language processing
Abstract
Background
In order to investigate the mechanisms of cardiovascular disease in HIV infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study.
Implementation
A natural language processing system using a dictionary lookup, rules, and patterns was developed to extract heart function measurements that are typically recorded in echocardiogram reports as measurement-value pairs. Curated semantic bootstrapping was used to create a custom dictionary that extends existing terminologies based on terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built utilizing a scalable framework, making it available for processing large datasets.
Results
The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. The system achieved F-scores of 0.872, 0.844, and 0.877 with precision of 0.936, 0.982, and 0.969 for each dataset respectively, averaged across all extracted values. Left ventricular ejection fraction (LVEF) is the most frequently extracted measurement. The precision of extraction of the LVEF measure ranged from 0.968 to 1.0 across different document types.
Conclusions
This system illustrates the feasibility and effectiveness of large-scale information extraction on clinical data. New clinical questions can be addressed in the domain of heart failure using retrospective clinical data analysis because key heart function measurements can be successfully extracted using natural language processing.
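As a rough illustration of extracting measurement-value pairs with a dictionary and a pattern, the sketch below pulls ejection fraction percentages out of free text. The dictionary entries, the regular expression, and the 30-character gap limit are all assumptions made for this example; the system described above uses a curated dictionary and a disambiguation step that are not reproduced here.

```python
import re
from typing import List, Tuple

# Hypothetical mini-dictionary: surface form -> normalized measurement name.
TERMS = {"lvef": "LVEF", "ejection fraction": "LVEF", "ef": "LVEF"}

# A term, a short gap without digits, then a percentage or percentage range.
PATTERN = re.compile(
    r"\b(?P<term>lvef|ejection fraction|ef)\b"
    r"[^\d\n]{0,30}?"
    r"(?P<low>\d{1,3}(?:\.\d+)?)"
    r"(?:\s*(?:-|to)\s*(?P<high>\d{1,3}(?:\.\d+)?))?"
    r"\s*%",
    re.IGNORECASE,
)


def extract_measurements(text: str) -> List[Tuple[str, float, float]]:
    """Return (measurement, low, high) tuples; low == high for single values."""
    results = []
    for match in PATTERN.finditer(text):
        name = TERMS[match.group("term").lower()]
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        results.append((name, low, high))
    return results
```

A bare abbreviation such as "EF" can have other meanings in clinical text, which is the kind of ambiguity the semantic-constraint disambiguation in the abstract is meant to resolve; this sketch makes no attempt at that.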
Understanding headache classification coding within the veterans health administration using ICD-9-CM and ICD-10-CM in fiscal years 2014-2017.
Objectives
Understand the continuity and changes in headache not-otherwise-specified (NOS), migraine, and post-traumatic headache (PTH) diagnoses after the transition from ICD-9-CM to ICD-10-CM in the Veterans Health Administration (VHA).
Background
Headache is one of the most commonly diagnosed chronic conditions managed within primary and specialty care clinics. The VHA transitioned from ICD-9-CM to ICD-10-CM on October 1, 2015. The effect of the transition on the coding of specific headache diagnoses is unknown. Accuracy of headache diagnosis is important since different headache types respond to different treatments.
Methods
We mapped headache diagnoses from ICD-9-CM (FY 2014/2015) onto ICD-10-CM (FY 2016/2017) and computed coding proportions two years before/after the transition in VHA. We used queries to determine the change in transition pathways. We report the odds of ICD-10-CM coding associated with ICD-9-CM controlling for provider type, and patient age, sex, and race/ethnicity.
Results
Only 37%, 58% and 34% of patients with ICD-9-CM coding of NOS, migraine, and PTH respectively had an ICD-10-CM headache diagnosis. Of those with an ICD-10-CM diagnosis, 73-79% had a single headache diagnosis. The odds ratios for receiving the same code in both ICD-9-CM and ICD-10-CM after adjustment for ICD-9-CM and ICD-10-CM headache comorbidities and sociodemographic factors were high (range 6-26) and statistically significant. Specifically, 75% of patients with headache NOS had received one headache diagnosis (Adjusted headache NOS-ICD-9-CM OR for headache NOS-ICD-10-CM = 6.1, 95% CI 5.89-6.32). 79% of migraineurs had one headache diagnosis, mostly migraine (Adjusted migraine-ICD-9-CM OR for migraine-ICD-10-CM = 26.43, 95% CI 25.51-27.38). The same held true for PTH (Adjusted PTH-ICD-9-CM OR for PTH-ICD-10-CM = 22.92, 95% CI 18.97-27.68). These strong associations remained after adjustment for specialist care in the ICD-10-CM follow-up period.
Discussion
The majority of people with ICD-9-CM headache diagnoses did not have an ICD-10-CM headache diagnosis. However, a given diagnosis in ICD-9-CM by a primary care provider (PCP) was significantly predictive of its assignment in ICD-10-CM, as was seeing either a neurologist or physiatrist (compared to a generalist) for an ICD-10-CM headache diagnosis.
Conclusion
When a veteran had a specific diagnosis in ICD-9-CM, the odds of being coded with the same diagnosis in ICD-10-CM were significantly higher. A specialist visit during the ICD-10-CM period was independently associated with all three ICD-10-CM headache diagnoses.
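For readers unfamiliar with how an odds ratio and its 95% confidence interval are formed, here is a minimal sketch for a single 2x2 table using the usual Wald interval on the log scale. It is unadjusted, whereas the values reported above come from models that adjust for comorbidities, provider type, and sociodemographic factors. The function name and any counts passed to it are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All four cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Here "exposed" would correspond to having a given ICD-9-CM code and "outcome" to receiving the matching ICD-10-CM code. An interval that excludes 1 corresponds to a statistically significant association at the chosen level.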
Estimating healthcare mobility in the Veterans Affairs Healthcare System
Abstract
Background
Healthcare mobility, defined as healthcare utilization in more than one distinct healthcare system, may have detrimental effects on outcomes of care. We characterized healthcare mobility and associated characteristics among a national sample of Veterans.
Methods
Using the Veterans Health Administration Electronic Health Record, we conducted a retrospective cohort study to quantify healthcare mobility within a four-year period. We examined the association between sociodemographic and clinical characteristics and healthcare mobility, and characterized possible temporal and geographic patterns of healthcare mobility.
Results
Approximately nine percent of the sample were healthcare mobile. Younger Veterans, divorced or separated Veterans, and those with hepatitis C virus and psychiatric disorders were more likely to be healthcare mobile. We demonstrated two possible patterns of healthcare mobility, related to specialty care and lifestyle, in which Veterans repeatedly utilized two different healthcare systems.
Conclusions
Healthcare mobility is associated with young age, marital status changes, and also diseases requiring intensive management. This type of mobility may affect disease prevention and management and has implications for healthcare systems that seek to improve population health.
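The definition of healthcare mobility given above (utilization in more than one distinct healthcare system) can be sketched as a simple grouping over visit records. The record layout, identifiers, and function names below are hypothetical, and the sketch assumes the visits have already been restricted to the study period.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def mobile_patients(visits: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return patients seen in more than one distinct healthcare system.

    Each visit is a (patient_id, system_id) pair.
    """
    systems_by_patient: Dict[str, Set[str]] = defaultdict(set)
    for patient_id, system_id in visits:
        systems_by_patient[patient_id].add(system_id)
    return {p for p, systems in systems_by_patient.items() if len(systems) > 1}


def mobility_rate(visits: Iterable[Tuple[str, str]]) -> float:
    """Share of patients with at least one visit who are healthcare mobile."""
    visit_list = list(visits)
    patients = {patient_id for patient_id, _ in visit_list}
    if not patients:
        return 0.0
    return len(mobile_patients(visit_list)) / len(patients)
```

Repeated visits to the same system do not count toward mobility here; only the number of distinct systems per patient matters.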