
    Finding Important Terms for Patients in Their Electronic Health Records: A Learning-to-Rank Approach Using Expert Annotations

    BACKGROUND: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients' notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on the medical terms that matter most to them. Interventions can then be developed to give patients targeted education that improves their EHR comprehension and the quality of care. OBJECTIVE: We aimed to develop a supervised natural language processing (NLP) system called Finding impOrtant medical Concepts most Useful to patientS (FOCUS) that automatically identifies and ranks medical terms in EHR notes based on their importance to the patients. METHODS: First, we built an expert-annotated corpus. For each EHR note, 2 physicians independently identified medical terms important to the patient. Using the physicians' agreement as the gold standard, we developed and evaluated FOCUS. FOCUS first identifies candidate terms from each EHR note using MetaMap and then ranks the terms using a support vector machine-based learning-to-rank algorithm. We explored rich learning features, including distributed word representations, Unified Medical Language System semantic types, topic features, and features derived from consumer health vocabulary. We compared FOCUS with 2 strong baseline NLP systems. RESULTS: Physicians annotated 90 EHR notes and identified a mean of 9 (SD 5) important terms per note. The Cohen's kappa for annotation agreement was .51. The 10-fold cross-validation results show that FOCUS achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.940 for ranking candidate terms from EHR notes to identify important terms. When term identification was included, FOCUS achieved an AUC-ROC of 0.866 for identifying important terms from EHR notes. Both performance scores significantly exceeded the corresponding baseline system scores (P < .001). Rich learning features contributed substantially to FOCUS's performance. CONCLUSIONS: FOCUS can automatically rank terms from EHR notes based on their importance to patients. It may help develop future interventions that improve quality of care.
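    As an illustration of the general approach only (not the authors' implementation), the sketch below shows a pairwise reduction for SVM-based learning-to-rank over candidate terms, assuming each term has already been mapped to a numeric feature vector; the feature dimensions and data are toy placeholders.

```python
# Minimal sketch of pairwise SVM learning-to-rank for candidate terms.
# Assumes each candidate term already has a feature vector (e.g. word
# embeddings, UMLS semantic-type indicators, topic features); toy data here.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

def pairwise_examples(X, y):
    """Difference vectors between important (y=1) and unimportant (y=0) terms."""
    pos, neg = X[y == 1], X[y == 0]
    diffs, labels = [], []
    for p in pos:
        for n in neg:
            diffs.append(p - n); labels.append(1)  # important should outrank unimportant
            diffs.append(n - p); labels.append(0)
    return np.array(diffs), np.array(labels)

rng = np.random.default_rng(0)
X = rng.random((200, 50))                  # 200 candidate terms, 50 features (toy)
y = np.zeros(200, dtype=int); y[:20] = 1   # ~10% of terms marked important (toy)

Xp, yp = pairwise_examples(X, y)
ranker = LinearSVC().fit(Xp, yp)
scores = X @ ranker.coef_.ravel()          # higher score = ranked as more important
print("AUC-ROC:", roc_auc_score(y, scores))
```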

    Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach

    BACKGROUND: Medical terms are a major obstacle for patients trying to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to first identify the terms important for patient EHR comprehension. OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. EHR terms ranked highly by ADS will be given higher priority for lay language annotation, that is, the creation of lay definitions for these terms. METHODS: ADS uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to the target-domain problem of ranking EHR terms. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data, for ADS. To evaluate ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into target-domain training data (1000 examples) and evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. Both ADS systems performed significantly better than the baseline systems (P < .001 for all measures and all conditions). Using a rich set of learning features contributed substantially to ADS's performance. CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
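    A hedged sketch of the feature-space-augmentation idea (in the spirit of "frustratingly easy" domain adaptation) follows; it is not the ADS code, logistic regression merely stands in for the ranking model, and the feature matrices are synthetic placeholders for the distantly supervised (source) and expert-annotated (target) term sets.

```python
# Minimal sketch of feature-space augmentation for domain adaptation.
# Source domain: terms weakly labelled via a consumer health vocabulary.
# Target domain: a small expert-annotated set of EHR terms.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

def augment(X, domain):
    """Map x -> [x, x, 0] for 'source' and [x, 0, x] for 'target'."""
    zeros = np.zeros_like(X)
    return np.hstack([X, X, zeros]) if domain == "source" else np.hstack([X, zeros, X])

rng = np.random.default_rng(0)
# Toy feature matrices; in practice these would be embeddings and lexical features.
Xs, ys = rng.random((5000, 30)), rng.integers(0, 2, 5000)      # distant labels (source)
Xt, yt = rng.random((1000, 30)), rng.integers(0, 2, 1000)      # expert labels (target)
Xeval, yeval = rng.random((5038, 30)), rng.integers(0, 2, 5038)

X_train = np.vstack([augment(Xs, "source"), augment(Xt, "target")])
y_train = np.concatenate([ys, yt])

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(augment(Xeval, "target"))[:, 1]     # rank evaluation terms
print("Average precision:", average_precision_score(yeval, scores))
```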

    Extractive Summarization: Experimental work on nursing notes in Finnish

    Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics concerned with how computers interact with human language. With increasing computational power and advances in technology, researchers have proposed a variety of NLP tasks that are already implemented in real-world applications today. Automated text summarization is one task that has not yet fully matured, particularly in the health sector. Success in this task would enable healthcare professionals to grasp a patient's history in minimal time, resulting in the faster decisions required for better care. Automatic text summarization is the process of shortening a large text without sacrificing important information. This can be achieved by paraphrasing the content (the abstractive method) or by concatenating relevant extracted sentences (the extractive method). In general, the process requires converting the text into numerical form and then executing a method to identify and extract relevant text. This thesis explores NLP techniques used in extractive text summarization, particularly in the health domain. The work includes a comparison of basic summarization models implemented on a corpus of patient notes written by nurses in Finnish. The concepts and research studies required to understand the implementation are documented along with a description of the code. A Python-based project is structured to build the corpus and execute multiple summarization models. For this thesis, we observe the performance of two textual embeddings: Term Frequency-Inverse Document Frequency (TF-IDF), based on a simple statistical measure, and Word2Vec, based on neural networks. For both models, LexRank, an unsupervised stochastic graph-based sentence scoring algorithm, is used for sentence extraction, and random selection is used as a baseline for evaluation. To evaluate and compare the models, summaries of 15 patient care episodes from each model were provided to two human evaluators for manual evaluation. According to the results on this small sample, both evaluators agree in preferring the summaries produced by Word2Vec LexRank over those generated by TF-IDF LexRank. Both models were also judged, by both evaluators, to perform better than the random-selection baseline.
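    A minimal sketch of the TF-IDF LexRank variant described above follows; it is not the thesis code and skips the Finnish-specific preprocessing (tokenization, lemmatization, stop-word removal), scoring sentences with power iteration over a thresholded cosine-similarity graph.

```python
# Minimal sketch: extractive summarization with TF-IDF sentence vectors and a
# LexRank-style PageRank over the cosine-similarity graph of sentences.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    adj = (sim >= threshold).astype(float)      # threshold the similarity graph
    adj /= adj.sum(axis=1, keepdims=True)       # row-normalise to a transition matrix
    n = len(sentences)
    scores = np.ones(n) / n
    for _ in range(iters):                      # power iteration (PageRank)
        scores = (1 - damping) / n + damping * adj.T @ scores
    return scores

# Toy English stand-ins for Finnish nursing notes.
notes = [
    "Patient slept well during the night.",
    "Blood pressure remained stable, no medication changes.",
    "Patient reported mild pain in the left knee.",
    "Pain medication was administered at 8 am.",
]
scores = lexrank(notes)
summary = [notes[i] for i in np.argsort(scores)[::-1][:2]]   # top-2 sentences
print(summary)
```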

    Discharge Summary Hospital Course Summarisation of Inpatient Electronic Health Record Text with Clinical Concept Guided Deep Pre-Trained Transformer Models

    Brief Hospital Course (BHC) summaries are succinct summaries of an entire hospital encounter, embedded within discharge summaries and written by the senior clinicians responsible for the overall care of a patient. Methods to automatically produce these summaries from inpatient documentation would be invaluable in reducing the manual burden on clinicians, who must summarise documents under high time pressure while admitting and discharging patients. Automatically producing these summaries from the inpatient course is a complex, multi-document summarisation task, as the source notes are written from various perspectives (e.g. nursing, doctor, radiology) over the course of the hospitalisation. We present a range of methods for BHC summarisation, demonstrating the performance of deep learning summarisation models across extractive and abstractive scenarios. We also test a novel ensemble of extractive and abstractive summarisation that incorporates a medical concept ontology (SNOMED) as a clinical guidance signal and shows superior performance on 2 real-world clinical data sets.
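    As a rough illustration only (not the paper's model), the sketch below shows the general shape of a two-stage pipeline in which clinical concept mentions guide extractive selection before an off-the-shelf abstractive transformer summariser is applied; the concept list is a hand-made placeholder for SNOMED concept matches, and facebook/bart-large-cnn is a generic, non-clinical model used purely for demonstration.

```python
# Sketch of concept-guided extractive selection followed by abstractive
# summarisation with a pre-trained transformer (assumptions noted above).
from transformers import pipeline

concept_terms = {"sepsis", "antibiotics", "blood culture", "lactate"}  # placeholder concept hits

def extract(notes, max_sentences=10):
    """Keep sentences that mention a clinical concept, across all source notes."""
    picked = []
    for note in notes:
        for sent in note.split(". "):
            if any(term in sent.lower() for term in concept_terms):
                picked.append(sent)
    return picked[:max_sentences]

# Toy multi-document input written from different perspectives.
notes = [
    "Patient admitted with suspected sepsis. Blood culture was drawn on arrival.",
    "Lactate trended down after fluid resuscitation. Family updated at bedside.",
    "Broad-spectrum antibiotics were started. Patient remained afebrile overnight.",
]
extractive_stage = " ".join(extract(notes))
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
print(summarizer(extractive_stage, max_length=60, min_length=10)[0]["summary_text"])
```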

    Development of Artificial Intelligence Algorithms for Early Diagnosis of Sepsis

    Sepsis is a prevalent syndrome that manifests as an uncontrolled response of the body to an infection and may lead to organ dysfunction. Its diagnosis is urgent, since early treatment can reduce patients' chances of suffering long-term consequences. Yet there are many obstacles to achieving this early detection. Some stem from the syndrome's pathogenesis, which lacks a characteristic biomarker. The available clinical detection tools are either too complex or lack sensitivity, in both cases delaying the diagnosis. Another obstacle relates to modern technology, which, combined with the many clinical parameters monitored to detect sepsis, produces extremely heterogeneous and complex medical records that the responsible clinicians are forced to analyse in order to diagnose the syndrome. To help achieve this early diagnosis, as well as to understand which parameters are most relevant to obtaining it, this work proposes an approach based on Artificial Intelligence algorithms, with the model implemented in the alert system of a sepsis monitoring platform. The platform uses a Random Forest algorithm, based on supervised machine learning classification, that is capable of detecting the syndrome in two different scenarios. The earliest detection can happen when only five vital sign parameters are available for measurement, namely heart rate, systolic and diastolic blood pressure, blood oxygen saturation level, and body temperature, in which case the model achieves 83% precision and 62% sensitivity. If, in addition to these variables, laboratory measurements of bilirubin, creatinine, hemoglobin, leukocytes, platelet count, and C-reactive protein levels are available, the platform's sensitivity increases to 77%. The blood oxygen saturation level was also found to be one of the most important variables for this task in both cases. Once the platform is tested in real clinical situations, together with an increase in the available clinical data, its performance is expected to improve further.
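    A minimal sketch of the described setup, a Random Forest over the five vital signs evaluated with precision and sensitivity (recall), is shown below; the data and the labelling rule are synthetic placeholders rather than clinical criteria, and the code is not the platform's implementation.

```python
# Sketch: Random Forest sepsis detector over five vital-sign features,
# reported with precision, sensitivity (recall), and feature importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
n = 2000
# Columns: heart rate, systolic BP, diastolic BP, SpO2, body temperature
X = np.column_stack([
    rng.normal(85, 15, n),     # heart rate (bpm)
    rng.normal(120, 20, n),    # systolic blood pressure (mmHg)
    rng.normal(75, 12, n),     # diastolic blood pressure (mmHg)
    rng.normal(96, 3, n),      # blood oxygen saturation (%)
    rng.normal(37.0, 0.8, n),  # body temperature (degrees C)
])
y = ((X[:, 0] > 100) & (X[:, 3] < 94)).astype(int)   # toy label rule, NOT a clinical criterion

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("precision:", precision_score(y_te, pred, zero_division=0))
print("sensitivity (recall):", recall_score(y_te, pred))
print("feature importances:", dict(zip(
    ["HR", "SBP", "DBP", "SpO2", "Temp"], clf.feature_importances_.round(3))))
```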

    A Physiology-Driven Computational Model for Post-Cardiac Arrest Outcome Prediction

    Patients resuscitated from cardiac arrest (CA) face a high risk of neurological disability and death; however, pragmatic methods are lacking for accurate and reliable prognostication. The aim of this study was to build computational models to predict post-CA outcome by leveraging high-dimensional patient data available early after admission to the intensive care unit (ICU). We hypothesized that model performance could be enhanced by integrating physiological time series (PTS) data and by training machine learning (ML) classifiers. We compared three models: one integrating features extracted from the electronic health record (EHR) alone, one using features derived from PTS collected in the first 24 hours after ICU admission (PTS24), and one integrating both PTS24 and EHR features. Outcomes of interest were survival and neurological outcome at ICU discharge. Combined EHR-PTS24 models had higher discrimination (area under the receiver operating characteristic curve [AUC]) than models that used either EHR or PTS24 alone, for the prediction of both survival (AUC 0.85, 0.80, and 0.68, respectively) and neurological outcome (0.87, 0.83, and 0.78). The best ML classifier achieved higher discrimination than the reference logistic regression model (APACHE III) for survival (AUC 0.85 vs 0.70) and neurological outcome prediction (AUC 0.87 vs 0.75). Feature analysis revealed previously unknown factors associated with post-CA recovery. The results attest to the effectiveness of ML models for post-CA predictive modeling and suggest that PTS recorded in the very early phase after resuscitation encode short-term outcome probabilities.
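    The sketch below illustrates the kind of comparison reported above, training one classifier per feature set (EHR only, PTS24 only, combined) and comparing cross-validated AUCs; the data are synthetic and a gradient-boosting classifier merely stands in for whichever ML models the authors evaluated.

```python
# Sketch: compare discrimination (AUC) of EHR-only, PTS24-only, and combined
# feature sets with cross-validated probability predictions on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
ehr = rng.normal(size=(n, 20))    # static EHR features (demographics, labs, comorbidities)
pts24 = rng.normal(size=(n, 40))  # summary statistics of physiological time series, first 24 h
y = (ehr[:, 0] + pts24[:, 0] + rng.normal(scale=1.0, size=n) > 0).astype(int)  # toy outcome

for name, X in [("EHR", ehr), ("PTS24", pts24), ("EHR+PTS24", np.hstack([ehr, pts24]))]:
    scores = cross_val_predict(GradientBoostingClassifier(random_state=0), X, y,
                               cv=5, method="predict_proba")[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y, scores):.2f}")
```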