Finding Important Terms for Patients in Their Electronic Health Records: A Learning-to-Rank Approach Using Expert Annotations
BACKGROUND: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients' notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on the medical terms that matter most to them. Interventions can then be developed by giving them targeted education to improve their EHR comprehension and the quality of care.
OBJECTIVE: We aimed to develop a supervised natural language processing (NLP) system called Finding impOrtant medical Concepts most Useful to patientS (FOCUS) that automatically identifies and ranks medical terms in EHR notes based on their importance to the patients.
METHODS: First, we built an expert-annotated corpus. For each EHR note, 2 physicians independently identified medical terms important to the patient. Using the physicians' agreement as the gold standard, we developed and evaluated FOCUS. FOCUS first identifies candidate terms from each EHR note using MetaMap and then ranks the terms using a support vector machine-based learning-to-rank algorithm. We explored rich learning features, including distributed word representations, Unified Medical Language System semantic types, topic features, and features derived from a consumer health vocabulary. We compared FOCUS with 2 strong baseline NLP systems.
RESULTS: Physicians annotated 90 EHR notes and identified a mean of 9 (SD 5) important terms per note. The Cohen kappa for annotation agreement was .51. The 10-fold cross-validation results show that FOCUS achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.940 for ranking candidate terms from EHR notes to identify important terms. When term identification was included, the performance of FOCUS for identifying important terms from EHR notes was 0.866 AUC-ROC. Both performance scores significantly exceeded the corresponding baseline system scores (P < .001). Rich learning features contributed substantially to FOCUS's performance.
CONCLUSIONS: FOCUS can automatically rank terms from EHR notes based on their importance to patients. It may help develop future interventions that improve the quality of care.
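The SVM-based learning-to-rank step can be illustrated with a pairwise (RankSVM-style) sketch. The random features below are hypothetical stand-ins for the features the paper actually uses (distributed word representations, UMLS semantic types, topic features); only the pairwise-transform idea is shown:

```python
# A minimal pairwise learning-to-rank sketch in the spirit of RankSVM:
# train a linear SVM on feature differences between (important, unimportant)
# term pairs, then rank candidate terms by the learned linear score.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_terms, n_features = 40, 8

X = rng.normal(size=(n_terms, n_features))   # stand-in candidate-term features
y = rng.integers(0, 2, size=n_terms)         # 1 = important to the patient

# Pairwise transform: the difference vector x_i - x_j gets label +1 when
# term i should outrank term j, and -1 for the reversed pair.
diffs, labels = [], []
for i in np.flatnonzero(y == 1):
    for j in np.flatnonzero(y == 0):
        diffs.append(X[i] - X[j]); labels.append(1)
        diffs.append(X[j] - X[i]); labels.append(-1)

clf = LinearSVC(C=1.0, max_iter=10000).fit(np.array(diffs), np.array(labels))

# Rank terms by the learned linear score w.x (higher = more important).
scores = X @ clf.coef_.ravel()
ranking = np.argsort(-scores)
```

The pairwise transform turns ranking into binary classification over difference vectors, which is one standard way to realize an SVM-based ranker.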
Ranking Medical Terms to Support Expansion of Lay Language Resources for Patient Comprehension of Electronic Health Record Notes: Adapted Distant Supervision Approach
BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial, and even necessary, to first identify the terms important for patient EHR comprehension.
OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. EHR terms ranked highly by ADS will be given higher priority for lay language annotation, that is, the creation of lay definitions for these terms.
METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data.
RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. Both ADS systems performed significantly better than the baseline systems (P < .001 for all measures and all conditions). Using a rich set of learning features contributed substantially to ADS's performance.
CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
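The feature space augmentation the abstract describes is commonly implemented following Daume III's "frustratingly easy" scheme: each example's features are copied into a shared block plus a domain-specific block, so one classifier can learn both shared and per-domain weights. A minimal sketch, with toy matrices standing in for the CHV-derived source data and the expert-annotated target data:

```python
# Daume-style feature space augmentation: map d-dimensional features to
# 3d-dimensional [shared | source-only | target-only] blocks.
import numpy as np

def augment(X, domain):
    """Copy features into a shared block and the matching domain block."""
    n, d = X.shape
    Z = np.zeros((n, 3 * d))
    Z[:, :d] = X                   # shared block (always filled)
    if domain == "source":
        Z[:, d:2 * d] = X          # source-specific block
    elif domain == "target":
        Z[:, 2 * d:] = X           # target-specific block
    return Z

X_src = np.ones((2, 3))   # stand-in: terms weakly labeled via a consumer health vocabulary
X_tgt = np.ones((2, 3))   # stand-in: expert-annotated target-domain examples
X_train = np.vstack([augment(X_src, "source"), augment(X_tgt, "target")])
```

Any standard classifier trained on `X_train` then adapts across domains, because weights in the shared block are estimated from both corpora while the domain blocks absorb domain-specific behavior.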
Extractive Summarization: Experimental work on nursing notes in Finnish
Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics concerned with how computers interact with human language. With increasing computational power and advances in technology, researchers have successfully proposed various NLP tasks that are already implemented in real-world applications today. Automated text summarization is one of the many tasks that has not yet fully matured, particularly in the health sector. Success in this task would enable healthcare professionals to grasp a patient's history in minimal time, resulting in the faster decisions required for better care.
Automatic text summarization is a process that shortens a large text without sacrificing important information. This can be achieved by paraphrasing the content (the abstractive method) or by concatenating relevant extracted sentences (the extractive method). In general, the process requires converting the text into numerical form; a method is then executed to identify and extract the relevant text.
This thesis explores NLP techniques used in extractive text summarization, particularly in the health domain. The work includes a comparison of basic summarization models implemented on a corpus of patient notes written by nurses in Finnish. The concepts and research studies required to understand the implementation are documented, along with a description of the code.
A Python-based project is structured to build a corpus and execute multiple summarization models. For this thesis, we observe the performance of two textual embeddings: Term Frequency-Inverse Document Frequency (TF-IDF), which is based on a simple statistical measure, and Word2Vec, which is based on neural networks. For both models, LexRank, an unsupervised stochastic graph-based sentence-scoring algorithm, is used for sentence extraction, and random selection is used as a baseline method for evaluation.
To evaluate and compare the performance of the models, summaries of 15 patient care episodes from each model were provided to two human evaluators for manual evaluation. On this small sample dataset, both evaluators agreed in preferring the summaries produced by Word2Vec LexRank over those generated by TF-IDF LexRank. Both models were also observed, by both evaluators, to perform better than the random-selection baseline.
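The TF-IDF LexRank pipeline described above can be sketched compactly: build a cosine similarity graph over TF-IDF sentence vectors, turn it into a transition matrix, and take its stationary distribution as the sentence salience scores. The sentences below are English stand-ins (the thesis works on Finnish nursing notes):

```python
# Compact LexRank sketch over TF-IDF sentence vectors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Patient slept well and vital signs were stable overnight.",
    "Vital signs remained stable and pain was well controlled.",
    "The wound dressing was changed in the morning.",
    "Patient reported mild pain after the dressing change.",
    "Family visited in the afternoon.",
]
n = len(sentences)

sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
np.fill_diagonal(sim, 0.0)
sim[sim < 0.05] = 0.0                         # drop weak graph edges

# Row-normalize into a transition matrix (uniform row for isolated sentences).
rows = sim.sum(axis=1, keepdims=True)
P = np.where(rows > 0, sim / np.where(rows > 0, rows, 1.0), 1.0 / n)

# Damped power iteration (PageRank-style) for the stationary distribution.
d = 0.85
M = d * P + (1 - d) / n
scores = np.full(n, 1.0 / n)
for _ in range(50):
    scores = scores @ M

summary = [sentences[i] for i in np.argsort(-scores)[:2]]   # extract top 2
```

Swapping the TF-IDF vectors for averaged Word2Vec embeddings changes only the similarity matrix; the LexRank scoring stays the same, which is what makes the two embeddings directly comparable.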
Discharge Summary Hospital Course Summarisation of Inpatient Electronic Health Record Text with Clinical Concept Guided Deep Pre-Trained Transformer Models
Brief Hospital Course (BHC) summaries are succinct summaries of an entire hospital encounter, embedded within discharge summaries and written by the senior clinicians responsible for the overall care of a patient. Methods to automatically produce summaries from inpatient documentation would be invaluable in reducing the manual burden on clinicians of summarising documents under the high time pressure of admitting and discharging patients. Automatically producing these summaries from the inpatient course is a complex multi-document summarisation task, as the source notes are written from various perspectives (e.g. nursing, doctor, radiology) during the course of the hospitalisation. We demonstrate a range of methods for BHC summarisation, evaluating the performance of deep learning summarisation models in both extractive and abstractive scenarios. We also test a novel ensemble extractive and abstractive summarisation model that incorporates a medical concept ontology (SNOMED) as a clinical guidance signal and shows superior performance on 2 real-world clinical data sets.
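One simple way to picture a concept-guidance signal in the extractive stage is to boost sentences that mention ontology concepts. The tiny lexicon below is a hypothetical stand-in for a real ontology such as SNOMED CT, and the scoring rule is purely illustrative, not the ensemble model the abstract describes:

```python
# Toy concept-guided extraction: rank candidate sentences by how many
# clinical concepts they mention. The lexicon is a hypothetical stand-in
# for a real ontology such as SNOMED CT.
concepts = {"pneumonia", "antibiotics", "hypoxia", "oxygen", "discharge"}

sentences = [
    "Patient admitted with community-acquired pneumonia and hypoxia.",
    "Started on intravenous antibiotics and supplemental oxygen.",
    "Patient watched television in the evening.",
    "Stable for discharge with oral antibiotics.",
]

def concept_score(sentence):
    """Count distinct lexicon concepts mentioned in the sentence."""
    tokens = {t.strip(".,").lower() for t in sentence.split()}
    return len(tokens & concepts)

ranked = sorted(sentences, key=concept_score, reverse=True)
```

In a full system this signal would be combined with a learned extractive score rather than used alone, so that clinically salient sentences are preferred among otherwise similar candidates.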
Development of Artificial Intelligence Algorithms for Early Diagnosis of Sepsis
Sepsis is a prevalent syndrome that manifests as an uncontrolled response of the body to an infection, which may lead to organ dysfunction. Its diagnosis is urgent, since early treatment can reduce the patients' chances of long-term consequences. Yet there are many obstacles to achieving this early detection. Some stem from the syndrome's pathogenesis, which lacks a characteristic biomarker. The available clinical detection tools are either too complex or lack sensitivity, in both cases delaying the diagnosis. Another obstacle relates to modern technology: paired with the many clinical parameters monitored to detect sepsis, it produces extremely heterogeneous and complex medical records, a significant burden for the responsible clinicians, who must analyse them to diagnose the syndrome.
To help achieve this early diagnosis, as well as understand which parameters are most
relevant to obtain it, an approach based on the use of Artificial Intelligence algorithms is
proposed in this work, with the model being implemented in the alert system of a sepsis
monitoring platform.
This platform uses a Random Forest algorithm, based on supervised machine-learning classification, which is capable of detecting the syndrome in two different scenarios. The
earliest detection can happen if there are only five vital sign parameters available for
measurement, namely heart rate, systolic and diastolic blood pressures, blood oxygen
saturation level, and body temperature; in this case, the model achieves 83% precision and 62% sensitivity. If, besides these variables, laboratory measurements of bilirubin, creatinine, hemoglobin, leukocytes, platelet count, and C-reactive protein levels are also available, the platform's sensitivity increases to 77%. In both cases, the blood oxygen saturation level was found to be one of the most important variables for this task. Once the platform is tested in real clinical situations, together with an increase in the available clinical data, it is believed that its performance will improve further.
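The vital-signs-only scenario can be sketched with scikit-learn. The data below are synthetic stand-ins for the five vital signs, and the labeling rule is purely illustrative, not a clinical definition of sepsis:

```python
# Random Forest sketch for the vital-signs-only detection scenario.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 800
X = np.column_stack([
    rng.normal(80, 15, n),     # heart rate (bpm)
    rng.normal(120, 15, n),    # systolic blood pressure (mmHg)
    rng.normal(75, 10, n),     # diastolic blood pressure (mmHg)
    rng.normal(96, 3, n),      # blood oxygen saturation (%)
    rng.normal(37.0, 0.7, n),  # body temperature (degrees C)
])
# Toy risk label: low SpO2 plus fever (illustrative only).
y = ((X[:, 3] < 94) & (X[:, 4] > 37.3)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

precision = precision_score(y_te, pred, zero_division=0)
recall = recall_score(y_te, pred, zero_division=0)
importances = clf.feature_importances_   # per-feature contribution to the forest
```

Because the toy label depends only on saturation and temperature, those two features dominate `feature_importances_`, mirroring how the platform identified blood oxygen saturation as one of its most important variables.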
A Physiology-Driven Computational Model for Post-Cardiac Arrest Outcome Prediction
Patients resuscitated from cardiac arrest (CA) face a high risk of neurological disability and death; however, pragmatic methods are lacking for accurate and reliable prognostication. The aim of this study was to build
computational models to predict post-CA outcome by leveraging high-dimensional
patient data available early after admission to the intensive care unit (ICU).
We hypothesized that model performance could be enhanced by integrating
physiological time series (PTS) data and by training machine learning (ML)
classifiers. We compared three models integrating features extracted from the
electronic health records (EHR) alone, features derived from PTS collected in
the first 24 hours after ICU admission (PTS24), and models integrating PTS24 and
EHR. Outcomes of interest were survival and neurological outcome at ICU
discharge. Combined EHR-PTS24 models had higher discrimination (area under the
receiver operating characteristic curve [AUC]) than models which used either
EHR or PTS24 alone, for the prediction of survival (AUC 0.85, 0.80 and 0.68
respectively) and neurological outcome (0.87, 0.83 and 0.78). The best ML
classifier achieved higher discrimination than the reference logistic
regression model (APACHE III) for survival (AUC 0.85 vs 0.70) and neurological
outcome prediction (AUC 0.87 vs 0.75). Feature analysis revealed previously
unknown factors to be associated with post-CA recovery. Results attest to the
effectiveness of ML models for post-CA predictive modeling and suggest that PTS
recorded in the very early phase after resuscitation encode short-term outcome probabilities.
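The feature-set comparison can be illustrated with a synthetic sketch in which the outcome depends on both an EHR-style block and a PTS-style block, so the combined feature set should yield a higher AUC than EHR features alone. All names and coefficients below are illustrative, not from the study:

```python
# AUC comparison sketch: EHR features alone vs. EHR + PTS features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 1000
ehr = rng.normal(size=(n, 5))   # stand-in admission EHR features
pts = rng.normal(size=(n, 5))   # stand-in 24-hour physiological time series features

# Outcome driven by one feature from each block (illustrative only).
logit = 1.0 * ehr[:, 0] + 1.5 * pts[:, 0] + rng.normal(scale=0.5, size=n)
y = (logit > 0).astype(int)

def auc_for(X):
    """Fit a logistic model on one feature set and report held-out AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

auc_ehr = auc_for(ehr)
auc_combined = auc_for(np.hstack([ehr, pts]))
```

Because the synthetic outcome carries signal in both blocks, `auc_combined` exceeds `auc_ehr`, which is the qualitative pattern the study reports for its EHR-PTS24 models.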