Ontology-Based Clinical Information Extraction Using SNOMED CT
Extracting and encoding clinical information captured in unstructured clinical documents with standard medical terminologies is vital to enable secondary use of clinical data from practice. SNOMED CT is the most comprehensive medical ontology with broad types of concepts and detailed relationships and it has been widely used for many clinical applications. However, few studies have investigated the use of SNOMED CT in clinical information extraction.
In this dissertation research, we developed a fine-grained information model based on SNOMED CT and built novel information extraction systems to recognize clinical entities, identify their relations, and encode them as SNOMED CT concepts. Our evaluation shows that such ontology-based information extraction systems using SNOMED CT can achieve state-of-the-art performance, indicating the potential of SNOMED CT in clinical natural language processing.
GNTeam at 2018 n2c2: Feature-augmented BiLSTM-CRF for drug-related entity recognition in hospital discharge summaries
Monitoring the administration of drugs and adverse drug reactions are key
parts of pharmacovigilance. In this paper, we explore the extraction of drug
mentions and drug-related information (reason for taking a drug, route,
frequency, dosage, strength, form, duration, and adverse events) from hospital
discharge summaries through deep learning that relies on various
representations for clinical named entity recognition. This work was officially
part of the 2018 n2c2 shared task, and we use the data supplied as part of the
task. We developed two deep learning architectures based on recurrent neural
networks and pre-trained language models. We also explore the effect of
augmenting word representations with semantic features for clinical named
entity recognition. Our feature-augmented BiLSTM-CRF model achieved an
F1-score of 92.67% and ranked 4th in the entity extraction sub-task among
the systems submitted to the n2c2 challenge. The recurrent neural networks that
use pre-trained domain-specific word embeddings and a CRF layer for label
optimization extract drugs, adverse events and related entities with a
micro-averaged F1-score of over 91%. Word embeddings that are pre-trained on a
large unannotated corpus of relevant documents and further fine-tuned to the
task perform rather well. However, augmenting word embeddings with semantic
features extracted using available clinical NLP toolkits can further improve
the performance (primarily by boosting precision) of drug-related named entity
recognition from electronic health records.
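The micro-averaged F1-score reported above pools true positives, false positives and false negatives across all documents and entity types before computing precision and recall. The following is a minimal sketch of that calculation, assuming exact-match scoring on (start, end, label) spans. The function name `micro_prf`, the span offsets and the example labels are illustrative assumptions, not the official n2c2 evaluation script.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over entity spans.

    gold, pred: lists (one item per document) of sets of
    (start, end, label) tuples. A predicted span counts as correct
    only if it matches a gold span exactly (assumed strict matching).
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        tp += len(gold_spans & pred_spans)   # spans found in both
        fp += len(pred_spans - gold_spans)   # predicted but not in gold
        fn += len(gold_spans - pred_spans)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy data: two documents with made-up character offsets.
gold = [{(0, 7, "Drug"), (8, 13, "Strength")}, {(3, 10, "ADE")}]
pred = [{(0, 7, "Drug"), (8, 13, "Dosage")}, {(3, 10, "ADE")}]

precision, recall, f1 = micro_prf(gold, pred)
print(f"{f1:.4f}")  # prints 0.6667
```

In this toy case one span receives the wrong label, which counts as both a false positive and a false negative, so precision and recall are each 2/3.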
Mining social media data for biomedical signals and health-related behavior
Social media data has been increasingly used to study biomedical and
health-related phenomena. From cohort level discussions of a condition to
planetary level analyses of sentiment, social media has provided scientists
with unprecedented amounts of data to study human behavior and response
associated with a variety of health conditions and medical treatments. Here we
review recent work in mining social media for biomedical, epidemiological, and
social phenomena information relevant to the multilevel complexity of human
health. We pay particular attention to topics where social media data analysis
has shown the most progress, including pharmacovigilance, sentiment analysis
especially for mental health, and other areas. We also discuss a variety of
innovative uses of social media data for health-related applications and
important limitations in social media data access and use.
Comment: To appear in the Annual Review of Biomedical Data Science
Extracting clinical knowledge from electronic medical records
As the adoption of Electronic Medical Records (EMRs) rises in healthcare
institutions, the importance of these resources increases because of all the clinical
information they contain about patients. However, the unstructured information in the
form of clinical narratives present in these records makes it hard to extract and
structure useful clinical knowledge. This unstructured information limits the potential
of EMRs, because the clinical information these records contain can be used to perform
essential tasks inside healthcare institutions such as searching, summarization, decision
support and statistical analysis, as well as to support management decisions or research.
These tasks can only be done if the unstructured clinical information from the narratives
is appropriately extracted, structured and processed into clinical knowledge. Usually,
this extraction and structuring of information into clinical knowledge is performed
manually by healthcare practitioners, which is inefficient and error-prone. This research
aims to propose a solution to this problem by using Machine Translation (MT) from
Portuguese to English, Natural Language Processing (NLP) and
Information Extraction (IE) techniques. With the help of these techniques, the goal is to
develop a modular prototype pipeline system that can extract clinical knowledge from
unstructured clinical information contained in Portuguese EMRs, in an automated way,
in order to help EMRs fulfil their potential and consequently help the Portuguese
hospital involved in this research. This research also intends to show that this generic
prototype system and approach can potentially be applied to other hospitals, even if they
don't use the Portuguese language.
Performance and error analysis of three part of speech taggers on health texts
Increasingly, natural language processing (NLP) techniques are being developed and utilized in a variety of biomedical domains. Part of speech tagging is a critical step in many NLP applications. Currently, we are developing an NLP tool for text simplification. As part of this effort, we set out to evaluate several part of speech (POS) taggers. We selected 120 sentences (2375 tokens) from a corpus of six types of diabetes-related health texts and asked human reviewers to tag each word in these sentences to create a "Gold Standard." We then tested each of the three POS taggers against the "Gold Standard." One tagger (dTagger) had been trained on health texts and the other two (MaxEnt and Curran & Clark) were trained on general news articles. We analyzed the errors and placed them into five categories: systematic, close, subtle, difficult source, and other. The three taggers have relatively similar rates of success: dTagger, MaxEnt, and Curran & Clark had 87%, 89% and 90% agreement with the gold standard, respectively. These rates of success are lower than published rates for these taggers. This is probably due to our testing them on a corpus that differs significantly from their training corpora. The taggers made different errors: dTagger, which had been trained on a set of medical texts (MedPost), made fewer errors on medical terms than MaxEnt and Curran & Clark. The latter two taggers performed better on non-medical terms, and the difference between their performance and that of dTagger was statistically significant. Our findings suggest that the three POS taggers have similar correct tagging rates, though they differ in the types of errors they make. For the task of text simplification, we are inclined to perform additional training of the Curran & Clark tagger with the MedPost corpus because both the fine-grained tagging provided by this tool and the correct recognition of medical terms are equally important.
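The agreement rates quoted above are token-level comparisons between a tagger's output and the human-annotated gold standard. The following is a minimal sketch of such a comparison. The function name `agreement` and the five-token example with Penn Treebank style tags are illustrative assumptions and are not taken from the study's data.

```python
def agreement(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must be aligned token by token")
    if not gold_tags:
        return 0.0
    matches = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return matches / len(gold_tags)


# Hypothetical five-token sentence: the tagger mislabels one adjective as a noun.
gold = ["NN", "VBZ", "DT", "JJ", "NN"]
pred = ["NN", "VBZ", "DT", "NN", "NN"]

print(agreement(gold, pred))  # prints 0.8
```

On the 2375-token sample described above, the gap between 87% and 90% agreement corresponds to roughly 71 tokens. This figure is approximate because the reported rates are rounded.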
A Review on Adverse Drug Reaction Detection Techniques
The detection of adverse drug reactions (ADRs) provides important information for determining a patient's view of a single drug. This study considers and discusses this feature of drug reviews in medical opinion-mining systems. The paper reviews the literature that forms the background of this work. To this end, it first surveys work on detecting ADRs and side effects, then examines biomedical text mining that focuses on identifying the specific relationships involving ADRs. Finally, it provides a general overview of sentiment analysis, particularly from a medical perspective. The study presents a survey of ADRs extracted from drug review sentences on social media, utilizing and comparing different techniques.
Utilizing Consumer Health Posts for Pharmacovigilance: Identifying Underlying Factors Associated with Patients' Attitudes Towards Antidepressants
Non-adherence to antidepressants is a major obstacle to the therapeutic benefits of antidepressants, resulting in increased risk of relapse, emergency visits, and a significant burden on individuals and the healthcare system. Several studies showed that non-adherence is weakly associated with personal and clinical variables, but strongly associated with patients' beliefs and attitudes towards medications. The traditional methods for identifying the key dimensions of patients' attitudes towards antidepressants have some methodological limitations, such as concern about the confidentiality of personal information. This study attempts to address these limitations by using patients' self-reported experiences in online healthcare forums to identify underlying factors affecting patients' attitudes towards antidepressants. The data source of the study was a healthcare forum called "askapatients.com". A total of 892 patient reviews were randomly collected from the forum for the four most commonly prescribed antidepressants, including Sertraline (Zoloft) and Escitalopram (Lexapro) from the SSRI class, and Venlafaxine (Effexor) and Duloxetine (Cymbalta) from the SNRI class. The methodology of this study is composed of two main phases: I) generating structured data from unstructured patients' drug reviews and testing hypotheses concerning attitude, II) identification and normalization of Adverse Drug Reactions (ADRs), Withdrawal Symptoms (WDs) and Drug Indications (DIs) from the posts, and mapping them to both UMLS and SNOMED CT concepts. Phase II also includes testing the association between ADRs and attitude. The results of the first phase of this study showed that "experience of adverse drug reactions", "perceived distress received from ADRs", "lack of knowledge about medication's mechanism", "withdrawal experience", "duration of usage", and "drug effectiveness" are strongly associated with patients' attitudes.
However, demographic variables including "age" and "gender" are not associated with attitude. Analysis of the data in the second phase of the study showed that, of the 6,534 identified entities, 73% are ADRs, 12% are WDs, and 15% are drug indications. In addition, psychological and cognitive expressions have higher variability than physiological expressions. All three types of entities were mapped to 811 UMLS and SNOMED CT concepts. Testing the association between ADRs and attitude showed that, of the twenty-one physiological ADRs specified in the ASEC questionnaire, "dry mouth", "increased appetite", "disorientation", "yawning", "weight gain", and "problem with sexual dysfunction" are associated with attitude. A set of psychological and cognitive ADRs, such as "emotional indifference" and "memory problem", was also tested and showed a significant association between these types of ADRs and attitude. The findings of this study have important implications for designing clinical interventions aiming to improve patients' adherence to antidepressants. In addition, the dataset generated in this study has significant implications for improving the performance of text-mining algorithms aiming to identify health-related information from consumer health posts. Moreover, the dataset can be used for generating and testing hypotheses related to ADRs associated with psychiatric medications, and for identifying factors associated with discontinuation of antidepressants. The dataset and guidelines of this study are available at https://sites.google.com/view/pharmacovigilanceinpsychiatry/hom