
    A cascade of classifiers for extracting medication information from discharge summaries

    Background: Extracting medication information from clinical records has many potential applications, and recently published research, systems, and competitions reflect an interest therein. Much of the early extraction work involved rules and lexicons, but more recently machine learning has been applied to the task. Methods: We present a hybrid system consisting of two parts. The first part, field detection, uses a cascade of statistical classifiers to identify medication-related named entities. The second part uses simple heuristics to link those entities into medication events. Results: The system achieved performance that is comparable to other approaches to the same task. This performance is further improved by adding features that reference external medication name lists. Conclusions: This study demonstrates that our hybrid approach outperforms purely statistical or rule-based systems. The study also shows that a cascade of classifiers works better than a single classifier in extracting medication information. The system is available as is upon request from the first author.

    Assessing Information Congruence of Documented Cardiovascular Disease between Electronic Dental and Medical Records

    Dentists increasingly treat patients with cardiovascular diseases (CVD) in their clinics and may therefore need to alter treatment plans in the presence of CVD. However, it is unclear to what extent patient-reported CVD information is accurately captured in Electronic Dental Records (EDRs). In this pilot study, we aimed to measure the reliability of patient-reported CVD conditions in EDRs. We assessed information congruence by comparing patients’ self-reported dental histories to the original diagnoses assigned by their medical providers in the Electronic Medical Record (EMR). To enable this comparison, we encoded patients’ CVD information from the free-text data of EDRs into a structured format using natural language processing (NLP). Overall, our NLP approach achieved promising performance in extracting patients’ CVD-related information. We observed disagreement between self-reported EDR data and physician-diagnosed EMR data.

    Arabic medical entity tagging using distant learning in a Multilingual Framework

    A semantic tagger is presented that detects relevant entities in Arabic medical documents and tags them with their appropriate semantic class. The system takes advantage of a Multilingual Framework covering four languages (Arabic, English, French, and Spanish), so that resources available for each language can be used to improve the results of the others; this is especially important for less-resourced languages such as Arabic. The approach has been evaluated against Wikipedia pages of the four languages belonging to the medical domain. The core of the system is the definition of a base tagset consisting of the three most represented classes in the SNOMED-CT taxonomy and the learning of a binary classifier for each semantic category in the tagset and each language, using a distant learning approach over three widely used knowledge resources, namely Wikipedia, DBpedia, and SNOMED-CT.

    A sentence classification framework to identify geometric errors in radiation therapy from relevant literature

    The objective of systematic reviews is to address a research question by summarizing relevant studies following a detailed, comprehensive, and transparent plan and search protocol to reduce bias. Systematic reviews are very useful in the biomedical and healthcare domain; however, the data extraction phase of the systematic review process necessitates substantive expertise and is labour-intensive and time-consuming. The aim of this work is to partially automate the process of building systematic radiotherapy treatment literature reviews by summarizing the required data elements of geometric errors of radiotherapy from relevant literature using machine learning and natural language processing (NLP) approaches. A framework is developed in this study that initially builds a training corpus by extracting sentences containing different types of geometric errors of radiotherapy from relevant publications. The publications are retrieved from PubMed following a given set of rules defined by a domain expert. Subsequently, the method develops a training corpus by extracting relevant sentences using a sentence similarity measure. A support vector machine (SVM) classifier is then trained on this training corpus to extract the sentences from new publications which contain relevant geometric errors. To demonstrate the proposed approach, we have used 60 publications containing geometric errors in radiotherapy to automatically extract the sentences stating the mean and standard deviation of different types of errors between planned and executed radiotherapy. The experimental results show that the recall and precision of the proposed framework are, respectively, 97% and 72%. The results clearly show that the framework is able to extract almost all sentences containing required data of geometric errors
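
The recall and precision figures quoted above can be combined into a single F-score. The sketch below is illustrative only: the abstract itself does not report an F-score, and the function name `f_score` and its `beta` parameter are assumptions introduced here, using the standard definition of the F-measure.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was retrieved correctly
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as stated in the abstract (72% and 97%).
# The resulting value is derived arithmetic, not a figure reported by the authors.
print(round(f_score(0.72, 0.97), 3))
```

Because the harmonic mean is pulled toward the lower of the two values, the combined score sits much closer to the 72% precision than to the 97% recall.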

    Recognition of medication information from discharge summaries using ensembles of classifiers

    BACKGROUND: Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks. METHODS: We investigated ensemble classifiers that used different voting strategies to combine outputs from three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated using the annotated data sets from the 2009 i2b2 NLP challenge: simple majority, local SVM-based voting, and local CRF-based voting. RESULTS: Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score of 90.84% (94.11% Precision, 87.81% Recall) for 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge by using the same training and test sets. Our system based on majority voting achieved a better F-score of 89.65% (93.91% Precision, 85.76% Recall) than the previously reported F-score of 89.19% (93.78% Precision, 85.03% Recall) by the first-ranked system in the challenge. CONCLUSIONS: Our experimental results using the 2009 i2b2 challenge datasets showed that ensemble classifiers that combine individual classifiers into a voting system could achieve better performance than a single classifier in recognizing medication information from clinical text. 
    The results suggest that simple, easily implemented strategies such as majority voting have the potential to significantly improve clinical entity recognition.
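
Simple majority voting over three classifiers, as named in the abstract above, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the token-level label sequences, the label names (`DRUG`, `DOSE`, `FREQ`, `O`), and the fallback to `O` when no label wins a strict majority are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(labels_per_system, fallback="O"):
    """Combine per-token labels from several systems by simple majority.

    labels_per_system: list of equal-length label sequences, one per system.
    fallback: label used when no label wins a strict majority (assumption).
    """
    combined = []
    for token_labels in zip(*labels_per_system):
        label, count = Counter(token_labels).most_common(1)[0]
        # Accept the top label only if more than half of the systems agree.
        combined.append(label if count > len(token_labels) / 2 else fallback)
    return combined


# Hypothetical outputs of a rule-based, an SVM-based, and a CRF-based tagger.
rule_based = ["O", "DRUG", "DOSE", "O"]
svm_based = ["O", "DRUG", "O", "FREQ"]
crf_based = ["O", "DRUG", "DOSE", "O"]

print(majority_vote([rule_based, svm_based, crf_based]))
# → ['O', 'DRUG', 'DOSE', 'O']
```

The local SVM-based and CRF-based voting methods mentioned in the abstract would replace the counting step with a trained model over the individual systems' outputs; that is not shown here.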

    A medication extraction framework for electronic health records

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012, by Andreea Bodnari. Cataloged from PDF version of thesis. Includes bibliographical references (p. 71-76). This thesis addresses the problem of concept and relation extraction in medical documents. We present a medical concept and relation extraction system (medNERR) that incorporates hand-built rules and constrained conditional models. We focus on two concept types (i.e., medications and medical conditions) and the pairwise administered-for relation between these two concepts. For medication extraction, we design a rule-based baseline medNERRgreedy med that identifies medications using the UMLS dictionary. We enhance medNERRgreedy med with information from topic models and additional corpus-derived heuristics, and show that the final medication extraction system outperforms the baseline and improves on state-of-the-art systems. For medical condition extraction, we design a Hidden Markov Model with conditional constraints. The conditional constraints frame world knowledge into a probabilistic model and help support model decisions. We approach relation extraction as a sequence labeling task, where we label the context between the medications and the medical concepts that are involved in an administered-for relation. We use a Hidden Markov Model with conditional constraints for labeling the relation context. We show that the relation extraction system outperforms current state-of-the-art systems and that its main advantage comes from the incorporation of domain knowledge through conditional constraints. We compare our sequence labeling approach for relation extraction to a classification approach and show that our approach improves final system performance.

    Automatic Population of Structured Reports from Narrative Pathology Reports

    Structured pathology reports offer a number of advantages: they can ensure the accuracy and completeness of pathology reporting, and they make it easier for referring doctors to glean pertinent information. The goal of this thesis is to extract pertinent information from free-text pathology reports, automatically populate structured reports for cancer diseases, and identify the commonalities and differences in processing principles needed to obtain maximum accuracy. Three pathology corpora were annotated with entities and relationships between the entities in this study, namely the melanoma corpus, the colorectal cancer corpus and the lymphoma corpus. A supervised machine-learning-based approach, utilising conditional random fields learners, was developed to recognise medical entities from the corpora. By feature engineering, the best feature configurations were attained, which boosted the F-scores significantly from 4.2% to 6.8% on the training sets. Without proper negation and uncertainty detection, the quality of the structured reports will be diminished. The negation and uncertainty detection modules were built to handle this problem. The modules obtained overall F-scores ranging from 76.6% to 91.0% on the test sets. A relation extraction system was presented to extract four relations from the lymphoma corpus. The system achieved very good performance on the training set, with 100% F-score obtained by the rule-based module and 97.2% F-score attained by the support vector machines classifier. Rule-based approaches were used to generate the structured outputs and populate predefined templates with them. The rule-based system attained over 97% F-scores on the training sets. A pipeline system was implemented with an assembly of all the components described above. It achieved promising results in the end-to-end evaluations, with 86.5%, 84.2% and 78.9% F-scores on the melanoma, colorectal cancer and lymphoma test sets respectively.