11 research outputs found

    A cascade of classifiers for extracting medication information from discharge summaries

    Background: Extracting medication information from clinical records has many potential applications, and recently published research, systems, and competitions reflect an interest therein. Much of the early extraction work involved rules and lexicons, but more recently machine learning has been applied to the task. Methods: We present a hybrid system consisting of two parts. The first part, field detection, uses a cascade of statistical classifiers to identify medication-related named entities. The second part uses simple heuristics to link those entities into medication events. Results: The system achieved performance that is comparable to other approaches to the same task. This performance is further improved by adding features that reference external medication name lists. Conclusions: This study demonstrates that our hybrid approach outperforms purely statistical or rule-based systems. The study also shows that a cascade of classifiers works better than a single classifier in extracting medication information. The system is available as is upon request from the first author.

    Validation of an algorithm to evaluate the appropriateness of outpatient antibiotic prescribing using big data of Chinese diagnosis text

    OBJECTIVE: We aimed to evaluate the validity of an algorithm to classify diagnoses according to the appropriateness of outpatient antibiotic use in the context of Chinese free text. SETTING AND PARTICIPANTS: A random sample of 10 000 outpatient visits was selected between January and April 2018 from a national database for monitoring rational use of drugs, which included data from 194 secondary and tertiary hospitals in China. RESEARCH DESIGN: Diagnoses for outpatient visits were classified as tier 1 if associated with at least one condition that 'always' justified antibiotic use; as tier 2 if associated with at least one condition that only 'sometimes' justified antibiotic use but no conditions that 'always' justified antibiotic use; or as tier 3 if associated with only conditions that never justified antibiotic use, using a tier-fashion method and regular expression (RE)-based algorithm. MEASURES: Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the classification algorithm, using classification made by chart review as the standard reference, were calculated. RESULTS: The sensitivities of the algorithm for classifying tier 1, tier 2 and tier 3 diagnoses were 98.2% (95% CI 96.4% to 99.3%), 98.4% (95% CI 97.6% to 99.1%) and 100.0% (95% CI 100.0% to 100.0%), respectively. The specificities were 100.0% (95% CI 100.0% to 100.0%), 100.0% (95% CI 99.9% to 100.0%) and 98.6% (95% CI 97.9% to 99.1%), respectively. The PPVs for classifying tier 1, tier 2 and tier 3 diagnoses were 100.0% (95% CI 99.1% to 100.0%), 99.7% (95% CI 99.2% to 99.9%) and 99.7% (95% CI 99.6% to 99.8%), respectively. The NPVs were 99.9% (95% CI 99.8% to 100.0%), 99.8% (95% CI 99.7% to 99.9%) and 100.0% (95% CI 99.8% to 100.0%), respectively. CONCLUSIONS: The RE-based classification algorithm in the context of Chinese free text had sufficiently high validity for further evaluating the appropriateness of outpatient antibiotic prescribing
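The tiered classification and the four validity measures described above can be sketched as follows. This is a minimal illustration, not the study's algorithm: the two regular expressions, the example diagnosis terms, and the function names `classify_tier` and `binary_metrics` are assumptions, since the abstract does not list the actual patterns. Treating every unmatched diagnosis as tier 3 is also a simplification.

```python
import re

# Hypothetical patterns: the study's real expressions are not given in the abstract.
ALWAYS = re.compile(r"肺炎|pneumonia", re.IGNORECASE)        # assumed 'always' condition
SOMETIMES = re.compile(r"咽炎|pharyngitis", re.IGNORECASE)   # assumed 'sometimes' condition


def classify_tier(diagnosis_text: str) -> int:
    """Return 1, 2 or 3 following the tiered logic in the abstract."""
    if ALWAYS.search(diagnosis_text):
        return 1  # at least one condition that always justifies antibiotics
    if SOMETIMES.search(diagnosis_text):
        return 2  # only 'sometimes' conditions, no 'always' condition
    return 3      # simplification: anything unmatched is treated as 'never'


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely; an empty denominator yields NaN rather than an error."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(predicted, reference, tier):
    """One-vs-rest sensitivity, specificity, PPV and NPV for a single tier."""
    pairs = list(zip(predicted, reference))
    tp = sum(p == tier and r == tier for p, r in pairs)
    fp = sum(p == tier and r != tier for p, r in pairs)
    fn = sum(p != tier and r == tier for p, r in pairs)
    tn = sum(p != tier and r != tier for p, r in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

In this sketch the chart-review labels play the role of `reference`, and each tier is scored in turn against the other two combined.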

    Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts

    Electronic patient records remain a rather unexplored, but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort dependent manner. By extracting phenotype information from the free-text in such records we demonstrate that we can extend the information contained in the structured record data, and use it for producing fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Disease ontology and is therefore in principle language independent. As a use case we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which subsequently can be mapped to systems biology frameworks

    Titulación automática de preguntas en encuestas electorales [Automatic titling of questions in electoral surveys]

    This paper describes the work carried out for automatically generating titles for questions included in the opinion polls contained in CIS databases (Centro de Investigaciones Sociológicas, the Spanish Center of Sociological Research). In the context of CIS, the title of a question should meet two requirements: from the point of view of form, it has to be grammatically correct and similar in style to existing ones; from the point of view of content, it must contain the subject of the question and the different options for answering. These conditions for form and content of titles discourage the use of techniques used in similar problems, such as automatic abstracting or machine learning with a training corpus, and instead favor a methodology based on analysis and knowledge of the domain. To illustrate the analysis and the resolution strategy of the problem, we have selected a set of questions related to elections, due to their strategic importance and to CIS's own specialization in opinion polls. The process followed and the subsequent evaluation of results are discussed in detail, with an assessment of both qualitative and quantitative aspects. The evaluation shows that 88.73% of the generated titles are in strict accordance with CIS's requirements on form and content, resulting in reduced time spent by the institution's qualified personnel on manual work.

    Cohort selection for clinical trials from longitudinal patient records: text mining approach

    Background: Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not some other factors. Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time consuming, thus negatively affecting the timeliness of the recruitment process. Objective: Track 1 of the 2018 National NLP Clinical Challenge (n2c2) focused on the task of cohort selection for clinical trials with the aim of answering the following question: 'Can natural language processing be applied to narrative medical records to identify patients who meet eligibility criteria for clinical trials?' The task required the participating systems to analyze longitudinal patient records to determine if the corresponding patients met the given eligibility criteria. This article describes a system developed to address this task. Methods: Our system consists of 13 classifiers, one for each eligibility criterion. All classifiers use a bag-of-words document representation model. To prevent the loss of relevant contextual information associated with such representation, a pattern matching approach is used to extract context-sensitive features. They are embedded back into the text as lexically distinguishable tokens, which will consequently be featured in the bag-of-words representation. Supervised machine learning was chosen wherever a sufficient number of both positive and negative instances were available to learn from.
A rule-based approach focusing on a small set of relevant features was chosen for the remaining criteria. Results: The system was evaluated using micro-averaged F-measure. Four machine learning algorithms, including support vector machine, logistic regression, naïve Bayesian classifier and gradient tree boosting, were evaluated on the training data using 10-fold cross-validation. Overall, gradient tree boosting demonstrated the most consistent performance. Its performance peaked when oversampling was used to balance the training data. Final evaluation was performed on previously unseen test data. On average, the F-measure of 89.04% was comparable to three of the top ranked performances in the shared task (91.11%, 90.28% and 90.21%). With an F-measure of 88.14%, we significantly outperformed these systems (81.03%, 78.50% and 70.81%) in identifying patients with advanced coronary artery disease. Conclusions: The holdout evaluation provides evidence that our system was able to identify eligible patients for the given clinical trial with high accuracy. Our approach demonstrates how rule-based knowledge infusion can improve the performance of machine learning algorithms even when trained on a relatively small dataset
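The idea of embedding pattern-matched, context-sensitive features back into the text as distinguishable tokens, so that they survive the bag-of-words step, might look roughly like this. The HbA1c pattern, the 6.5 threshold, the token names and both function names are illustrative assumptions; the abstract does not give the system's actual patterns.

```python
import re
from collections import Counter

# Hypothetical pattern: an HbA1c mention followed shortly by a numeric value.
HBA1C = re.compile(r"\b(?:hba1c|a1c)\D{0,15}?(\d+(?:\.\d+)?)", re.IGNORECASE)


def embed_features(text: str) -> str:
    """Append a synthetic token after each matched mention, keeping the original text."""
    def add_token(match: re.Match) -> str:
        value = float(match.group(1))
        # Assumed threshold, chosen only for illustration.
        token = "HBA1C_HIGH" if value > 6.5 else "HBA1C_NORMAL"
        return f"{match.group(0)} {token}"
    return HBA1C.sub(add_token, text)


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count word-like tokens; underscores keep synthetic tokens whole."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))
```

A plain bag of words would see only `hba1c`, `7` and `2` as unrelated counts; the appended token carries the interpreted value into the feature vector that a classifier then consumes.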

    FlexiTerm: a flexible term recognition method


    Medi-Matcher: Matching von Medikamentennamen in Form von Freitexteingaben im Kontext von LIFE [Medi-Matcher: matching medication names entered as free text in the context of LIFE]

    In many hospitals, pharmacies, medical practices and research institutes, medication information is managed digitally using a variety of software. In Germany there is so far no uniform standard for medication names. Instead, a multitude of synonyms and descriptions is used for a single medication. To make matters worse, each medical institution decides autonomously which software it uses for medication management. This makes it difficult to merge data from different systems. To improve data integration and data exchange, the heterogeneity of the data must be reduced. Various drug directories exist in Germany, such as the Rote Liste [Gmb14], the Gelbe Liste [Med15], the GKV-Arzneimittelindex [dAW15, dAW13] or Pharm-Net [fMDuID14]. Some European medication databases, such as the Human Mutual Recognition Index [fAuM14] or the European Medicines Agency [Age15], and even American standards such as RxNorm [NZK+11], are also used. Websites such as mediguard.org are becoming increasingly popular for looking up medication information. There is, however, no uniform standard. As a result, studies and documentation of medication intake in medical practices and pharmacies record large amounts of information that are not compatible with one another. To make productive use of several of these data sources, the various synonyms of medication names must be matched and a single standard agreed upon.
This thesis addresses the problems of data management in heterogeneous networks. To this end: - a situation analysis for data management is carried out, - the data flow in data management in heterogeneous networks is analysed, - the tasks of a data management system are determined, - usable standards are compared, - alternative methods are discussed, - a design for a data management system is developed, - the design is implemented as a runnable program and - this program is tested on examples. A further focus is on investigating the suitability of the platform-independent programming language JAVA for use in larger software projects

    Medication information extraction with linguistic pattern matching and semantic rules

    DOI: 10.1136/jamia.2010.003657. Objective: This study presents a system developed for the 2009 i2b2 Challenge in Natural Language Processing for Clinical Data, whose aim was to automatically extract certain information about medications used by a patient from his/her medical report. The aim was to extract the following information for each medication: name, dosage, mode/route, frequency, duration and reason. Design: The system implements a rule-based methodology, which exploits typical morphological, lexical, syntactic and semantic features of the targeted information. These features were acquired from the training dataset and public resources such as the UMLS and relevant web pages. Information extracted by pattern matching was combined together using context-sensitive heuristic rules. Measurements: The system was applied to a set of 547 previously unseen discharge summaries, and the extracted information was evaluated against a manually prepared gold standard consisting of 251 documents. The overall ranking of the participating teams was obtained using the micro-averaged F-measure as the primary evaluation metric. Results: The implemented method achieved the micro-averaged F-measure of 81% (with 86% precision and 77% recall), which ranked this system third in the challenge. The significance tests revealed the system's performance to be not significantly different from that of the second ranked system. Relative to other systems, this system achieved the best F-measure for the extraction of duration (53%) and reason (46%). Conclusion: Based on the F-measure, the performance achieved (81%) was in line with the initial agreement between human annotators (82%), indicating that such a system may greatly facilitate the process of extracting relevant information from medical records by providing a solid basis for a manual review process
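As a quick check on the reported figures, the F-measure is the harmonic mean of precision and recall, and micro-averaging pools the true-positive, false-positive and false-negative counts across all fields before computing it. The sketch below assumes the balanced (F1) form; the function names and the per-field counts in the usage note are made up for illustration and are not taken from the challenge data.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_average(counts):
    """Pool (tp, fp, fn) triples over all fields, then compute P, R and F once."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Reported precision 86% and recall 77% give roughly the reported 81%.
print(round(f_measure(0.86, 0.77), 2))
```

With hypothetical counts such as `micro_average([(8, 2, 2), (1, 0, 3)])`, the frequent field dominates the pooled score, which is why strong overall figures can coexist with much lower per-field results such as those reported for duration and reason.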

    Adverse Drug Event Detection, Causality Inference, Patient Communication and Translational Research

    Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of stays in hospital. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce the cost in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and the FDA Adverse Event Reporting System reports. The ADEtector system employs novel natural language processing approaches for ADE detection and provides a user interface to display ADE information. The ADEtector employs machine learning techniques to automatically process the narrative text and identify the adverse event (AE) and medication entities that appear in that narrative text. The system will analyze the entities recognized to infer the causal relation that exists between AEs and medications by automating the elements of the Naranjo score using knowledge- and rule-based approaches. The Naranjo Adverse Drug Reaction Probability Scale is a validated tool for finding the causality of a drug-induced adverse event or ADE. The scale calculates the likelihood of an adverse event related to drugs based on a list of weighted questions. The ADEtector also presents the user with evidence for ADEs by extracting figures that contain ADE-related information from biomedical literature. A brief summary is generated for each of the figures that are extracted to help users better comprehend the figure. This will further enhance the user experience in understanding the ADE information better.
The ADEtector also helps patients better understand the narrative text by recognizing complex medical jargon and abbreviations that appear in the text and providing definitions and explanations for them from external knowledge resources. This system could help clinicians and researchers discover novel ADEs and drug relations and hypothesize new research questions within the ADE domain
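The weighted-question scoring behind the Naranjo scale can be sketched as a sum of per-question scores mapped to a causality category. This is only an illustration of the scoring step, not ADEtector's implementation: the function name is hypothetical, the per-question scores are supplied by the caller rather than derived from text, and the category cut-offs are the commonly cited ones (9 or more definite, 5 to 8 probable, 1 to 4 possible, 0 or less doubtful).

```python
def naranjo_category(question_scores):
    """Sum the per-question scores and map the total to a causality category."""
    if len(question_scores) != 10:
        raise ValueError("the Naranjo scale has ten questions")
    total = sum(question_scores)
    if total >= 9:
        return total, "definite"
    if total >= 5:
        return total, "probable"
    if total >= 1:
        return total, "possible"
    return total, "doubtful"
```

In an automated setting, each entry of `question_scores` would come from a separate knowledge- or rule-based component that answers one question from the record.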