5 research outputs found

    A system for de-identifying medical message board text

    There are millions of public posts to medical message boards by users seeking support and information on a wide range of medical conditions. It has been shown that these posts can be used to gain a greater understanding of patients’ experiences and concerns. As investigators continue to explore large corpora of medical discussion board data for research purposes, protecting the privacy of the members of these online communities becomes an important challenge. Existing entity recognition methods designed for more structured text are not sufficient, because message posts present additional challenges: they contain many typographical errors, a larger variety of possible names, terms and abbreviations specific to Internet posts or to a particular message board, and mentions of the authors’ personal lives. The main contribution of this paper is a system that automatically de-identifies the authors of message board posts, taking these challenges into account. We demonstrate our system on two different message board corpora, one on breast cancer and another on arthritis. We show that our approach significantly outperforms other publicly available named entity recognition and de-identification systems, which have been tuned for more structured text such as operative reports, pathology reports, discharge summaries, or newswire.

    Identifying Potential Adverse Effects Using the Web: A New Approach to Medical Hypothesis Generation

    Medical message boards are online resources where users with a particular condition exchange information, some of which they might not otherwise share with medical providers. Many of these boards contain a large number of posts, with patient opinions and experiences that are potentially useful to clinicians and researchers. We present an approach that collects a corpus of medical message board posts, de-identifies the corpus, and extracts information on potential adverse drug effects discussed by users. Using a corpus of posts to breast cancer message boards, we identified drug-event pairs using co-occurrence statistics. We then compared the identified drug-event pairs with the adverse effects listed on the package labels of tamoxifen, anastrozole, exemestane, and letrozole. Of the pairs identified by our system, 75–80% were documented on the drug labels. Some of the undocumented pairs may represent previously unidentified adverse drug effects.
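The abstract does not say which co-occurrence statistic was used, so the following is only a minimal sketch of the general idea, assuming post-level co-occurrence and a simple lift score (observed pair count divided by the count expected if drug and event were independent). The function name `cooccurrence_scores`, the naive substring matching, and the sample posts are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(posts, drugs, events):
    """Count drug/event co-mentions per post and score each pair by lift.

    Assumption: a drug or event is "mentioned" if its lowercase name occurs
    as a substring of the post. A real system would need proper term
    matching and normalization.
    """
    n = len(posts)
    drug_counts, event_counts, pair_counts = Counter(), Counter(), Counter()
    for post in posts:
        text = post.lower()
        found_drugs = {d for d in drugs if d in text}
        found_events = {e for e in events if e in text}
        drug_counts.update(found_drugs)
        event_counts.update(found_events)
        # Every drug and event found in the same post counts as one co-occurrence.
        pair_counts.update(product(found_drugs, found_events))

    lift = {}
    for (drug, event), observed in pair_counts.items():
        # Expected number of co-mentions if drug and event were independent.
        expected = drug_counts[drug] * event_counts[event] / n
        lift[(drug, event)] = observed / expected
    return pair_counts, lift
```

Pairs with a lift well above 1 co-occur more often than chance would suggest and could then be checked against the drug labels, as the abstract describes.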

    Utilizing Consumer Health Posts for Pharmacovigilance: Identifying Underlying Factors Associated with Patients’ Attitudes Towards Antidepressants

    Non-adherence to antidepressants is a major obstacle to their therapeutic benefits, resulting in increased risk of relapse, emergency visits, and a significant burden on individuals and the healthcare system. Several studies have shown that non-adherence is weakly associated with personal and clinical variables, but strongly associated with patients’ beliefs and attitudes towards medications. The traditional methods for identifying the key dimensions of patients’ attitudes towards antidepressants have methodological limitations, such as concern about the confidentiality of personal information. This study attempts to address these limitations by using patients’ self-reported experiences in online healthcare forums to identify underlying factors affecting patients’ attitudes towards antidepressants. The data source of the study was a healthcare forum called “askapatients.com”. A total of 892 patient reviews were randomly collected from the forum for the four most commonly prescribed antidepressants: sertraline (Zoloft) and escitalopram (Lexapro) from the SSRI class, and venlafaxine (Effexor) and duloxetine (Cymbalta) from the SNRI class. The methodology of this study comprises two main phases: I) generating structured data from unstructured patient drug reviews and testing hypotheses concerning attitude; II) identifying and normalizing Adverse Drug Reactions (ADRs), Withdrawal Symptoms (WDs) and Drug Indications (DIs) in the posts, and mapping them to both UMLS and SNOMED CT concepts. Phase II also includes testing the association between ADRs and attitude. The results of the first phase showed that “experience of adverse drug reactions”, “perceived distress received from ADRs”, “lack of knowledge about medication’s mechanism”, “withdrawal experience”, “duration of usage”, and “drug effectiveness” are strongly associated with patients’ attitudes. 
However, demographic variables including “age” and “gender” are not associated with attitude. Analysis of the data in the second phase of the study showed that, of the 6,534 identified entities, 73% are ADRs, 12% are WDs, and 15% are drug indications. In addition, psychological and cognitive expressions have higher variability than physiological expressions. All three types of entities were mapped to 811 UMLS and SNOMED CT concepts. Testing the association between ADRs and attitude showed that, of the twenty-one physiological ADRs specified in the ASEC questionnaire, “dry mouth”, “increased appetite”, “disorientation”, “yawning”, “weight gain”, and “problem with sexual dysfunction” are associated with attitude. A set of psychological and cognitive ADRs, such as “emotional indifference” and “memory problem”, were also tested and showed a significant association with attitude. The findings of this study have important implications for designing clinical interventions aiming to improve patients’ adherence to antidepressants. In addition, the dataset generated in this study has significant implications for improving the performance of text-mining algorithms aiming to identify health-related information from consumer health posts. Moreover, the dataset can be used for generating and testing hypotheses related to ADRs associated with psychiatric medications, and for identifying factors associated with discontinuation of antidepressants. The dataset and guidelines of this study are available at https://sites.google.com/view/pharmacovigilanceinpsychiatry/hom
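The abstract does not name the statistical test used to relate individual ADRs to attitude. As one plausible illustration only, the sketch below assumes a Pearson chi-square test of independence on a 2x2 table (ADR reported or not, versus negative or non-negative attitude). The function name `chi_square_2x2` and the cell layout are hypothetical, and no counts from the study are reproduced.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    Assumed (hypothetical) layout:
                      negative attitude   other attitude
        ADR reported          a                 b
        ADR absent            c                 d

    Returns (chi2, p_value) with 1 degree of freedom, without continuity
    correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    # Shortcut formula for the 2x2 case.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A small p-value would indicate that reporting the ADR and holding a negative attitude are unlikely to be independent. With small cell counts, an exact test would be a safer choice than this approximation.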

    Text Mining and Medicine: An approach to early detection of diseases

    The near future of health services will be marked by the ageing of the population and the chronicity of diseases. Together with demographic and social changes, attendance at both primary and specialized care services is clearly increasing, and all of this translates into a sharp rise in healthcare expenditure. This context leads health institutions to set as their main objectives the prioritization of prevention, the control of risk factors and the early detection of diseases. To support primary prevention, health professionals need every available means to extract knowledge from their main source of information, the patient’s electronic health record. Health professionals should therefore have tools that allow them to identify and interrelate clinical events of interest, receive alerts about upcoming health risks or predict the development of a disease. However, the effort, time and cost required to extract this knowledge by simply reading the multiple clinical reports in a patient’s history (mostly written in natural language) are incalculable and unaffordable for most health professionals in daily clinical practice. Until now, the information systems in most health institutions have been used only for storage, that is, as systems that collect and store the healthcare information generated in the practitioner-patient interaction. The step of transforming these large “information stores” into “knowledge sources” that facilitate and support clinical decision-making has not yet been taken. 
However, the challenge of transforming stored information into knowledge is not trivial. It is estimated that a regional hospital can generate more than 3 million clinical documents per year, 80% of which contain unstructured information, most notably free text. Up to now, textual clinical information has been practically ignored by most health institutions, mainly because of the arduous process required to exploit such a vast amount of data. Thus, the main source of knowledge contained in electronic medical records, the textual clinical narrative, remains practically untapped. In addition to the difficulty health organizations have in obtaining value from text with traditional tools, the peculiar characteristics of clinical terminology pose a further problem: high ambiguity and complexity of the vocabulary, free textual narrative, poor terminological standardization and an overuse of acronyms and negations. In this complex setting, and given the growing need to acquire knowledge to support the decision-making process, it is essential to use Intelligent Systems that help extract the value contained in textual documents. Currently, very few real systems are able to extract knowledge from clinical texts and genuinely ease the daily work of healthcare professionals in complex tasks such as risk factor detection or diagnostic prediction. To address these problems, the Text Mining (TM) discipline provides a number of advanced techniques. TM can be defined as an area focused on the identification and extraction of new knowledge from textual information; it is a multidisciplinary field that brings together techniques from other disciplines such as Natural Language Processing (NLP) and Machine Learning (ML). Accordingly, this doctoral thesis first provides an exhaustive and detailed analysis of the state of the art of TM in medicine. 
This analysis covers the most prominent methods, techniques, tasks, resources and trends in the field. The review revealed that existing systems that support clinical decision-making using textual clinical information are scarce, that they generally perform a single task in a specific area of knowledge, and that they are developed for very specific domains that are hard to reproduce in other environments. In response, this thesis proposes a new system, called MiNerDoc, to support clinical decision-making by combining techniques from the TM discipline with the terminological and semantic enrichment provided by the MetaMap tool and the UMLS Metathesaurus. MiNerDoc allows, among other functionalities, the detection of risk factors or clinical events of interest and the automatic inference of standardized diagnostic codes, using only the textual information contained in clinical reports. The proposed TM system has been evaluated in an extensive experimental study, and the results demonstrated its effectiveness and viability in the two tasks evaluated: recognition of medical entities and multi-label diagnostic classification.