502 research outputs found

    Mapping of electronic health records in Spanish to the unified medical language system metathesaurus

    Get PDF
    [EN] This work presents a preliminary approach to annotate Spanish electronic health records with concepts of the Unified Medical Language System Metathesaurus. The prototype uses Apache Lucene R to index the Metathesaurus and generate mapping candidates from input text. In addition, it relies on UKB to resolve ambiguities. The tool has been evaluated by measuring its agreement with MetaMap in two English-Spanish parallel corpora, one consisting of titles and abstracts of papers in the clinical domain, and the other of real electronic health record excerpts.[EU] Lan honetan, espainieraz idatzitako mediku-txosten elektronikoak Unified Medical Languge System Metathesaurus deituriko terminologia biomedikoarekin etiketatzeko lehen urratsak eman dira. Prototipoak Apache Lucene R erabiltzen du Metathesaurus-a indexatu eta mapatze hautagaiak sortzeko. Horrez gain, anbiguotasunak UKB bidez ebazten ditu. Ebaluazioari dagokionez, prototipoaren eta MetaMap-en arteko adostasuna neurtu da bi ingelera-gaztelania corpus paralelotan. Corpusetako bat artikulu zientifikoetako izenburu eta laburpenez osatutako dago. Beste corpusa mediku-txosten pasarte batzuez dago osatuta

    Doctor of Philosophy

    Get PDF
    dissertationDomain adaptation of natural language processing systems is challenging because it requires human expertise. While manual e ort is e ective in creating a high quality knowledge base, it is expensive and time consuming. Clinical text adds another layer of complexity to the task due to privacy and con dentiality restrictions that hinder the ability to share training corpora among di erent research groups. Semantic ambiguity is a major barrier for e ective and accurate concept recognition by natural language processing systems. In my research I propose an automated domain adaptation method that utilizes sublanguage semantic schema for all-word word sense disambiguation of clinical narrative. According to the sublanguage theory developed by Zellig Harris, domain-speci c language is characterized by a relatively small set of semantic classes that combine into a small number of sentence types. Previous research relied on manual analysis to create language models that could be used for more e ective natural language processing. Building on previous semantic type disambiguation research, I propose a method of resolving semantic ambiguity utilizing automatically acquired semantic type disambiguation rules applied on clinical text ambiguously mapped to a standard set of concepts. This research aims to provide an automatic method to acquire Sublanguage Semantic Schema (S3) and apply this model to disambiguate terms that map to more than one concept with di erent semantic types. The research is conducted using unmodi ed MetaMap version 2009, a concept recognition system provided by the National Library of Medicine, applied on a large set of clinical text. The project includes creating and comparing models, which are based on unambiguous concept mappings found in seventeen clinical note types. The e ectiveness of the nal application was validated through a manual review of a subset of processed clinical notes using recall, precision and F-score metrics

    Cognitive Computing supported Medical Decision Support System for Patient’s Driving Assessment

    Get PDF
    To smartly utilize a huge and constantly growing volume of data, improve productivity and increase competitiveness in various fields of life; human requires decision making support systems that efficiently process and analyze the data, and, as a result, significantly speed up the process. Similarly to all other areas of human life, healthcare domain also is lacking Artificial Intelligence (AI) based solution. A number of supervised and unsupervised Machine Learning and Data Mining techniques exist to help us to deal with structured data. However, in a real life, we pretty much deal with unstructured data that hides useful knowledge and valuable information inside human-readable plain texts, images, audio and video. Therefore, such IT giants as IBM, Google, Microsoft, Intel, Facebook, etc., as well as variety of SMEs are actively elaborating different Cognitive Computing services and tools to get a value from unstructured data. Thus, the paper presents feasibility study of IBM Watson cognitive computing services and tools to address the issue of automated health records processing to support doctor’s decision for patient’s driving assessment

    Mapping of electronic health records in Spanish to the unified medical language system metathesaurus

    Get PDF
    [EN] This work presents a preliminary approach to annotate Spanish electronic health records with concepts of the Unified Medical Language System Metathesaurus. The prototype uses Apache Lucene R to index the Metathesaurus and generate mapping candidates from input text. In addition, it relies on UKB to resolve ambiguities. The tool has been evaluated by measuring its agreement with MetaMap in two English-Spanish parallel corpora, one consisting of titles and abstracts of papers in the clinical domain, and the other of real electronic health record excerpts.[EU] Lan honetan, espainieraz idatzitako mediku-txosten elektronikoak Unified Medical Languge System Metathesaurus deituriko terminologia biomedikoarekin etiketatzeko lehen urratsak eman dira. Prototipoak Apache Lucene R erabiltzen du Metathesaurus-a indexatu eta mapatze hautagaiak sortzeko. Horrez gain, anbiguotasunak UKB bidez ebazten ditu. Ebaluazioari dagokionez, prototipoaren eta MetaMap-en arteko adostasuna neurtu da bi ingelera-gaztelania corpus paralelotan. Corpusetako bat artikulu zientifikoetako izenburu eta laburpenez osatutako dago. Beste corpusa mediku-txosten pasarte batzuez dago osatuta

    Knowledge-based Biomedical Data Science 2019

    Full text link
    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 table

    Extracting diagnostic knowledge from MedLine Plus: a comparison between MetaMap and cTAKES Approaches

    Get PDF
    The development of diagnostic decision support systems (DDSS) requires having a reliable and consistent knowledge base about diseases and their symptoms, signs and diagnostic tests. Physicians are typically the source of this knowledge, but it is not always possible to obtain all the desired information from them. Other valuable sources are medical books and articles describing the diagnosis of diseases, but again, extracting this information is a hard and time-consuming task. In this paper we present the results of our research, in which we have used Web scraping, natural language processing techniques, a variety of publicly available sources of diagnostic knowledge and two widely known medical concept identifiers, MetaMap and cTAKES, to extract diagnostic criteria for infectious diseases from MedLine Plus articles. A performance comparison of MetaMap and cTAKES is also presented

    Neurocognitive Informatics Manifesto.

    Get PDF
    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper examples of neurocognitive inspirations and promising directions in this area are given

    Word Sense Disambiguation for clinical abbreviations

    Get PDF
    Abbreviations are extensively used in electronic health records (EHR) of patients as well as medical documentation, reaching 30-50% of the words in clinical narrative. There are more than 197,000 unique medical abbreviations found in the clinical text and their meanings vary depending on the context in which they are used. Since data in electronic health records could be shareable across health information systems (hospitals, primary care centers, etc.) as well as others such as insurance companies information systems, it is essential determining the correct meaning of the abbreviations to avoid misunderstandings. Clinical abbreviations have specific characteristic that do not follow any standard rules for creating them. This makes it complicated to find said abbreviations and corresponding meanings. Furthermore, there is an added difficulty to working with clinical data due to privacy reasons, since it is essential to have them in order to develop and test algorithms. Word sense disambiguation (WSD) is an essential task in natural language processing (NLP) applications such as information extraction, chatbots and summarization systems among others. WSD aims to identify the correct meaning of the ambiguous word which has more than one meaning. Disambiguating clinical abbreviations is a type of lexical sample WSD task. Previous research works adopted supervised, unsupervised and Knowledge-based (KB) approaches to disambiguate clinical abbreviations. This thesis aims to propose a classification model that apart from disambiguating well known abbreviations also disambiguates rare and unseen abbreviations using the most recent deep neural network architectures for language modeling. In clinical abbreviation disambiguation several resources and disambiguation models were encountered. Different classification approaches used to disambiguate the clinical abbreviations were investigated in this thesis. Considering that computers do not directly understand texts, different data representations were implemented to capture the meaning of the words. Since it is also necessary to measure the performance of algorithms, the evaluation measurements used are discussed. As the different solutions proposed to clinical WSD we have explored static word embeddings data representation on 13 English clinical abbreviations of the UMN data set (from University of Minnesota) by testing traditional supervised machine learning algorithms separately for each abbreviation. Moreover, we have utilized a transformer-base pretrained model that was fine-tuned as a multi-classification classifier for the whole data set (75 abbreviations of the UMN data set). The aim of implementing just one multi-class classifier is to predict rare and unseen abbreviations that are most common in clinical narrative. Additionally, other experiments were conducted for a different type of abbreviations (scientific abbreviations and acronyms) by defining a hybrid approach composed of supervised and knowledge-based approaches. Most previous works tend to build a separated classifier for each clinical abbreviation, tending to leverage different data resources to overcome the data acquisition bottleneck. However, those models were restricted to disambiguate terms that have been seen in trained data. Meanwhile, based on our results, transfer learning by fine-tuning a transformer-based model could predict rare and unseen abbreviations. A remaining challenge for future work is to improve the model to automate the disambiguation of clinical abbreviations on run-time systems by implementing self-supervised learning models.Las abreviaturas se utilizan ampliamente en las historias clínicas electrónicas de los pacientes y en mucha documentación médica, llegando a ser un 30-50% de las palabras empleadas en narrativa clínica. Existen más de 197.000 abreviaturas únicas usadas en textos clínicos siendo términos altamente ambiguos El significado de las abreviaturas varía en función del contexto en el que se utilicen. Dado que los datos de las historias clínicas electrónicas pueden compartirse entre servicios, hospitales, centros de atención primaria así como otras organizaciones como por ejemplo, las compañías de seguros es fundamental determinar el significado correcto de las abreviaturas para evitar además eventos adversos relacionados con la seguridad del paciente. Nuevas abreviaturas clínicas aparecen constantemente y tienen la característica específica de que no siguen ningún estándar para su creación. Esto hace que sea muy difícil disponer de un recurso con todas las abreviaturas y todos sus significados. A todo esto hay que añadir la dificultad para trabajar con datos clínicos por cuestiones de privacidad cuando es esencial disponer de ellos para poder desarrollar algoritmos para su tratamiento. La desambiguación del sentido de las palabras (WSD, en inglés) es una tarea esencial en tareas de procesamiento del lenguaje natural (PLN) como extracción de información, chatbots o generadores de resúmenes, entre otros. WSD tiene como objetivo identificar el significado correcto de una palabra ambigua (que tiene más de un significado). Esta tarea se ha abordado previamente utilizando tanto enfoques supervisados, no supervisados así como basados en conocimiento. Esta tesis tiene como objetivo definir un modelo de clasificación que además de desambiguar abreviaturas conocidas desambigüe también abreviaturas menos frecuentes que no han aparecido previamente en los conjuntos de entrenaminto utilizando las arquitecturas de redes neuronales profundas más recientes relacionadas ocn los modelos del lenguaje. En la desambiguación de abreviaturas clínicas se emplean diversos recursos y modelos de desambiguación. Se han investigado los diferentes enfoques de clasificación utilizados para desambiguar las abreviaturas clínicas. Dado que un ordenador no comprende directamente los textos, se han implementado diferentes representaciones de textos para capturar el significado de las palabras. Puesto que también es necesario medir el desempeño de cualquier algoritmo, se describen también las medidas de evaluación utilizadas. La mayoría de los trabajos previos se han basado en la construcción de un clasificador separado para cada abreviatura clínica. De este modo, tienden a aprovechar diferentes recursos de datos para superar el cuello de botella de la adquisición de datos. Sin embargo, estos modelos se limitaban a desambiguar con los datos para los que el sistema había sido entrenado. Se han explorado además representaciones basadas vectores de palabras (word embeddings) estáticos para 13 abreviaturas clínicas en el corpus UMN en inglés (de la University of Minnesota) utilizando algoritmos de clasificación tradicionales de aprendizaje automático supervisados (un clasificador por cada abreviatura). Se ha llevado a cabo un segundo experimento utilizando un modelo multi-clasificador sobre todo el conjunto de las 75 abreviaturas del corpus UMN basado en un modelo Transformer pre-entrenado. El objetivo ha sido implementar un clasificador multiclase para predecir también abreviaturas raras y no vistas. Se realizó un experimento adicional para siglas científicas en documentos de dominio abierto mediante la aplicación de un enfoque híbrido compuesto por enfoques supervisados y basados en el conocimiento. Así, basándonos en los resultados de esta tesis, el aprendizaje por transferencia (transfer learning) mediante el ajuste (fine-tuning) de un modelo de lenguaje preentrenado podría predecir abreviaturas raras y no vistas sin necesidad de entrenarlas previamente. Un reto pendiente para el trabajo futuro es mejorar el modelo para automatizar la desambiguación de las abreviaturas clínicas en tiempo de ejecución mediante la implementación de modelos de aprendizaje autosupervisados.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Israel González Carrasco.- Secretario: Leonardo Campillos Llanos.- Vocal: Ana María García Serran

    A Case-Based Reasoning Model Powered by Deep Learning for Radiology Report Recommendation

    Get PDF
    Case-Based Reasoning models are one of the most used reasoning paradigms in expert-knowledge-driven areas. One of the most prominent fields of use of these systems is the medical sector, where explainable models are required. However, these models are considerably reliant on user input and the introduction of relevant curated data. Deep learning approaches offer an analogous solution, where user input is not required. This paper proposes a hybrid Case-Based Reasoning, Deep Learning framework for medical-related applications, focusing on the generation of medical reports. The proposal combines the explainability and user-focused approach of case-based reasoning models with the deep learning techniques performance. Moreover, the framework is fully modular to fit a wide variety of tasks and data, such as real-time sensor captured data, images, or text, to name a few. An implementation of the proposed framework focusing on radiology report generation assistance is provided. This implementation is used to evaluate the proposal, showing that it can provide meaningful and accurate corrections, even when the amount of information available is minimal. Additional tests on the optimization degree of the case base are also performed, evidencing how the proposed framework can optimize this base to achieve optimal performance
    • …
    corecore