28 research outputs found

    SemClinBr -- a multi institutional and multi specialty semantically annotated corpus for Portuguese clinical NLP tasks

    The high volume of research focusing on extracting patient information from electronic health records (EHR) has increased the demand for annotated corpora, which are a valuable resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multi-purpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field. In this study, we developed a semantically annotated corpus using clinical texts from multiple medical specialties, document types, and institutions. We present the following: (1) a survey listing common aspects and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated and can guide other annotation initiatives, (3) a web-based annotation tool focusing on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluation of the annotations. The result of this work is SemClinBr, a corpus of 1,000 clinical notes labeled with 65,117 entities and 11,263 relations, which can support a variety of clinical NLP tasks and boost the secondary use of EHRs for the Portuguese language.

    GERNERMED++: Transfer Learning in German Medical NLP

    We present a statistical model for German medical natural language processing, trained for named entity recognition (NER) and released as an open, publicly available model. The work is a refined successor to our first GERNERMED model, which it substantially outperforms. We demonstrate the effectiveness of combining multiple techniques to achieve strong entity recognition performance by means of transfer learning on pretrained deep language models (LM), word alignment, and neural machine translation. Because open, public medical entity recognition models for German texts are scarce, this work offers the German medical NLP research community a baseline model. Since our model is based on public English data, its weights are provided without legal restrictions on usage and distribution. The sample code and the statistical model are available at: https://github.com/frankkramer-lab/GERNERMED-p

    GERNERMED++: semantic annotation in German medical NLP through transfer-learning, translation and word alignment

    We present a statistical model, GERNERMED++, for German medical natural language processing, trained for named entity recognition (NER) and released as an open, publicly available model. We demonstrate the effectiveness of combining multiple techniques to achieve strong entity recognition performance by means of transfer learning on pre-trained deep language models (LM), word alignment, and neural machine translation, outperforming a pre-existing baseline model on several datasets. Because open, public medical entity recognition models for German texts are scarce, this work offers the German medical NLP research community a baseline model. The work is a refined successor to our first GERNERMED model. As with our previous work, the trained model is publicly available to other researchers. The sample code and the statistical model are available at: https://github.com/frankkramer-lab/GERNERMED-p
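
    The abstract above mentions combining machine translation with word alignment. One common way to use an alignment, sketched here purely as an illustration, is to project entity labels from an annotated English sentence onto its German translation. The function name `project_labels`, the label names, and the alignment pairs are assumptions made for this example and are not taken from the GERNERMED++ code.

```python
from typing import List, Optional, Tuple


def project_labels(
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO entity labels from source tokens onto target tokens.

    alignment holds (source_index, target_index) pairs, as a word aligner
    might produce them. This is a simplified, hypothetical projection rule.
    """
    # Keep only the entity type of each source token (None for "O").
    src_types = [lab.split("-", 1)[1] if "-" in lab else None for lab in src_labels]

    # Copy each entity type to the aligned target position (first match wins).
    tgt_types: List[Optional[str]] = [None] * tgt_len
    for s, t in alignment:
        if src_types[s] is not None and tgt_types[t] is None:
            tgt_types[t] = src_types[s]

    # Rebuild BIO prefixes in target word order. Limitation of this sketch:
    # adjacent entities of the same type are merged into one span.
    out: List[str] = []
    prev: Optional[str] = None
    for typ in tgt_types:
        if typ is None:
            out.append("O")
        elif typ == prev:
            out.append("I-" + typ)
        else:
            out.append("B-" + typ)
        prev = typ
    return out


# English: The patient takes 50 mg aspirin daily
# German:  Der Patient nimmt täglich 50 mg Aspirin ein
english_labels = ["O", "O", "O", "B-Strength", "I-Strength", "B-Drug", "B-Frequency"]
word_alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 6), (6, 3)]
german_labels = project_labels(english_labels, word_alignment, tgt_len=8)
print(german_labels)
```

    Rebuilding the BIO prefixes on the target side, instead of copying them, keeps the labels valid when translation reorders words, as with "täglich" in this example.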

    Does Enrichment of Clinical Texts by Ontology Concepts Increases Classification Accuracy?

    In the medical domain, multiple ontologies and terminology systems are available. However, existing classification and prediction algorithms in the clinical domain often ignore or insufficiently utilize the semantic information provided in those ontologies. To address this issue, we introduce a concept for augmenting embeddings, the input to deep neural networks, with semantic information retrieved from ontologies. To do this, words and phrases of sentences are mapped to concepts of a medical ontology, which aggregates synonyms in the same concept. A semantically enriched vector is generated and used for sentence classification. We study our approach on a sentence classification task using a real-world dataset comprising 640 sentences belonging to 22 categories. A deep neural network model is defined with an embedding layer followed by two LSTM layers and two dense layers. Our experiments show that classification accuracy with content-enriched embeddings is, for some categories, higher than without enrichment. We conclude that semantic information from ontologies has the potential to provide a useful enrichment of text. Future research will assess to what extent semantic relationships from the ontology can be used for enrichment.
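
    The mapping step described in this abstract, in which words and phrases are mapped to ontology concepts so that synonyms share one concept, can be illustrated with a minimal sketch. The toy ontology, the concept identifiers, and the greedy longest-match strategy are assumptions for illustration only; the abstract does not specify how the authors performed the mapping or built the enriched vector.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ontology: synonym phrases (lowercased token tuples)
# mapped to made-up concept identifiers.
TOY_ONTOLOGY: Dict[Tuple[str, ...], str] = {
    ("heart", "attack"): "C_MYOCARDIAL_INFARCTION",
    ("myocardial", "infarction"): "C_MYOCARDIAL_INFARCTION",
    ("hypertension",): "C_HYPERTENSION",
    ("high", "blood", "pressure"): "C_HYPERTENSION",
}


def map_to_concepts(
    tokens: List[str],
    ontology: Dict[Tuple[str, ...], str],
    max_len: int = 3,
) -> List[str]:
    """Return one concept ID per token ("NONE" if unmapped).

    Uses greedy longest-match lookup over phrases of up to max_len tokens.
    """
    lowered = [t.lower() for t in tokens]
    concepts = ["NONE"] * len(tokens)
    i = 0
    while i < len(lowered):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            concept = ontology.get(tuple(lowered[i:i + n]))
            if concept is not None:
                concepts[i:i + n] = [concept] * n
                i += n
                break
        else:
            i += 1  # no phrase starting at position i is in the ontology
    return concepts


def enrich(tokens: List[str], ontology: Dict[Tuple[str, ...], str]) -> List[Tuple[str, str]]:
    """Pair each token with its concept so a model could embed both."""
    return list(zip(tokens, map_to_concepts(tokens, ontology)))


print(enrich(["Patient", "had", "a", "heart", "attack"], TOY_ONTOLOGY))
```

    In a setup like this, "heart attack" and "myocardial infarction" yield the same concept ID, so a classifier that embeds the concept channel alongside the word channel sees the two phrasings as related.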

    Information extraction from Spanish radiology reports

    In recent years, the amount of digitized clinical data has grown steadily due to the adoption of clinical information systems. A large share of this data is in textual form. Extracting the information contained in these texts can support clinical tasks and decisions and is essential for improving health care. The biomedical domain uses a highly specialized and local vocabulary, with an abundance of non-standard and ambiguous abbreviations. Moreover, some types of medical reports contain ill-formed sentences and lack diacritics. Publicly accessible annotated data is scarce for two main reasons: the difficulty of creating it and the confidential nature of the data, which demands de-identification. This situation hinders progress in biomedical information extraction. Although Spanish has the second-largest number of native speakers in the world, little work has been done on information extraction from Spanish medical reports. Challenges include the absence of specific terminologies for certain medical domains in Spanish and linguistic resources that are less developed than those of high-resource languages such as English. In this thesis, we contribute to the BioNLP domain by providing methods with competitive results for applying a fragment of a medical information extraction pipeline to Spanish radiology reports. To this end, we created an annotated dataset for entity recognition, negation and speculation detection, and relation extraction, and shared the annotation process and schema with the community. Two named entity recognition algorithms were implemented to detect anatomical entities and clinical findings. The first is based on a specialized radiology dictionary that is not available in Spanish and on rules based on morphosyntactic knowledge; it is designed for named entity recognition in medium- or low-resource languages. The second, based on conditional random fields, was implemented once a larger set of annotated data was available and achieves better results. We also studied and implemented different solutions for negation detection of clinical findings: an adaptation to Spanish of a popular negation detection algorithm for English medical reports, and a rule-based method that detects negations using patterns inferred from the analysis of paths in dependency parse trees. The first method obtained the best results and was also adapted for negation and speculation detection in German clinical notes and discharge summaries. We consider that the results obtained and the annotation guidelines provided will help advance information extraction from Spanish medical reports.
    Affiliation: Cotik, Viviana Erica. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina
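
    The thesis above adapts a trigger-based negation detection algorithm to Spanish but does not name it or list its rules here. The following is only a minimal sketch of the general idea behind such approaches: a finding counts as negated if a negation cue appears shortly before it and no terminating token intervenes. The cue list, the terminator list, the window size, and the function name `negated_findings` are all assumptions for illustration, not the thesis's actual lexicon or method.

```python
from typing import List, Set, Tuple

# Illustrative Spanish cues chosen for this sketch only.
NEGATION_TRIGGERS: Set[str] = {"sin", "no", "niega", "descarta"}
# Tokens assumed to end the scope of a preceding negation cue.
TERMINATORS: Set[str] = {"pero", "aunque", ".", ";"}


def negated_findings(
    tokens: List[str],
    finding_spans: List[Tuple[int, int]],
    window: int = 5,
) -> List[bool]:
    """Flag each finding span (start, end-exclusive) as negated or not.

    A finding is flagged when a negation trigger occurs within `window`
    tokens to its left and no terminator lies between trigger and finding.
    """
    lowered = [t.lower() for t in tokens]
    flags: List[bool] = []
    for start, _end in finding_spans:
        negated = False
        # Scan leftwards from the token just before the finding.
        for i in range(start - 1, max(start - 1 - window, -1), -1):
            if lowered[i] in TERMINATORS:
                break
            if lowered[i] in NEGATION_TRIGGERS:
                negated = True
                break
        flags.append(negated)
    return flags


sentence = ["No", "se", "observa", "derrame", "pleural", ",",
            "pero", "hay", "cardiomegalia", "."]
findings = [(3, 5), (8, 9)]  # "derrame pleural", "cardiomegalia"
print(negated_findings(sentence, findings))
```

    In the example, "derrame pleural" is flagged because "No" precedes it within the window, while "cardiomegalia" is not, since "pero" ends the scope of the earlier cue.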

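    Datenintegration, Wissensrepräsentation und Datenanalyse – Werkzeuge zur systematischen Untersuchung von Einflussfaktoren auf das Langzeit-Outcome nephrologischer Patienten (Data integration, knowledge representation and data analysis: tools for the systematic investigation of factors influencing the long-term outcome of nephrology patients)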

    Digitalization will radically transform the healthcare system. Better information exchange through interconnected health records and new forms of care such as telemedicine can reduce the shortage of medical specialists in structurally weak regions. New mobile health applications will involve patients more actively in their treatment options and improve patient empowerment. Digitalization also produces more and more data, which can contribute to medical research and to improving therapies. In addition to the challenges of data protection and data security, questions about interoperability, benefit, and transparency must be addressed. Using three concrete examples (data integration, knowledge representation, and data analysis), this thesis investigates the challenges and possible solutions for using medical data effectively and improving research and routine medical care. The study on data integration examined the extent to which a routine medical database built on a relational schema, containing long-term data from transplant patients, can be transferred to an ontology-based research database such as i2b2 without loss of information. The study on knowledge representation examined how open source development tools can be used to implement an application for visualizing information from structured and unstructured medical data. With the resulting application, medical personnel without programming knowledge can extract information from the medical data pool and analyse it systematically. The study on acute kidney failure addressed the topic of data analysis in more detail. In this study, an algorithm was implemented that detects the event of acute kidney failure in a large cohort of inpatient data. After statistical analysis of the algorithm's results, the cohort could be comprehensively described with regard to the occurrence of acute kidney failure and the associated disease characteristics and risk associations.

    Negation and Speculation in NLP: A Survey, Corpora, Methods, and Applications

    Negation and speculation are universal linguistic phenomena that affect the performance of Natural Language Processing (NLP) applications, such as those for opinion mining and information retrieval, especially in biomedical data. In this article, we review the corpora annotated with negation and speculation in various natural languages and domains. Furthermore, we discuss the ongoing research into recent rule-based, supervised, and transfer learning techniques for the detection of negating and speculative content. Many English corpora for various domains are now annotated with negation and speculation; moreover, the availability of annotated corpora in other languages has started to increase. However, this growth is insufficient to address these important phenomena in languages with limited resources. The use of cross-lingual models and translation of the well-known languages are acceptable alternatives. We also highlight the lack of consistent annotation guidelines and the shortcomings of the existing techniques, and suggest alternatives that may speed up progress in this research direction. Adding more syntactic features may alleviate the limitations of the existing techniques, such as cue ambiguity and the detection of discontinuous scopes. In some NLP applications, including a negation- and speculation-aware system improves performance, yet this aspect is still not addressed or considered an essential step.

    Front-Line Physicians' Satisfaction with Information Systems in Hospitals

    Day-to-day operations management in hospital units is difficult because situations vary continuously, several actors are involved, and a vast number of information systems are in use. The aim of this study was to describe front-line physicians' satisfaction with the existing information systems needed to support day-to-day operations management in hospitals. A cross-sectional survey was used, and data were collected in nine hospitals from a sample chosen by stratified random sampling. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but that they neither improve access to information nor are tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer a single information system for accessing important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision-making process. Peer reviewed