44 research outputs found

    Extreme multi-label deep neural classification of Spanish health records according to the International Classification of Diseases

    Get PDF
    111 p.Este trabajo trata sobre la minería de textos clínicos, un campo del Procesamiento del Lenguaje Natural aplicado al dominio biomédico. El objetivo es automatizar la tarea de codificación médica. Los registros electrónicos de salud (EHR) son documentos que contienen información clínica sobre la salud de unpaciente. Los diagnósticos y procedimientos médicos plasmados en la Historia Clínica Electrónica están codificados con respecto a la Clasificación Internacional de Enfermedades (CIE). De hecho, la CIE es la base para identificar estadísticas de salud internacionales y el estándar para informar enfermedades y condiciones de salud. Desde la perspectiva del aprendizaje automático, el objetivo es resolver un problema extremo de clasificación de texto de múltiples etiquetas, ya que a cada registro de salud se le asignan múltiples códigos ICD de un conjunto de más de 70 000 términos de diagnóstico. Una cantidad importante de recursos se dedican a la codificación médica, una laboriosa tarea que actualmente se realiza de forma manual. Los EHR son narraciones extensas, y los codificadores médicos revisan los registros escritos por los médicos y asignan los códigos ICD correspondientes. Los textos son técnicos ya que los médicos emplean una jerga médica especializada, aunque rica en abreviaturas, acrónimos y errores ortográficos, ya que los médicos documentan los registros mientras realizan la práctica clínica real. Paraabordar la clasificación automática de registros de salud, investigamos y desarrollamos un conjunto de técnicas de clasificación de texto de aprendizaje profundo

    A reproducible approach with R markdown to automatic classification of medical certificates in French

    Get PDF
    In this paper, we report the ongoing developments of our first participation to the Cross-Language Evaluation Forum (CLEF) eHealth Task 1: “Multilingual Information Extraction - ICD10 coding” (Névéol et al., 2017). The task consists in labelling death certificates, in French with international standard codes. In particular, we wanted to accomplish the goal of the ‘Replication track’ of this Task which promotes the sharing of tools and the dissemination of solid, reproducible results.In questo articolo presentiamo gli sviluppi del lavoro iniziato con la partecipazione al Laboratorio CrossLanguage Evaluation Forum (CLEF) eHealth denominato: “Multilingual Information Extraction - ICD10 coding” (Névéol et al., 2017) che ha come obiettivo quello di classificare certificati di morte in lingua francese con dei codici standard internazionali. In particolare, abbiamo come obiettivo quello proposto dalla ‘Replication track’ di questo Task, che promuove la condivisione di strumenti e la diffusione di risultati riproducibili

    Supporting the Billing Process in Outpatient Medical Care: Automated Medical Coding Through Machine Learning

    Get PDF
    Reimbursement in medical care implies significant administrative effort for medical staff. To bill the treatments or services provided, diagnosis and treatment codes must be assigned to patient records using standardized healthcare classification systems, which is a time-consuming and error-prone task. In contrast to ICD diagnosis codes used in most countries for inpatient care reimbursement, outpatient medical care often involves different reimbursement schemes. Following the Action Design Research methodology, we developed an NLP-based machine learning artifact in close collaboration with a general practitioner’s office in Germany, leveraging a dataset of over 5,600 patients with more than 63,000 billing codes. For the code prediction of most problematic treatments as well as a complete code prediction task, we achieved F1-scores of 93.60 % and 78.22 %, respectively. Throughout three iterations, we derived five meta requirements leading to three design principles for an automated coding system to support the reimbursement of outpatient medical care

    Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks

    Get PDF
    Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Disease (ICD) is a standardized and widely used method, but the manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and poor reliability of terminal parts of these codes restricted clinical usability. We aimed to create a high performing pipeline for automated classification of reliable ICD-10 codes in the free medical text in cardiology. We focussed on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. As in clinical practice discharge letters may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding variables age/sex did not affect results. For model interpretability, word coefficients were provided and qualitative assessment of classification was manually performed. Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications

    Comparative study of healthcare messaging standards for interoperability in ehealth systems

    Get PDF
    Advances in the information and communication technology have created the field of "health informatics," which amalgamates healthcare, information technology and business. The use of information systems in healthcare organisations dates back to 1960s, however the use of technology for healthcare records, referred to as Electronic Medical Records (EMR), management has surged since 1990’s (Net-Health, 2017) due to advancements the internet and web technologies. Electronic Medical Records (EMR) and sometimes referred to as Personal Health Record (PHR) contains the patient’s medical history, allergy information, immunisation status, medication, radiology images and other medically related billing information that is relevant. There are a number of benefits for healthcare industry when sharing these data recorded in EMR and PHR systems between medical institutions (AbuKhousa et al., 2012). These benefits include convenience for patients and clinicians, cost-effective healthcare solutions, high quality of care, resolving the resource shortage and collecting a large volume of data for research and educational needs. My Health Record (MyHR) is a major project funded by the Australian government, which aims to have all data relating to health of the Australian population stored in digital format, allowing clinicians to have access to patient data at the point of care. Prior to 2015, MyHR was known as Personally Controlled Electronic Health Record (PCEHR). Though the Australian government took consistent initiatives there is a significant delay (Pearce and Haikerwal, 2010) in implementing eHealth projects and related services. While this delay is caused by many factors, interoperability is identified as the main problem (Benson and Grieve, 2016c) which is resisting this project delivery. To discover the current interoperability challenges in the Australian healthcare industry, this comparative study is conducted on Health Level 7 (HL7) messaging models such as HL7 V2, V3 and FHIR (Fast Healthcare Interoperability Resources). In this study, interoperability, security and privacy are main elements compared. In addition, a case study conducted in the NSW Hospitals to understand the popularity in usage of health messaging standards was utilised to understand the extent of use of messaging standards in healthcare sector. Predominantly, the project used the comparative study method on different HL7 (Health Level Seven) messages and derived the right messaging standard which is suitable to cover the interoperability, security and privacy requirements of electronic health record. The issues related to practical implementations, change over and training requirements for healthcare professionals are also discussed

    Robust input representations for low-resource information extraction

    Get PDF
    Recent advances in the field of natural language processing were achieved with deep learning models. This led to a wide range of new research questions concerning the stability of such large-scale systems and their applicability beyond well-studied tasks and datasets, such as information extraction in non-standard domains and languages, in particular, in low-resource environments. In this work, we address these challenges and make important contributions across fields such as representation learning and transfer learning by proposing novel model architectures and training strategies to overcome existing limitations, including a lack of training resources, domain mismatches and language barriers. In particular, we propose solutions to close the domain gap between representation models by, e.g., domain-adaptive pre-training or our novel meta-embedding architecture for creating a joint representations of multiple embedding methods. Our broad set of experiments demonstrates state-of-the-art performance of our methods for various sequence tagging and classification tasks and highlight their robustness in challenging low-resource settings across languages and domains.Die jüngsten Fortschritte auf dem Gebiet der Verarbeitung natürlicher Sprache wurden mit Deep-Learning-Modellen erzielt. Dies führte zu einer Vielzahl neuer Forschungsfragen bezüglich der Stabilität solcher großen Systeme und ihrer Anwendbarkeit über gut untersuchte Aufgaben und Datensätze hinaus, wie z. B. die Informationsextraktion für Nicht-Standardsprachen, aber auch Textdomänen und Aufgaben, für die selbst im Englischen nur wenige Trainingsdaten zur Verfügung stehen. In dieser Arbeit gehen wir auf diese Herausforderungen ein und leisten wichtige Beiträge in Bereichen wie Repräsentationslernen und Transferlernen, indem wir neuartige Modellarchitekturen und Trainingsstrategien vorschlagen, um bestehende Beschränkungen zu überwinden, darunter fehlende Trainingsressourcen, ungesehene Domänen und Sprachbarrieren. Insbesondere schlagen wir Lösungen vor, um die Domänenlücke zwischen Repräsentationsmodellen zu schließen, z.B. durch domänenadaptives Vortrainieren oder unsere neuartige Meta-Embedding-Architektur zur Erstellung einer gemeinsamen Repräsentation mehrerer Embeddingmethoden. Unsere umfassende Evaluierung demonstriert die Leistungsfähigkeit unserer Methoden für verschiedene Klassifizierungsaufgaben auf Word und Satzebene und unterstreicht ihre Robustheit in anspruchsvollen, ressourcenarmen Umgebungen in verschiedenen Sprachen und Domänen

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Preface

    Get PDF
    corecore