
    Multimodal Machine Learning for Automated ICD Coding

    This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text and structured tabular data. We further employed an ensemble method to integrate all modality-specific models to generate ICD-10 codes. Key evidence was also extracted to make our predictions more convincing and explainable. We used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset to validate our approach. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms the baseline models, including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and Text-CNN (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780 and 0.5002 respectively. Comment: Machine Learning for Healthcare 201
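    The two metrics quoted in this abstract can be illustrated with a minimal sketch. It assumes the standard definitions of micro-averaged F1 over multi-label code sets and of the Jaccard coefficient over two sets of evidence items. The abstract does not say how the paper computes them, and the function names, ICD codes and tokens below are made up for illustration.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1: pool TP/FP/FN over all records, then compute F1 once."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # codes predicted and actually assigned
        fp += len(pred - truth)   # codes predicted but not assigned
        fn += len(truth - pred)   # codes assigned but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; returns 0.0 for two empty sets (a convention)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical example: two admissions with gold and predicted ICD-10 codes.
gold = [{"I10", "E11.9"}, {"J18.9"}]
predicted = [{"I10"}, {"J18.9", "N17.9"}]
print(micro_f1(gold, predicted))

# Hypothetical evidence tokens highlighted by a model vs. by an annotator.
print(jaccard({"chest", "pain", "fever"}, {"fever", "cough"}))
```

    Micro-averaging pools counts across all codes, so frequent codes dominate the score; this is why it is commonly reported for the long-tailed label distributions typical of ICD coding.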

    MapReduce-iterative support vector machine classifier: novel fraud detection systems in healthcare insurance industry

    Fraud in healthcare insurance claims is a significant research challenge that affects the growth of healthcare services. Healthcare fraud is committed by subscribers, companies and providers. The aim of developing a decision support system is to automate the processing of claim data from service providers and to offset the challenges patients face. This paper proposes a novel hybrid big data and statistical machine learning technique, named MapReduce-based iterative support vector machine (MR-ISVM), that provides a set of sophisticated steps for the automatic detection of fraudulent claims in health insurance databases. The experimental results show that the MR-ISVM classifier outperforms other support vector machine (SVM) kernel classifiers in classification and detection. The results also show reduced computational time for processing healthcare insurance claims without compromising classification accuracy. The proposed MR-ISVM classifier achieves 87.73% accuracy, compared with the linear (75.3%) and radial basis function (79.98%)
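    The abstract does not describe the MR-ISVM algorithm itself, so the following is not the authors' method. It is only a generic sketch of how an iteratively trained linear SVM can be phrased in map/reduce terms: each partition computes a hinge-loss subgradient (map), the partial results are summed (reduce), and the weights are updated once per iteration. All names, the toy data and the hyperparameters are assumptions chosen for illustration.

```python
from functools import reduce


def partition_gradient(rows, w, b):
    """Map step: sum the hinge-loss subgradient over one data partition."""
    gw, gb = [0.0] * len(w), 0.0
    for x, y in rows:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin < 1.0:  # only margin violators contribute
            for j, xj in enumerate(x):
                gw[j] -= y * xj
            gb -= y
    return gw, gb


def combine(left, right):
    """Reduce step: add two partial gradients."""
    return [p + q for p, q in zip(left[0], right[0])], left[1] + right[1]


def train(partitions, dim, iterations=20, lr=0.1, lam=0.01):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n = sum(len(rows) for rows in partitions)
    w, b = [0.0] * dim, 0.0
    for _ in range(iterations):
        partials = [partition_gradient(rows, w, b) for rows in partitions]
        gw, gb = reduce(combine, partials)
        w = [wi - lr * (lam * wi + gj / n) for wi, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data split into three hypothetical partitions.
partitions = [
    [((2.0, 2.0), 1), ((3.0, 1.0), 1)],
    [((2.5, 3.0), 1), ((-2.0, -1.0), -1)],
    [((-3.0, -2.0), -1), ((-1.0, -2.5), -1)],
]
w, b = train(partitions, dim=2)
print([predict(x, w, b) == y for rows in partitions for x, y in rows])
```

    Because the per-partition gradients are simply added, the result does not depend on how the rows are split, which is what makes this kind of update easy to distribute.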

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.