3 research outputs found

    MD-Manifold: A Medical Distance Based Manifold Learning Approach for Heart Failure Readmission Prediction

    Dimension reduction is considered a necessary technique in Electronic Healthcare Records (EHR) data processing. However, no existing work addresses both of the following: 1) generating low-dimensional representations for each patient visit; and 2) taking advantage of the well-organized medical concept structure as domain knowledge. Hence, we propose a new framework that generates low-dimensional representations of medical data records by combining a concept-structure-based distance with manifold learning. To demonstrate its efficacy, we generated low-dimensional representations of hospital visits by heart failure patients, which were then used for 30-day readmission prediction. The experiments showed that the proposed representations (AUC = 60.7%) have predictive power comparable to that of state-of-the-art methods, including one-hot encoding representations (AUC = 60.1%) and PCA representations (AUC = 58.3%), with much less training time (improved by 99%). The proposed framework can also be generalized to various healthcare-related prediction tasks, such as mortality prediction.

    Using the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity

    Background: Many clinical concepts are standardized under categorical and hierarchical taxonomies such as ICD-10 and ATC. These taxonomic clinical concepts provide insight into the semantic meaning of, and similarity among, clinical concepts and have been applied to patient similarity measures. However, the effect of differing set sizes of taxonomic clinical concepts on patient-level similarity has not been well studied. Methods: In this paper, the most widely used taxonomic clinical concept system, ICD-10, was studied as a representative taxonomy. The distance between ICD-10-coded diagnosis sets is an integrated estimation of the information content of each concept, the similarity between each pair of concepts, and the similarity between the sets of concepts. We proposed a novel set-level similarity method that calculates the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity. A real-world clinical dataset with ICD-10-coded diagnoses and hospital length of stay (HLOS) information was used to evaluate the performance of various algorithms and their combinations in predicting whether a patient needs long-term hospitalization. Four subpopulation prototypes, defined by age and HLOS and with different diagnosis set sizes, were used as the targets for similarity analysis. The F-score was used to evaluate the performance of different algorithms while controlling for other factors. We also evaluated the effect of prototype set size on prediction precision. Results: The results identified the strengths and weaknesses of different algorithms for computing information content, code-level similarity, and set-level similarity under different contexts, such as set size and concept set background. The minimum weighted bipartite matching approach, which has not been fully recognized previously, showed unique advantages in measuring concept-based patient similarity. Conclusions: This study provides a systematic benchmark evaluation of previous and novel algorithms used in taxonomic concept-based patient similarity, and it provides a basis for selecting appropriate methods under different clinical scenarios.
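
To make the set-level idea concrete, here is a minimal Python sketch of a minimum weighted bipartite matching distance between two sets of diagnosis codes. It is an illustration under stated assumptions, not the paper's method: `code_distance` is a hypothetical shared-prefix measure standing in for the information-content-based code similarity the abstract mentions, the `unmatched_cost` penalty for surplus codes is an assumed design choice, and the matching is found by brute force, which is only practical for small sets.

```python
from itertools import permutations


def code_distance(a: str, b: str) -> float:
    """Toy code-level distance (assumption): 1 minus the shared-prefix fraction."""
    a, b = a.replace(".", ""), b.replace(".", "")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return 1.0 - shared / max(len(a), len(b))


def set_distance(codes_a, codes_b, unmatched_cost=1.0):
    """Average cost of the cheapest one-to-one matching between two code sets.

    The smaller set is padded with placeholders so that every surplus code
    in the larger set pays `unmatched_cost` (an assumed penalty).
    """
    small, large = sorted((list(codes_a), list(codes_b)), key=len)
    n = len(large)
    if n == 0:
        return 0.0
    rows = small + [None] * (n - len(small))
    best = float("inf")
    # Brute-force search over all assignments; fine for a handful of codes.
    for perm in permutations(range(n)):
        total = sum(
            unmatched_cost if r is None else code_distance(r, large[j])
            for r, j in zip(rows, perm)
        )
        best = min(best, total)
    return best / n


if __name__ == "__main__":
    print(set_distance(["I50.1", "E11.9"], ["I50.9", "E11.9"]))
```

For realistic set sizes, a polynomial-time assignment solver (for example the Hungarian algorithm) would replace the permutation loop; the cost matrix and padding logic stay the same.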

    COHORT IDENTIFICATION FROM FREE-TEXT CLINICAL NOTES USING SNOMED CT’S SEMANTIC RELATIONS

    In this paper, a new cohort identification framework that exploits the semantic hierarchy of SNOMED CT is proposed to overcome the limitations of supervised machine learning-based approaches. Eligibility criteria descriptions and free-text clinical notes from the 2018 National NLP Clinical Challenge (n2c2) were mapped to relevant SNOMED CT concepts, and the semantic similarity between the eligibility criteria and each patient was measured. A patient was deemed eligible if their similarity score exceeded a threshold cut-off value, which was set at the point where the best F1 score was achieved. The performance of the proposed system was evaluated for three eligibility criteria. The framework's macro-average F1 score across the three eligibility criteria was higher than the previously reported results of the 2018 n2c2 (0.933 vs. 0.889). This study demonstrated that SNOMED CT alone can be leveraged for cohort identification tasks without referring to external textual sources for training. Doctor of Philosophy
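
The threshold step described above can be sketched as follows. This is a minimal Python illustration, not the thesis's implementation: the function names, the candidate thresholds (midpoints between observed scores), and the sample scores are all assumptions made for the example.

```python
def f1_score(labels, predictions):
    """F1 for binary labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1 when eligible means score > threshold."""
    unique = sorted(set(scores))
    # Candidates: one value below every score, then midpoints between neighbors.
    candidates = [unique[0] - 1.0] + [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_score(labels, [s > t for s in scores])
        if f1 > best_f1:  # ties keep the lowest threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Hypothetical similarity scores and eligibility labels for four patients.
    print(best_threshold([0.2, 0.4, 0.6, 0.8], [False, False, True, True]))
```

A threshold tuned this way is best checked on held-out patients, since choosing it on the evaluation data itself tends to overstate the F1 score.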