
    Computer-based identification of relationships between medical concepts and cluster analysis in clinical notes

    Clinical notes contain information about medical concepts or entities (such as diseases, treatments and drugs) that provide a comprehensive overall impression of the patient’s health. The automatic extraction of these entities is relevant for health experts and researchers because it helps them identify associations between the entities. However, automatically extracting information from clinical notes is challenging because of their narrative format. This research describes a process to automatically extract and aggregate medical entities from clinical notes, as well as a process to identify clusters of patients and disease-treatment relationships. The i2b2 2008 Obesity dataset was used; it consists of 1237 discharge summaries of overweight and diabetic patients. Therefore, this thesis focuses on obesity-related diseases. For the automatic extraction of medical entities, MetaMap and cTAKES were used, and the extraction capacity of both tools was compared. UMLS was also used to aggregate the extracted entities. Two approaches were applied for cluster analysis. First, a sparse K-means algorithm was applied to a patient-disease matrix with 14 comorbidities related to obesity. Second, to visualize and analyze other diseases present in the clinical notes, 86 diseases were used to identify clusters of patients with a network-based approach. Furthermore, bipartite graphs were used to explore disease-treatment relationships among some of the clusters obtained. The results of the experiments show cTAKES slightly outperforming MetaMap, although this could change under other configuration options in the respective tools, including an abbreviation list. Moreover, concept aggregation (with similar and different semantic types) was shown to be a good strategy for improving medical entity extraction.
Sparse K-means enabled the identification of three types of clusters (high, medium and low), based on the number of comorbidities and the percentage of patients suffering from them. These results show that diabetes, hypercholesterolemia, atherosclerotic cardiovascular diseases, congestive heart failure, obstructive sleep apnea, and depression were the most prevalent diseases. The network approach made it possible to visualize and analyze patient information. Three sub-graphs or clusters were identified in the network: obese patients with metabolic problems, obese patients with infection problems, and obese patients with mechanical problems. Bipartite graphs of disease-treatment relationships showed treatments for different types of diseases, which indicates that obese patients suffer from multiple diseases. This work shows that clinical notes are a rich source of information that can be used to explore, visualize, and analyze patients’ information by applying different approaches. More work is needed to explore the relationships between the different medical entities from clinical notes and from different disease datasets. Also, because some medical documents express events in time, this characteristic should be considered in future work to form a personalized portrait of clusters, diseases and patients.
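
As a rough illustration of the two data structures the abstract mentions, the sketch below builds a binary patient-disease matrix and a weighted set of disease-treatment edges (the edge list of a bipartite graph). It is not the thesis implementation: the records, the function names `patient_disease_matrix` and `disease_treatment_edges`, and the assumption that each extracted mention already pairs one disease with one treatment are all hypothetical, and no clustering step (sparse K-means or otherwise) is shown.

```python
from collections import defaultdict

# Hypothetical toy mentions: (patient_id, disease, treatment).
# These values are invented for illustration and do not come from the i2b2 data.
records = [
    ("p1", "diabetes", "insulin"),
    ("p1", "hypercholesterolemia", "statin"),
    ("p2", "diabetes", "metformin"),
    ("p2", "depression", "sertraline"),
    ("p3", "obstructive sleep apnea", "CPAP"),
    ("p3", "diabetes", "insulin"),
]


def patient_disease_matrix(records):
    """Return (patients, diseases, matrix) where matrix[i][j] is 1 if
    patient i has at least one mention of disease j, else 0."""
    patients = sorted({p for p, _, _ in records})
    diseases = sorted({d for _, d, _ in records})
    row = {p: i for i, p in enumerate(patients)}
    col = {d: j for j, d in enumerate(diseases)}
    matrix = [[0] * len(diseases) for _ in patients]
    for p, d, _ in records:
        matrix[row[p]][col[d]] = 1
    return patients, diseases, matrix


def disease_treatment_edges(records):
    """Return {(disease, treatment): number of distinct patients},
    i.e. the weighted edges of a disease-treatment bipartite graph."""
    supporters = defaultdict(set)
    for p, d, t in records:
        supporters[(d, t)].add(p)
    return {edge: len(ps) for edge, ps in supporters.items()}


patients, diseases, matrix = patient_disease_matrix(records)
# Column sums give the number of patients per disease (prevalence).
prevalence = {d: sum(r[j] for r in matrix) for j, d in enumerate(diseases)}
edges = disease_treatment_edges(records)
```

A matrix of this shape is the kind of input a clustering algorithm would consume, and the column sums correspond to the prevalence counts that the abstract reports qualitatively.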