5 research outputs found

    AliClu - Temporal sequence alignment for clustering longitudinal clinical data

    The authors acknowledge funding from the Portuguese Foundation for Science and Technology (Fundação para a Ciência e a Tecnologia - FCT) under contracts INESC-ID (UID/CEC/50021/2019) and IT (UID/EEA/50008/2019), and projects PREDICT (PTDC/CCI-CIF/29877/2017), PERSEIDS (PTDC/EMS-SIS/0642/2014) and NEUROCLINOMICS2 (PTDC/EEI-SII/1937/2014). The funders had no role in the design of the study, the collection, analysis and interpretation of data, or the writing of the manuscript. BACKGROUND: Patient stratification is a critical task in clinical decision making, since it allows physicians to choose treatments in a personalized way. Given the increasing availability of electronic medical records (EMRs) with longitudinal data, one crucial problem is how to efficiently cluster patients based on the temporal information from medical appointments. In this work, we propose applying the Temporal Needleman-Wunsch (TNW) algorithm to align discrete sequences together with the transition time information between symbols. These symbols may correspond to a patient's current therapy, their overall health status, or any other discrete state. The transition time information represents the duration of each of those states. The resulting TNW pairwise scores are then used to perform hierarchical clustering. To find the best number of clusters and assess their stability, a resampling technique is applied. RESULTS: We propose AliClu, a novel tool for clustering temporal clinical data based on the TNW algorithm, coupled with clustering validity assessment through bootstrapping. AliClu was applied to the analysis of rheumatoid arthritis EMRs obtained from the Portuguese database of rheumatologic patient visits (Reuma.pt). In particular, AliClu was used to analyse therapy switches, which were coded as letters corresponding to biologic drugs and included the duration of each therapy before a change occurred. The optimized clusters allow one to stratify patients based on their temporal therapy profiles and support the identification of common features within those groups. CONCLUSIONS: AliClu is a promising computational strategy for analysing longitudinal patient data, providing validated clusters and unravelling the patterns that exist in clinical outcomes. Patient stratification is performed in an automatic or semi-automatic way, allowing one to tune the alignment, clustering, and validation parameters. AliClu is freely available at https://github.com/sysbiomed/AliClu.

    Prediction Sequence Patterns of Tourist from the Tourism Website by Hybrid Deep Learning Techniques

    Tourism is an important industry that generates income and jobs and contributes considerably to a country's GDP. Before traveling, tourists usually plan an itinerary listing, in sequence, the places to visit and the activities to do. To help plan, they often gather information from blogs and message boards where previous visitors have posted about places and activities. The text of these travel posts can be used to infer itineraries, that is, sequences of places to visit and activities to experience. This research analyzes text postings using 21 deep learning techniques to learn sequential patterns of places and activities. The three main techniques are Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), used alone and in combination, including adaptations with batch normalization. The output is a set of sequential patterns for predicting the places tourists are likely to visit and the activities they plan to do. The results are evaluated using mean absolute error (MAE) and mean squared error (MSE) loss metrics. Moreover, the predicted sequences of places and activities are further assessed using the Needleman–Wunsch (NW) algorithm, a popular sequence alignment method for estimating how well two sequences match.
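As a rough illustration of how a predicted sequence could be compared with a reference sequence using Needleman–Wunsch, the sketch below computes the optimal global alignment score by dynamic programming. The scoring values (match = 1, mismatch = -1, gap = -1) and the example place names are assumptions chosen for illustration; they are not taken from the paper.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    The scoring parameters are illustrative defaults, not the paper's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per item.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


# Hypothetical predicted and actual itineraries.
predicted = ["temple", "market", "beach", "restaurant"]
actual = ["temple", "beach", "restaurant"]
print(needleman_wunsch(predicted, actual))
```

With these assumed scores, three matches and one gap give an alignment score of 2; a higher score indicates a closer match between the two sequences.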

    Unsupervised learning methods for identifying and evaluating disease clusters in electronic health records

    Introduction: Clustering algorithms are a class of algorithms that can discover groups of observations in complex data and are often used to identify subtypes of heterogeneous diseases in electronic health records (EHR). Evaluating clustering experiments for biological and clinical significance is a vital but challenging task due to the lack of consensus on best practices. As a result, the translation of findings from clustering experiments to clinical practice is limited. Aim: The aim of this thesis was to investigate and evaluate approaches that enable the evaluation of clustering experiments using EHR. Methods: We conducted a scoping review of clustering studies in EHR to identify common evaluation approaches. We systematically investigated the performance of the identified approaches using a cohort of Alzheimer's Disease (AD) patients as an exemplar, comparing four different clustering methods (K-means, Kernel K-means, Affinity Propagation and Latent Class Analysis). Using the same population, we developed and evaluated a method (MCHAMMER) that tests whether clusterable structures exist in EHR. To develop this method, we tested several cluster validation indices and methods of generating null data to see which are best at discovering clusters. To enable the robust benchmarking of evaluation approaches, we created a tool that generates synthetic EHR data containing known cluster labels across a range of clustering scenarios. Results: Across 67 EHR clustering studies, the most popular internal evaluation metric was comparing cluster results across multiple algorithms (30% of studies). We examined this approach by conducting a clustering experiment on a population of 10,065 AD patients with 21 demographic, symptom and comorbidity features. K-means found 5 clusters, Kernel K-means found 2 clusters, Affinity Propagation found 5 and Latent Class Analysis found 6. 
K-means was found to have the best clustering solution, with the highest silhouette score (0.19), and was more predictive of outcomes. The five clusters found were: typical AD (n=2026), non-typical AD (n=1640), a cardiovascular disease cluster (n=686), a cancer cluster (n=1710) and a cluster of mental health issues, smoking and early disease onset (n=1528), which has been found in previous research as well as in the results of other clustering methods. We created a synthetic data generation tool that allows for the generation of realistic EHR clusters that can vary in separation and number of noise variables to alter the difficulty of the clustering problem. We found that decreasing cluster separation increased clustering difficulty significantly, whereas noise variables increased clustering difficulty but not significantly. To develop the tool for assessing cluster existence, we tested different methods of null dataset generation and cluster validation indices. The best performing null dataset method was the min-max method, and the best performing indices were the Calinski-Harabasz index (accuracy of 94%), the Davies-Bouldin index (97%), the silhouette score (93%) and the BWC index (90%). We further found that when clusters were identified using the Calinski-Harabasz index, they were more likely to have significantly different outcomes between clusters. Lastly, we repeated the initial clustering experiment, comparing 10 different pre-processing methods. The three best performing methods were RBF kernel (2 clusters), MCA (4 clusters) and MCA and PCA (6 clusters). The MCA approach gave the best results, with the highest silhouette score (0.23) and four meaningful clusters: heart and circulatory (n=1379), early onset mental health (n=1761), male with memory loss (n=1823) and female with more problems (n=2244). Conclusion: We have developed and tested a series of methods and tools to enable the evaluation of EHR clustering experiments. 
We developed and proposed a novel cluster evaluation metric and provided a tool for benchmarking evaluation approaches in synthetic but realistic EHR
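For readers unfamiliar with the silhouette score reported above, the following is a minimal sketch of how the mean silhouette can be computed for a labelled set of points. It assumes Euclidean distance and uses a tiny made-up dataset; the thesis does not state which distance or implementation was used, so the function name, data and conventions here are illustrative only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_score(points, labels):
    """Mean silhouette over all points (illustrative, O(n^2))."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster: silhouette is 0 by convention
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Made-up example: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good, bad)
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster. On that scale, the reported scores of 0.19 and 0.23 correspond to weakly separated clusters.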

    Analysing the Impact of Changes in User Interface of e-Health Record Systems on Clinical Pathways using Process Mining

    The provision of care in a hospital includes a series of activities that are often recorded in electronic health record (EHR) systems. Analysing the data in these EHRs has the potential to support the understanding of care processes and the exploration of opportunities for process improvement. One of the emerging data analytics approaches for such analyses is process mining, and one critical challenge in working with EHR data is that processes might change over time. This thesis uses a process mining approach to detect process change over time and analyse the impact of those changes on the EHR data. The overall aim is to summarise the change in the data attributable to the process so that clinicians can better analyse the data. Three datasets were used in this study to understand the variability of EHR systems. The first is a publicly available EHR dataset that was used for developing the methods and supporting the reproducibility of the research. The second is a de-identified subset of the database of cancer patients from the Leeds Cancer Centre; it was used in experiments to improve on the results of a previous study using the same dataset. The third is the full Leeds Cancer Centre EHR database, accessed after more comprehensive ethics approval was obtained. On the third dataset, experiments were done to analyse the impact of a known system change on clinical pathways and to explore process change over time without a known system change. All three datasets were analysed using process mining. Process mining was shown to be useful for analysing clinical pathways and exploring process changes over time. It can be used to visualise the process before and after a known change. When the system change is unknown, process mining can be used to explore the process execution over time and identify the potential period in which the system was changed. 
This thesis explores some aspects of the complex interrelatedness of process and user interface (UI) of the EHR system