FIBS: A Generic Framework for Classifying Interval-based Temporal Sequences
We study the problem of classifying interval-based temporal sequences
(IBTSs). Since common classification algorithms cannot be directly applied to
IBTSs, the main challenge is to define a set of features that effectively
represents the data such that classifiers can be applied. Most prior work
utilizes frequent pattern mining to define a feature set based on discovered
patterns. However, frequent pattern mining is computationally expensive and
often discovers many irrelevant patterns. To address this shortcoming, we
propose the FIBS framework for classifying IBTSs. FIBS extracts features
relevant to classification from IBTSs based on relative frequency and temporal
relations. To avoid selecting irrelevant features, a filter-based selection
strategy is incorporated into FIBS. Our empirical evaluation on eight
real-world datasets demonstrates the effectiveness of our methods in practice.
The results provide evidence that FIBS effectively represents IBTSs for
classification algorithms, which contributes to similar or significantly better
accuracy compared to state-of-the-art competitors. They also suggest that the
feature selection strategy is beneficial to FIBS's performance.
Comment: In: Big Data Analytics and Knowledge Discovery. DaWaK 2020. Springer,
Cha
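As an illustration of the relative-frequency idea mentioned in the FIBS abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes each sequence is a list of `(label, start, end)` tuples and defines relative frequency as a label's share of all intervals in the sequence; the function name, data layout, and that definition are assumptions made for illustration, and features based on temporal relations are not covered.

```python
from collections import Counter


def relative_frequency_features(sequence, vocabulary):
    """Map one interval sequence to a fixed-length vector of label frequencies.

    sequence:   list of (label, start, end) tuples (hypothetical layout).
    vocabulary: ordered list of all event labels seen in the dataset.
    """
    counts = Counter(label for label, _start, _end in sequence)
    total = len(sequence)
    if total == 0:
        # An empty sequence has no intervals, so every frequency is zero.
        return [0.0] * len(vocabulary)
    # Share of the sequence's intervals that carry each label.
    return [counts[label] / total for label in vocabulary]


# Toy sequence with four intervals and a four-label vocabulary.
seq = [("A", 0, 5), ("B", 3, 8), ("A", 9, 12), ("C", 10, 11)]
features = relative_frequency_features(seq, ["A", "B", "C", "D"])
```

Each sequence becomes a vector of the same length, which is what lets a standard classifier consume it.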
Analysis and Visualization Methods for Data-Driven Longitudinal Patient Summary
Digitization of health records has opened avenues for intensive research in the field of health informatics. The power of machine learning, statistical analysis, and visual analytics can be used to make optimal use of this information. The proposed project develops an interactive visualization tool that summarizes a patient's medical history, highlighting the patient's important events based on knowledge of similar patients. Given a set of patients with common conditions, statistical analysis can be used to develop models that prioritize features based on associations between features and condition-specific outcome measures.
This manuscript describes the model developed to prioritize the events in a patient's medical history. The model is trained on a population of patients and their events. The correlation of each event with the outcome variable is calculated to identify the important events in a specific cohort. This correlation score can then be used to prioritize the events associated with an individual patient. This model is one of several that will be used to summarize an individual patient's medical data via interactive visualization methods.
Master of Science in Information Scienc
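The correlation-based prioritization described above can be sketched roughly as follows. This is a hedged illustration only: it assumes events are encoded as per-patient 0/1 indicators and that the score is a Pearson correlation with a binary outcome. The event names, data, and function names are hypothetical and are not taken from the manuscript.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no information about the outcome.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_events(cohort, outcome):
    """Rank events by the absolute correlation of their indicator with the outcome.

    cohort:  dict mapping event name -> list of 0/1 flags, one per patient.
    outcome: list of 0/1 outcome values, one per patient.
    """
    scores = {event: pearson(flags, outcome) for event, flags in cohort.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical cohort of six patients (event names are made up).
cohort = {
    "hba1c_test": [1, 1, 0, 0, 1, 0],
    "flu_shot": [1, 0, 1, 0, 1, 0],
    "er_visit": [1, 1, 1, 0, 0, 0],
}
outcome = [1, 1, 1, 0, 0, 0]
ranking = rank_events(cohort, outcome)
```

For an individual patient, the events present in their record could then be ordered by these cohort-level scores.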
Data Abstraction for Visualizing Large Time Series
Numeric time series are a class of data consisting of chronologically ordered observations represented by numeric values. Much of the data in domains such as finance, medicine, and science is represented in the form of time series. To cope with the increasing sizes of datasets, numerous approaches for abstracting large temporal data have been developed in the area of data mining. Many of them have proved useful for time series visualization. However, despite the existence of numerous surveys on time series mining and visualization, there is no comprehensive classification of the existing methods based on the needs of visualization designers. We propose a classification framework that defines essential criteria for selecting an abstraction method with an eye to subsequent visualization and support of users' analysis tasks. We show that approaches developed in the data mining field are capable of creating representations that are useful for visualizing time series data. We evaluate these methods in terms of the defined criteria and provide a summary table that can be used to select suitable abstraction methods depending on data properties, desired form of representation, behaviour features to be studied, required accuracy and level of detail, and the need for efficient search and querying. We also indicate directions for possible extension of the proposed classification framework.
Opening the black box: Personalizing type 2 diabetes patients based on their latent phenotype and temporal associated complication rules
© 2020 The Authors. It is widely considered that approximately 10% of the population suffers from type 2 diabetes. Unfortunately, the impact of this disease is underestimated. Patient mortality is often due to complications caused by the disease rather than the disease itself. Many techniques used in modeling diseases take the form of a "black box" whose internal workings and complexities are extremely difficult to understand, from both practitioners' and patients' perspectives. In this work, we address this issue and present an informative model/pattern, known as a "latent phenotype," with the aim of capturing the complexities of the associated complications over time. We further extend this idea by using a combination of temporal association rule mining and unsupervised learning to find explainable subgroups of patients with more personalized prediction. Our extensive findings show how uncovering the latent phenotype aids in distinguishing the disparities among subgroups of patients based on their complication patterns. We gain insight into how best to enhance the prediction performance and reduce bias in the models applied using uncertainty in the patients' data.
Mining Predictive Patterns and Extension to Multivariate Temporal Data
An important goal of knowledge discovery is the search for patterns in the data that can help explain its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy for humans to understand. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explaining a specific outcome variable. An example is the task of identifying groups of patients that respond better to a certain treatment than the rest of the patients.
We propose and present efficient methods for mining predictive patterns for both atemporal and temporal (time series) data. Our first method relies on frequent pattern mining to explore the search space. It applies a novel evaluation technique for extracting a small set of frequent patterns that are highly predictive and have low redundancy. We show the benefits of this method on several synthetic and public datasets.
Our temporal pattern mining method works on complex multivariate temporal data, such as electronic health records, for the event detection task. It first converts time series into time-interval sequences of temporal abstractions and then mines temporal patterns backwards in time, starting from patterns related to the most recent observations. We show the benefits of our temporal pattern mining method on two real-world clinical tasks
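The first step described above, converting a numeric time series into a time-interval sequence of temporal abstractions, might look roughly like the following Python sketch. It assumes a simple value abstraction with three states ("low", "normal", "high") defined by two thresholds, and it merges consecutive samples that share a state into one interval. The thresholds, state names, and function name are assumptions for illustration; the thesis's actual abstraction scheme and its backward pattern mining are not shown.

```python
def abstract_to_intervals(samples, low, high):
    """Turn (time, value) samples into (state, start, end) intervals.

    samples: list of (time, value) pairs sorted by time.
    low, high: hypothetical thresholds separating the three states.
    """
    def state_of(value):
        if value < low:
            return "low"
        if value > high:
            return "high"
        return "normal"

    intervals = []
    for time, value in samples:
        state = state_of(value)
        if intervals and intervals[-1][0] == state:
            # Same state as the previous sample: extend the open interval.
            prev_state, start, _end = intervals[-1]
            intervals[-1] = (prev_state, start, time)
        else:
            # State changed (or first sample): open a new interval.
            intervals.append((state, time, time))
    return intervals


# Toy glucose-like series; values and thresholds are illustrative only.
glucose = [(0, 95), (1, 110), (2, 190), (3, 210), (4, 100), (5, 60)]
intervals = abstract_to_intervals(glucose, low=70, high=180)
```

Running one such abstraction per variable would yield the multivariate interval sequences over which temporal patterns can be mined.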