    A framework for trend mining with application to medical data

    This thesis presents research conducted in the field of knowledge discovery. It describes an integrated trend-mining framework and SOMA, an application of that framework to diabetic retinopathy data. Trend mining is the process of identifying and analysing trends in the support of association/classification rules extracted from longitudinal datasets. The integrated framework covers all major processes, from data preparation to the extraction of knowledge. At the pre-processing stage, data are cleaned, transformed if necessary, and sorted into time-stamped datasets using logic rules. Next, the time-stamped datasets are passed to the main processing stage, in which a matrix-based association rule mining (ARM) algorithm is applied to identify frequent rules with acceptable confidence. Mathematical conditions are then applied to classify the sequences of support values into trends. Afterwards, interestingness criteria are applied to obtain interesting knowledge, and a visualization technique is proposed that maps how objects move from one time stamp to the next. A validation and verification framework (covering external and internal validation) is described that aims to ensure that the results at the intermediate stages of the framework are correct and that the framework as a whole can yield results that demonstrate causality. To evaluate the framework, SOMA was developed. The dataset is itself of interest, as it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The Royal Liverpool University Hospital has been a major centre for retinopathy research since 1991. Retinopathy is a generic term used to describe damage to the retina of the eye, which can, in the long term, lead to visual loss. 
Diabetic retinopathy data are used to evaluate the framework and to determine whether SOMA can extract knowledge that is already known to the medics. The results show that these datasets can be used to extract knowledge showing causality between diabetic retinopathy and patient characteristics such as age at diagnosis, type of diabetes, and duration of diabetes.
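
The abstract does not state the mathematical conditions used to classify support sequences, so the following is only a minimal sketch of the general idea: compute the support of one itemset in each time-stamped dataset, then label the resulting sequence. The function names, the tolerance `tol`, the four trend labels, and the toy records are all assumptions made for illustration and are not taken from the thesis.

```python
from typing import FrozenSet, List, Sequence


def support(itemset: FrozenSet[str], records: Sequence[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def classify_trend(supports: Sequence[float], tol: float = 0.01) -> str:
    """Label a sequence of support values.

    The labels and the tolerance are illustrative assumptions,
    not the conditions defined in the thesis.
    """
    diffs = [b - a for a, b in zip(supports, supports[1:])]
    if not diffs or all(abs(d) <= tol for d in diffs):
        return "constant"
    if all(d >= -tol for d in diffs):
        return "increasing"
    if all(d <= tol for d in diffs):
        return "decreasing"
    return "fluctuating"


# Hypothetical toy data: three time-stamped datasets of four records each.
A, B, C = "A", "B", "C"
time_stamped: List[List[FrozenSet[str]]] = [
    [frozenset({A, B}), frozenset({A}), frozenset({C}), frozenset({C})],
    [frozenset({A, B}), frozenset({A, B}), frozenset({A}), frozenset({C})],
    [frozenset({A, B}), frozenset({A, B}), frozenset({A, B, C}), frozenset({C})],
]

itemset = frozenset({A, B})
support_sequence = [support(itemset, records) for records in time_stamped]
trend = classify_trend(support_sequence)
print(support_sequence, trend)
```

In a full pipeline, a step like this would presumably run for every frequent rule before the interestingness criteria are applied.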

    Event and state detection in time series by genetic programming

    Event and state detection in time series has significant value in scientific areas and real-world applications. The aim of detecting event and state patterns in time series is to identify particular variations of interest to the user in one or more channels of time series streams. For example, dangerous driving behaviours such as sudden braking and harsh acceleration can be detected from continuous recordings from inertial sensors. However, the existing methods are highly dependent on domain knowledge such as the size of the time series pattern and a set of effective features. Furthermore, they are not directly suitable for multi-channel time series data. In this study, we establish a genetic programming (GP) based method which can perform classification on multi-channel time series data. It does not need the domain knowledge required by the existing methods. The investigation consists of four parts: the methodology, an evaluation on event detection tasks, an evaluation on state detection tasks and an analysis of the suitability for real-world applications. In the methodology, a GP based method is proposed for processing and analysing multi-channel time series streams. The function set includes basic mathematical operations. In addition, specific functions and terminals are introduced to retain historical information, capture temporal dependency across time points and handle dependency between channels. These functions and terminals help the GP based method to automatically find the pattern size and extract features. This study also investigates two different fitness functions: accuracy and area under the curve. The proposed method is investigated on a range of event detection tasks. The investigation starts from synthetic tasks such as detecting complete sine waves. The performance of the GP based method is compared to traditional classification methods. 
On the raw data, the GP based method achieves 100 percent accuracy, which outperforms all the non-GP methods. The performance of the non-GP methods is comparable to the GP based method only with suitable features. In addition, the GP based method is investigated on two complex real-world event detection tasks: dangerous driving behaviour detection and video shot detection. In the task of detecting three dangerous driving behaviours from 21-channel time series data, the GP based method performs consistently better than the non-GP classifiers even when features are provided. In the video shot detection task, the GP based method achieves comparable performance on 11200-channel time series to the non-GP classifiers on 28 features. The GP based method is more accurate than a commercial product. The GP based method has also been investigated on state detection tasks. This involves synthetic tasks such as detecting concurrent high values in four of five channels and a real-world activity recognition problem. The results also show that the GP based method consistently outperforms the non-GP methods even in the presence of manually constructed features. As part of the investigation, a mobile phone based activity recognition data set was collected as there was no existing publicly available data set. The suitability of the GP based method for solving real-world problems is further analysed. Our analysis shows that the GP based method can be successfully extended for multi-class classification. The analysis of the evolved programs demonstrates that they do capture time series patterns. On synthetic data sets, the injected regularities are revealed in understandable individuals. The best programs for three real-world problems are more difficult to explain but still provide some insight. The selection of relevant channels and data points by the programs is consistent with domain knowledge. 
In addition, the analysis shows that the proposed method still performs well for time series patterns of different sizes. The effective window sizes of the evolved GP programs are close to the pattern size. Finally, our study on execution performance of the evolved programs shows that these programs are fast in execution and are suitable for real-time applications. In summary, the GP based method is suitable for the kinds of real-world applications studied in this thesis. This thesis concludes that, with a suitable representation, genetic programming can be an effective method for event and state detection in multi-channel time series for a range of synthetic and real-world tasks. This method requires little domain knowledge, such as the pattern size and suitable features. It offers an effective classification method for tasks similar to those studied in this thesis.
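
To make the two fitness functions named in the abstract concrete, the sketch below scores a candidate program's per-time-point outputs by accuracy and by area under the ROC curve, using a toy version of the synthetic state detection task. Everything here is an assumption for illustration: the thesis's actual fitness definitions, thresholds, and data are not given in the abstract, "four of five channels" is read as "at least four", and `candidate_program` is a hand-written stand-in for an evolved GP individual.

```python
from typing import List, Sequence


def state_label(sample: Sequence[float], high: float = 0.5, min_channels: int = 4) -> int:
    """Synthetic state: 1 if at least `min_channels` channels exceed `high` (assumed reading)."""
    return int(sum(v > high for v in sample) >= min_channels)


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.0) -> float:
    """Fraction of time points where (score > threshold) agrees with the label."""
    hits = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def candidate_program(sample: Sequence[float]) -> float:
    """Hypothetical stand-in for an evolved program: positive output means 'state present'."""
    return sum(sample) - 3.0


# Hypothetical five-channel stream with four time points.
stream: List[List[float]] = [
    [0.9, 0.8, 0.7, 0.9, 0.1],
    [0.9, 0.1, 0.2, 0.8, 0.3],
    [0.6, 0.7, 0.8, 0.9, 0.95],
    [0.1, 0.2, 0.1, 0.3, 0.2],
]

labels = [state_label(s) for s in stream]
scores = [candidate_program(s) for s in stream]
print(labels, accuracy(scores, labels), auc(scores, labels))
```

Accuracy depends on the chosen decision threshold, whereas AUC depends only on how the scores rank positive against negative time points, which is one reason the two can favour different programs.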