66 research outputs found

    Ensembles of Randomized Time Series Shapelets Provide Improved Accuracy while Reducing Computational Costs

    Shapelets are discriminative time series subsequences that allow generation of interpretable classification models, which provide faster and generally better classification than the nearest neighbor approach. However, the shapelet discovery process requires the evaluation of all possible subsequences of all time series in the training set, making it extremely computation intensive. Consequently, shapelet discovery for large time series datasets quickly becomes intractable. A number of improvements have been proposed to reduce the training time. These techniques use approximation or discretization and often lead to reduced classification accuracy compared to the exact method. We propose the use of ensembles of shapelet-based classifiers obtained using random sampling of the shapelet candidates. Random sampling reduces the number of evaluated candidates and consequently the required computational cost, while the classification accuracy of the resulting models is not significantly different from that of the exact algorithm. The combination of randomized classifiers rectifies the inaccuracies of individual models because of the diversity of the solutions. Based on the experiments performed, it is shown that the proposed approach of using an ensemble of inexpensive classifiers provides better classification accuracy compared to the exact method at a significantly lower computational cost.
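
The idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`shapelet_distance`, `sample_candidates`, `majority_vote`), the squared Euclidean distance, and the plain majority vote are all assumptions chosen for illustration, and the sketch assumes every training series is at least `min_len` long.

```python
import random
from collections import Counter

def shapelet_distance(shapelet, series):
    """Smallest squared Euclidean distance between the shapelet and any
    window of the series that has the same length as the shapelet."""
    m = len(shapelet)
    best = float("inf")
    for start in range(len(series) - m + 1):
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, d)
    return best

def sample_candidates(train_series, n_candidates, min_len, max_len, rng):
    """Draw random subsequences rather than enumerating every subsequence.
    Assumes each series has at least min_len points (illustrative only)."""
    candidates = []
    for _ in range(n_candidates):
        series = rng.choice(train_series)
        length = rng.randint(min_len, min(max_len, len(series)))
        start = rng.randint(0, len(series) - length)
        candidates.append(series[start:start + length])
    return candidates

def majority_vote(predictions):
    """Combine the labels predicted by the ensemble members."""
    return Counter(predictions).most_common(1)[0][0]
```

Each ensemble member would be trained on its own random draw of candidates, and `majority_vote` would combine their predictions for a new series.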

    Binary Shapelet Transform for Multiclass Time Series Classification

    Shapelets have recently been proposed as a new primitive for time series classification. Shapelets are subseries of series that best split the data into its classes. In the original research, shapelets were found recursively within a decision tree through enumeration of the search space. Subsequent research indicated that using shapelets as the basis for transforming datasets leads to more accurate classifiers. Both these approaches evaluate how well a shapelet splits all the classes. However, a shapelet is often most useful in distinguishing members of the class of the series it was drawn from from all others. To assess this conjecture, we evaluate a one-vs-all encoding scheme. This technique simplifies the quality assessment calculations, speeds up execution by facilitating more frequent early abandon, and increases accuracy for multi-class problems. We also propose an alternative shapelet evaluation scheme which we demonstrate significantly speeds up the full search.
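
A one-vs-all quality measure of the kind described above might be sketched as follows. This is an illustrative assumption, not the paper's exact procedure: it uses information gain over a binary relabelling (target class versus everything else), and the names `entropy` and `one_vs_all_gain` are hypothetical. It assumes a non-empty list of distances.

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a group with pos and neg members."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def one_vs_all_gain(distances, labels, target_class, threshold):
    """Information gain of splitting series by shapelet distance, after
    relabelling every series as 'target class' or 'any other class'."""
    binary = [label == target_class for label in labels]
    left = [b for d, b in zip(distances, binary) if d <= threshold]
    right = [b for d, b in zip(distances, binary) if d > threshold]
    n = len(binary)

    def group_entropy(group):
        return entropy(sum(group), len(group) - sum(group))

    return (group_entropy(binary)
            - len(left) / n * group_entropy(left)
            - len(right) / n * group_entropy(right))
```

With only two effective classes, the entropy terms reduce to the binary case regardless of how many classes the original problem has, which is one way the quality calculation becomes simpler.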

    New Approaches for Data-mining and Classification of Mental Disorder in Brain Imaging Data

    Brain imaging data are incredibly complex and new information is being learned as approaches to mine these data are developed. In addition to studying the healthy brain, new approaches for using this information to provide information about complex mental illness such as schizophrenia are needed. Functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) are two well-known neuroimaging approaches that provide complementary information, both of which provide a huge amount of data that are not easily modelled. Currently, diagnosis of mental disorders is based on a patient's self-reported experiences and observed behavior over the longitudinal course of the illness. There is great interest in identifying biologically based markers of illness, rather than relying on symptoms, which are a very indirect manifestation of the illness. The hope is that biological markers will lead to earlier diagnosis and improved treatment as well as reduced costs. Understanding mental disorders is a challenging task due to the complexity of brain structure and function, overlapping features between disorders, small numbers of data sets for training, heterogeneity within disorders, and a very large amount of high dimensional data. This doctoral work proposes machine learning and data mining based algorithms to detect abnormal functional network connectivity patterns of patients with schizophrenia and distinguish them from healthy controls using 1) independent components obtained from task related fMRI data, 2) functional network correlations based on resting-state and a hierarchy of tasks, and 3) functional network correlations in both fMRI and MEG data. The abnormal activation patterns of the functional network correlation of patients are characterized by using a statistical analysis and then used as an input to classification algorithms. The framework presented in this doctoral study is able to achieve good characterization of schizophrenia and provides an initial step towards designing an objective biological marker-based diagnostic test for schizophrenia. The methods we develop can also help us to more fully leverage available imaging technology in order to better understand the mystery of the human brain, the most complex organ in the human body.

    A multi-granularity pattern-based sequence classification framework for educational data

    In many application domains, such as education, sequences of events occurring over time need to be studied in order to understand the generative process behind these sequences, and hence classify new examples. In this paper, we propose a novel multi-granularity sequence classification framework that generates features based on frequent patterns at multiple levels of time granularity. Feature selection techniques are applied to identify the most informative features that are then used to construct the classification model. We show the applicability and suitability of the proposed framework to the area of educational data mining by experimenting on an educational dataset collected from an asynchronous communication tool in which students interact to accomplish an underlying group project. The experimental results showed that our model can achieve competitive performance in detecting the students' roles in their corresponding projects, compared to a baseline similarity-based approach.

    Mining time-series data using discriminative subsequences

    Time-series data is abundant, and must be analysed to extract usable knowledge. Local-shape-based methods offer improved performance for many problems, and a comprehensible method of understanding both data and models. For time-series classification, we transform the data into a local-shape space using a shapelet transform. A shapelet is a time-series subsequence that is discriminative of the class of the original series. We use a heterogeneous ensemble classifier on the transformed data. The accuracy of our method is significantly better than the time-series classification benchmark (1-nearest-neighbour with dynamic time-warping distance), and significantly better than the previous best shapelet-based classifiers. We use two methods to increase interpretability: First, we cluster the shapelets using a novel, parameterless clustering method based on Minimum Description Length, reducing dimensionality and removing duplicate shapelets. Second, we transform the shapelet data into binary data reflecting the presence or absence of particular shapelets, a representation that is straightforward to interpret and understand. We supplement the ensemble classifier with partial classification. We generate rule sets on the binary-shapelet data, improving performance on certain classes, and revealing the relationship between the shapelets and the class label. To aid interpretability, we use a novel algorithm, BruteSuppression, that can substantially reduce the size of a rule set without negatively affecting performance, leading to a more compact, comprehensible model. Finally, we propose three novel algorithms for unsupervised mining of approximately repeated patterns in time-series data, testing their performance in terms of speed and accuracy on synthetic data, and on a real-world electricity-consumption device-disambiguation problem. We show that individual devices can be found automatically and in an unsupervised manner using a local-shape-based approach.
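
The presence/absence representation mentioned in this abstract could look roughly like the sketch below. It assumes the shapelet transform has already produced one distance per (series, shapelet) pair and that a per-shapelet distance threshold is available; the function name `binarize_shapelet_features` and the thresholding rule are hypothetical, since the abstract does not say how presence is decided.

```python
def binarize_shapelet_features(distance_rows, thresholds):
    """Turn shapelet-transform distances into presence/absence flags.

    distance_rows[i][j] is the distance from series i to shapelet j.
    A shapelet counts as present when that distance is at or below
    thresholds[j] (an assumed rule, for illustration only).
    """
    return [
        [distance <= limit for distance, limit in zip(row, thresholds)]
        for row in distance_rows
    ]

# Two series, two shapelets: each series contains exactly one shapelet.
distances = [[0.0, 1.0],
             [6.0, 0.0]]
flags = binarize_shapelet_features(distances, thresholds=[1.0, 0.5])
```

Rule learners can then operate on `flags` directly, with each condition reading as "shapelet j is present" or "shapelet j is absent".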