3 research outputs found

    A modified multi-class association rule for text mining

    Get PDF
    Classification and association rule mining are significant tasks in data mining. Integrating association rule discovery and classification in data mining brings us an approach known as the associative classification. One common shortcoming of existing Association Classifiers is the huge number of rules produced in order to obtain high classification accuracy. This study proposes s a Modified Multi-class Association Rule Mining (mMCAR) that consists of three procedures; rule discovery, rule pruning and group-based class assignment. The rule discovery and rule pruning procedures are designed to reduce the number of classification rules. On the other hand, the group-based class assignment procedure contributes in improving the classification accuracy. Experiments on the structured and unstructured text datasets obtained from the UCI and Reuters repositories are performed in order to evaluate the proposed Association Classifier. The proposed mMCAR classifier is benchmarked against the traditional classifiers and existing Association Classifiers. Experimental results indicate that the proposed Association Classifier, mMCAR, produced high accuracy with a smaller number of classification rules. For the structured dataset, the mMCAR produces an average of 84.24% accuracy as compared to MCAR that obtains 84.23%. Even though the classification accuracy difference is small, the proposed mMCAR uses only 50 rules for the classification while its benchmark method involves 60 rules. On the other hand, mMCAR is at par with MCAR when unstructured dataset is utilized. Both classifiers produce 89% accuracy but mMCAR uses less number of rules for the classification. This study contributes to the text mining domain as automatic classification of huge and widely distributed textual data could facilitate the text representation and retrieval processes

    Mining Predictive Patterns and Extension to Multivariate Temporal Data

    Get PDF
    An important goal of knowledge discovery is the search for patterns in the data that can help explaining its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy to understand by humans. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explaining a specific outcome variable. An example is the task of identifying groups of patients that respond better to a certain treatment than the rest of the patients. We propose and present efficient methods for mining predictive patterns for both atemporal and temporal (time series) data. Our first method relies on frequent pattern mining to explore the search space. It applies a novel evaluation technique for extracting a small set of frequent patterns that are highly predictive and have low redundancy. We show the benefits of this method on several synthetic and public datasets. Our temporal pattern mining method works on complex multivariate temporal data, such as electronic health records, for the event detection task. It first converts time series into time-interval sequences of temporal abstractions and then mines temporal patterns backwards in time, starting from patterns related to the most recent observations. We show the benefits of our temporal pattern mining method on two real-world clinical tasks

    Mining Emerging Patterns by Streaming Feature Selection

    No full text
    Building an accurate emerging pattern classifier with a highdimensional dataset is a challenging issue. The problem becomes even more difficult if the whole feature space is unavailable before learning starts. This paper presents a new technique on mining emerging patterns using streaming feature selection. We model high feature dimensions with streaming features, that is, features arrive and are processed one at a time. As features flow in one by one, we online evaluate each coming feature to determine whether it is useful for mining predictive emerging patterns (EPs) by exploiting the relationship between feature relevance and EP discriminability (the predictive ability of an EP). We employ this relationship to guide an online EP mining process. This new approach can mine EPs from a high-dimensional dataset, even when its entire feature set is unavailable before learning. The experiments on a broad range of datasets validate the effectiveness of the proposed approach against other wellestablished methods, in terms of predictive accuracy, pattern numbers and running time
    corecore