591 research outputs found

    Text Classification Using Association Rules, Dependency Pruning and Hyperonymization

    We present new methods for pruning and enhancing itemsets for text classification via association rule mining. Pruning methods are based on dependency syntax, and enhancing methods are based on replacing words by their hyperonyms of various orders. We discuss the impact of these methods, compared to pruning based on tfidf rank of words. Comment: 16 pages, 2 figures, presented at DMNLP 201

    A Fast Minimal Infrequent Itemset Mining Algorithm

    A novel fast algorithm for finding quasi identifiers in large datasets is presented. Performance measurements on a broad range of datasets demonstrate substantial reductions in run-time relative to the state of the art and the scalability of the algorithm to realistically-sized datasets up to several million records

    A log mining approach for process monitoring in SCADA

    SCADA (Supervisory Control and Data Acquisition) systems are used for controlling and monitoring industrial processes. We propose a methodology to systematically identify potential process-related threats in SCADA. Process-related threats take place when an attacker gains user access rights and performs actions, which look legitimate, but which are intended to disrupt the SCADA process. To detect such threats, we propose a semi-automated approach of log processing. We conduct experiments on a real-life water treatment facility. A preliminary case study suggests that our approach is effective in detecting anomalous events that might alter the regular process workflow

    Discovering itemset interactions

    Itemsets, which are treated as intermediate results in association mining, have attracted significant research due to the inherent complexity of their generation. However, there is currently little literature focusing upon the interactions between itemsets, the nature of which may potentially contain valuable information. This paper presents a novel tree-based approach to discovering itemset interactions, a task which cannot be undertaken by current association mining techniques

    Multi-threaded Implementation of Association Rule Mining with Visualization of the Pattern Tree

    Motor vehicle fatalities per 100,000 population in the United States were reported to be 10.69% in 2012, according to NHTSA (National Highway Traffic Safety Administration). The fatality rate increased by 0.27% in 2012 compared to the rate in 2011. According to the reports, many factors contribute to the drastic increase in the fatality rate, such as driving under the influence, texting while driving, and various weather phenomena. Decision makers need to analyze the factors contributing to the increase in the accident rate in order to take appropriate measures. Current methods of data analysis have to be reformed and optimized to support policies for controlling the high traffic accident rates. This research work extends the implementation of the data-mining algorithm Most Associated Sequential Pattern (MASP). MASP uses an association rule mining approach to mine interesting traffic accident data using a modified version of the FP-growth algorithm. Owing to the huge amount of available traffic accident data, the MASP algorithm needs to be further modified to make it more efficient with respect to both space and time. Therefore, we present a parallel implementation of the MASP algorithm. In addition, a pattern tree and the Apriori-TID algorithm have been implemented. The application is written in C# using the .NET Framework and the C# Task Parallel Library

    Analysis of frequent itemset generation based on trie data structure in Apriori algorithm

    Apriori is a data mining technique for association rules that aims to extract correlations between sets of items in a transaction database. The main problem with the Apriori algorithm is that it scans the database repeatedly to generate itemset candidates. This research examines the combination of pruning using the trie approach and a multi-thread implementation in three algorithms to obtain frequent itemsets. A trie is a data structure in the form of an ordered tree that stores a set of strings, where all descendants of a node share the same prefix. Using a full combination trie (as opposed to a frequent pattern (FP) tree, which uses links) allows arrays and a hash calculation to be used for addressing itemset combinations. In this research, the calculation that yields the address is called the Hash-node calculation, and it is used to update the support value. For these three alternatives, run time is analyzed based on the number of itemset combinations and transactions at a certain minimum support value. The experimental results show that an algorithm that exploits resource capabilities by applying multi-threading performs almost seven times better than an algorithm implemented in a single thread when calculating the hash-node. The fastest run time of the multi-thread approach is 43 minutes with 150-itemset combinations on 100,000 transaction data
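
    As a rough illustration of the idea of counting candidate itemsets with a trie, the following is a minimal single-threaded sketch. It is not the Hash-node calculation or the full combination trie of the paper: it assumes a plain dictionary-based trie, and the names `TrieNode`, `add_candidate`, `count_transaction`, `collect` and `apriori` are hypothetical.

```python
from itertools import combinations


class TrieNode:
    """One node per item; the path from the root spells a sorted itemset."""

    def __init__(self):
        self.children = {}         # item -> TrieNode
        self.count = 0             # support counter (used on candidate nodes)
        self.is_candidate = False  # True if the path to this node is a candidate


def add_candidate(root, itemset):
    """Insert a sorted candidate itemset into the trie."""
    node = root
    for item in itemset:
        node = node.children.setdefault(item, TrieNode())
    node.is_candidate = True


def count_transaction(node, transaction, start):
    """Increment every candidate contained in the sorted transaction."""
    for i in range(start, len(transaction)):
        child = node.children.get(transaction[i])
        if child is not None:
            if child.is_candidate:
                child.count += 1
            count_transaction(child, transaction, i + 1)


def collect(node, prefix, min_support, out):
    """Gather candidates whose support reaches min_support."""
    for item, child in node.children.items():
        itemset = prefix + (item,)
        if child.is_candidate and child.count >= min_support:
            out[itemset] = child.count
        collect(child, itemset, min_support, out)


def apriori(transactions, min_support):
    """Level-wise Apriori; one pass over the data per itemset size."""
    transactions = [sorted(set(t)) for t in transactions]
    candidates = [(i,) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    while candidates:
        root = TrieNode()
        for cand in candidates:
            add_candidate(root, cand)
        for t in transactions:
            count_transaction(root, t, 0)
        level = {}
        collect(root, (), min_support, level)
        frequent.update(level)
        # Join frequent k-itemsets sharing a (k-1)-prefix, then prune any
        # candidate that has an infrequent k-subset.
        candidates = []
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(s in level for s in combinations(cand, len(a))):
                    candidates.append(cand)
    return frequent
```

    Calling `apriori(transactions, min_support)` with a list of item lists returns a dictionary that maps each frequent itemset (as a sorted tuple) to its support count. A multi-threaded variant would have to split the transactions across workers and merge the counters, which this sketch does not attempt.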

    Discovering High Utility Itemsets using Hybrid Approach

    Mining high utility itemsets, especially from big transactional databases, is a time-consuming task. Multiple methods are available for mining high utility itemsets from large transactional datasets, and they have some consequential limitations. In terms of performance, these methods need to be scrutinized on low-memory systems for mining high utility itemsets from transactional datasets, as well as to address further measures. The proposed algorithm combines High Utility Pattern Mining and Incremental Frequent Pattern Mining. The two algorithms used are Apriori and the existing Parallel UP-Growth for mining high utility itemsets from transactional databases. The information about high utility itemsets is maintained in a data structure called the UP tree. These algorithms not only scan the incremental database but also collect the support counts of newly generated frequent itemsets. The approach executes quickly because it adds new itemsets to the tree and removes rare itemsets from the utility pattern tree structure, which reduces cost and time. Based on various experimental analyses and results, this hybrid approach with the existing Apriori and UP-Growth is proposed with the aim of improving performance
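
    To make the notion of a high utility itemset concrete, here is a brute-force sketch under the usual definition, in which the utility of an itemset in a transaction is the sum of quantity times unit profit over its items. It is not the hybrid Apriori and UP-Growth method of the paper and does not build a UP tree; the function names and the data layout (one dictionary of item quantities per transaction) are assumptions made for illustration.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, unit_profit):
    """Sum the utility of `itemset` over all transactions that contain it."""
    total = 0
    for t in transactions:  # t maps item -> purchased quantity
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Enumerate every itemset and keep those reaching min_utility.

    Exhaustive enumeration is exponential in the number of items, so it is
    only suitable for tiny examples.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result
```

    Unlike support, utility is not anti-monotone: a superset can have higher utility than its subsets, so the Apriori pruning rule cannot be applied directly to utility values. This is one reason dedicated structures such as the UP tree are used.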