
    A Parallel FP-Growth Mining Algorithm with Load Balancing Constraints for Traffic Crash Data

    Traffic safety is an important aspect of sustainable roadway development. Freeway traffic crashes typically cause serious casualties and property losses and pose a serious threat to public safety. Identifying the potential correlations among risk factors and revealing their coupling mechanisms are effective ways to explore and identify the causes of freeway crashes. However, existing association rule mining algorithms still have limitations in both efficiency and accuracy. With this in mind, and using freeway traffic crash data obtained from WDOT (Washington Department of Transportation), this research constructed a multi-dimensional, multilevel system for traffic crash analysis. With load balancing taken into account, the FP-Growth (Frequent Pattern Growth) algorithm was parallelized on the Hadoop platform to achieve efficient and accurate association rule mining over massive amounts of traffic crash data; then, based on the coupling mechanisms found among the crash precursors, the causes of freeway traffic crashes were identified. The results show that, on big data, the parallel FP-Growth algorithm with load balancing constraints runs faster than both the conventional FP-Growth algorithm and the parallel FP-Growth algorithm without these constraints. The improved algorithm makes full use of Hadoop cluster resources and is better suited to mining large traffic crash data sets, while retaining the original advantages of conventional association rule mining. In addition, the multi-dimensional interaction model for mining association rules proposed in this research can accurately and efficiently capture the mechanisms behind freeway traffic crashes with serious consequences, which are likely to have lower support.
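Parallel FP-Growth schemes on Hadoop commonly split the list of frequent items (the F-list) into groups and send each reducer the group-dependent transaction prefixes it needs, so that each reducer can build its own conditional FP-trees independently. Load balancing then amounts to choosing groups so that reducers receive similar amounts of work. The abstract does not give the paper's cost model, so the following is only a minimal sketch under a stated assumption: an item's support count serves as a proxy for the cost of mining its conditional FP-tree, and items are assigned greedily (largest cost first) to the currently least-loaded group. The function names `build_flist`, `balance_groups` and `group_dependent_shards` are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def build_flist(transactions: Iterable[Sequence[str]], min_support: int) -> List[Tuple[str, int]]:
    """Count item supports and return frequent items by descending support (the F-list)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    # Ties are broken by item name so the order is deterministic.
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return frequent


def balance_groups(flist: List[Tuple[str, int]], num_groups: int) -> Dict[str, int]:
    """Greedy largest-cost-first assignment of items to groups.

    Assumption (not from the paper): support count approximates the cost of
    mining an item's conditional FP-tree.
    """
    # Min-heap of (current_load, group_id): always fill the least-loaded group.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    for item, cost in flist:  # flist is already sorted by descending cost
        load, g = heapq.heappop(heap)
        assignment[item] = g
        heapq.heappush(heap, (load + cost, g))
    return assignment


def group_dependent_shards(transaction: Sequence[str], order: Dict[str, int],
                           assignment: Dict[str, int]) -> List[Tuple[int, List[str]]]:
    """Emit (group_id, prefix) pairs: for each group, the longest F-list-ordered
    prefix ending at an item of that group (what a mapper would send to a reducer)."""
    items = sorted((i for i in set(transaction) if i in order), key=order.__getitem__)
    emitted = set()
    out: List[Tuple[int, List[str]]] = []
    for j in range(len(items) - 1, -1, -1):
        g = assignment[items[j]]
        if g not in emitted:
            out.append((g, items[: j + 1]))
            emitted.add(g)
    return out


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "d"]]
    flist = build_flist(data, min_support=2)      # [('a', 4), ('b', 2), ('c', 2)]
    groups = balance_groups(flist, num_groups=2)  # {'a': 0, 'b': 1, 'c': 1}
    order = {item: rank for rank, (item, _) in enumerate(flist)}
    print(group_dependent_shards(["a", "b", "c"], order, groups))
    # [(1, ['a', 'b', 'c']), (0, ['a'])]
```

In this toy run both groups end up with an estimated load of 4; a better cost model (for example one based on an item's position in the F-list) would only change the `cost` values fed to the same greedy loop.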

    Survey On Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

    In the data mining and knowledge discovery domain, frequent pattern mining plays an important role, but it does not account for the different weights of items. Association rule mining aims to find correlations among data items. Frequent patterns are itemsets, substructures, or subsequences that appear in a data set frequently or repeatedly. In this paper we present a survey of frequent pattern mining and weighted itemset mining, drawing on the many articles that have proposed methods for frequent and weighted infrequent itemset mining. The survey focuses on existing algorithms for frequent and infrequent itemset mining, with the aim of opening a path for future research in the field of association rule mining.
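Weighted itemset mining replaces plain occurrence counts with a measure that reflects item importance. Definitions vary across the literature, so the sketch below adopts one simple formulation purely as an assumption: each transaction's weight is the mean weight of its items, and an itemset's weighted support is the share of total transaction weight carried by the transactions that contain it. An itemset is then treated as infrequent when its weighted support is at or below a hypothetical maximum threshold `max_wsup`. All function names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def transaction_weight(transaction: Set[str], weights: Dict[str, float]) -> float:
    """Mean item weight of a transaction (assumed weighting scheme); 0 if empty."""
    if not transaction:
        return 0.0
    return sum(weights[item] for item in transaction) / len(transaction)


def weighted_support(itemset: FrozenSet[str], transactions: List[Set[str]],
                     weights: Dict[str, float]) -> float:
    """Fraction of total transaction weight carried by transactions containing itemset."""
    total = sum(transaction_weight(t, weights) for t in transactions)
    covered = sum(transaction_weight(t, weights) for t in transactions if itemset <= t)
    return covered / total if total else 0.0


def infrequent_weighted(candidates: Iterable[FrozenSet[str]], transactions: List[Set[str]],
                        weights: Dict[str, float], max_wsup: float) -> List[FrozenSet[str]]:
    """Keep candidates whose weighted support does not exceed max_wsup (hypothetical threshold)."""
    return [c for c in candidates
            if weighted_support(c, transactions, weights) <= max_wsup]


if __name__ == "__main__":
    weights = {"a": 1.0, "b": 0.5, "c": 0.2}  # made-up item weights
    txs = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]
    print(weighted_support(frozenset({"a"}), txs, weights))       # approx. 0.870
    print(weighted_support(frozenset({"b", "c"}), txs, weights))  # approx. 0.130
    cands = [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"a", "b"})]
    print(infrequent_weighted(cands, txs, weights, max_wsup=0.2))
```

Note that `{b, c}` occurs in one of four transactions, like `{a, b}`, yet only `{b, c}` falls below the threshold because its items carry low weight; this is the kind of distinction unweighted support cannot make.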

    Analytics of IoT Streaming Data using Modified New Pattern Mining Algorithm

    In the era of information technology, almost everything we use in everyday life is represented as information. Transportation, parking, traffic, and pollution are a few examples of the hundreds of infrastructure systems we interact with every day. By combining information technologies with communication, it becomes easy to represent every detail of these fields, even the smallest, as data. Furthermore, the Internet of Things (IoT) plays an important role in connecting physical objects with electronics, software, and sensors. On this basis, smart cities have been modeled and implemented in thousands of places around the world; in these cities, all smart systems in fields such as transportation networks, pollution, traffic, and airlines are represented as numbers and character strings. This paper presents the problems that arise with this type of method and partially addresses them with a new modified algorithm.

    New Approaches to Frequent and Incremental Frequent Pattern Mining

    Data Mining (DM) is a process for extracting interesting patterns from large volumes of data. It is one of the crucial steps in Knowledge Discovery in Databases (KDD). It involves various data mining methods that mainly fall into predictive and descriptive models. Descriptive models look for patterns, rules, relationships and associations within data. One of the descriptive methods is association rule analysis, which represents the co-occurrence of items or events. Association rules are commonly used in market basket analysis. An association rule has the form X → Y and states that X and Y co-occur with a given level of support and confidence. Association rule mining is a common technique for discovering interesting frequent patterns in large datasets acquired in various application domains. With petabytes of data flowing into data stores practically every day, many researchers have looked for efficient methods for analyzing these large datasets. Many algorithms have been proposed for searching for frequent patterns. The search space explodes combinatorially as the size of the source data increases. Simply using more powerful computers, or even supercomputers, to handle ever-larger data sets is not sufficient. Hence, incremental algorithms have been developed and used to improve the efficiency of frequent pattern mining. One of the challenges of frequent itemset mining is the long running time of the algorithms. The two major contributors to long running times are the number of database scans and the number of candidates generated. Candidates consume memory, so the more candidates there are, the more memory is needed; when the candidates do not fit in memory, page swapping occurs, which further increases the running time.
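The rule notation X → Y can be made concrete with the usual definitions: the support of an itemset is the fraction of transactions that contain all of its items, and the confidence of X → Y is support(X ∪ Y) / support(X). The sketch below uses a small, made-up market-basket data set; `support` and `confidence` are illustrative helper names, not a specific library's API.

```python
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Confidence of the rule x -> y, i.e. support(x | y) / support(x)."""
    sup_x = support(x, transactions)
    return support(x | y, transactions) / sup_x if sup_x else 0.0


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    rule_x, rule_y = frozenset({"bread"}), frozenset({"milk"})
    print(support(rule_x | rule_y, baskets))    # 0.5
    print(confidence(rule_x, rule_y, baskets))  # 0.6666666666666666
```

Here bread and milk co-occur in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets containing bread also contain milk (confidence 2/3). Each `support` call scans every transaction, which is why the number of database scans and candidates dominates running time once the candidate space grows (there are 2^n − 1 non-empty itemsets over n items).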
In this dissertation we propose a new implementation of the Apriori algorithm, NCLAT (Near Candidate-less Apriori with Tidlists), which scans the database only once and creates candidates only for level one (1-itemsets), whose number equals the total number of unique items in the database. In addition, we show how the results depend on the choice of data structures (probabilistic or not), the dataset layout (horizontal or vertical), how counting is done, and whether the algorithms run sequentially or in parallel. We implement, explore and devise the incremental algorithm UWEP with both sequential and parallel computation. We have also fixed a minor bug in UWEP and created a more efficient version, UWEP2, which reduces the number of candidates created and the number of database scans. We ran all of our tests against three datasets with different features at different minimum support levels. We present test results for both the frequent and the incremental frequent itemset mining implementations and compare them with each other. While a lot of work has been done on frequent itemset mining over structured data, very little has been done on unstructured data. We have therefore created a new hybrid pattern search algorithm, Double-Hash, which performed better in all of our test scenarios than the known pattern search algorithms. Double-Hash can potentially be used for frequent itemset mining on unstructured data in the future. We also present our work and test results on this algorithm.
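A vertical (tidlist) layout is what makes a single database scan possible: once every item's set of transaction IDs is built, the support count of any itemset is the size of the intersection of its items' tidlists. The sketch below is a generic level-wise (Apriori-style) miner over tidlists, similar in spirit to Eclat. It is not NCLAT: the abstract does not describe NCLAT's near candidate-less strategy, and this sketch still generates candidates at every level. `build_tidlists` and `mine_frequent` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set


def build_tidlists(transactions: List[Sequence[str]]) -> Dict[str, Set[int]]:
    """One pass over the data: map each item to the IDs of transactions containing it."""
    tidlists: Dict[str, Set[int]] = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists[item].add(tid)
    return dict(tidlists)


def mine_frequent(transactions: List[Sequence[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Level-wise frequent itemset mining that counts support by tidlist intersection."""
    tidlists = build_tidlists(transactions)  # the only scan of the database
    level = {frozenset([item]): tids for item, tids in tidlists.items()
             if len(tids) >= min_count}
    result: Dict[FrozenSet[str], int] = {}
    while level:
        result.update({itemset: len(tids) for itemset, tids in level.items()})
        next_level: Dict[FrozenSet[str], Set[int]] = {}
        for a, b in combinations(level, 2):
            candidate = a | b
            # Only join pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must already be frequent.
            if any(candidate - {item} not in level for item in candidate):
                continue
            tids = level[a] & level[b]  # support count without rescanning the data
            if len(tids) >= min_count:
                next_level[candidate] = tids
        level = next_level
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    frequent = mine_frequent(data, min_count=3)
    for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With `min_count=3`, every single item and every pair is frequent, while `{a, b, c}` is dropped because its tidlist intersection is `{0, 4}`. The trade-off is memory: tidlists for all frequent itemsets of the current level are held at once, which is one reason the choice of data structure (including probabilistic ones) matters.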