
    Mining Uncertain Sequential Patterns in Iterative MapReduce

    This paper proposes a sequential pattern mining (SPM) algorithm for large-scale uncertain databases. Uncertain sequence databases are widely used to model inaccurate or imprecise timestamped data in many real applications, where traditional SPM algorithms are inapplicable because of data uncertainty and limited scalability. In this paper, we develop an efficient approach to managing data uncertainty in SPM and design an iterative MapReduce framework to execute the uncertain SPM algorithm in parallel. We conduct extensive experiments on both synthetic and real uncertain datasets, and the experimental results show that our algorithm is efficient and scalable.
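    The abstract does not state which uncertainty model or job layout the paper uses, so the following is only a minimal sketch under stated assumptions: each sequence carries a single existential probability, a pattern's frequency is its expected support (the sum of the probabilities of the sequences that contain it), and each level of a level-wise search is one map/reduce round, simulated in-process. The names `map_phase`, `reduce_phase`, and `mine` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pattern = Tuple[str, ...]
# Assumed record shape: (ordered items of one sequence, existential probability).
Record = Tuple[List[str], float]


def is_subsequence(pattern: Pattern, seq: List[str]) -> bool:
    """True if the pattern's items occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)  # `in` advances the iterator


def map_phase(db: Iterable[Record], candidates: List[Pattern]) -> Iterator[Tuple[Pattern, float]]:
    """Mapper: emit (candidate, probability) for every sequence that contains it."""
    for seq, prob in db:
        for pat in candidates:
            if is_subsequence(pat, seq):
                yield pat, prob


def reduce_phase(pairs: Iterable[Tuple[Pattern, float]]) -> Dict[Pattern, float]:
    """Reducer: expected support = sum of probabilities of the supporting sequences."""
    totals: Dict[Pattern, float] = defaultdict(float)
    for pat, prob in pairs:
        totals[pat] += prob
    return dict(totals)


def mine(db: List[Record], min_esup: float) -> Dict[Pattern, float]:
    """Iterative driver: one simulated map/reduce round per pattern length."""
    items = sorted({item for seq, _ in db for item in seq})
    candidates: List[Pattern] = [(item,) for item in items]
    result: Dict[Pattern, float] = {}
    while candidates:
        esup = reduce_phase(map_phase(db, candidates))
        frequent = {p: s for p, s in esup.items() if s >= min_esup}
        result.update(frequent)
        # Under this model expected support is anti-monotone, so only
        # frequent patterns are extended in the next round.
        candidates = [p + (item,) for p in frequent for item in items]
    return result
```

    For example, `mine([(["a", "b", "c"], 0.9), (["a", "c"], 0.6), (["b", "c"], 0.5)], 1.0)` keeps `("a", "c")` and `("b", "c")` but drops `("a", "b")`, whose only supporting sequence has probability 0.9. In a real cluster deployment, the mapper would run per data partition and the reducer per candidate key, with the driver loop launching one job per round.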

    FP-Growth Tree Based Algorithms Analysis: CP-Tree and K Map

    We propose a novel frequent-pattern tree (FP-tree) structure. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported frequent-pattern mining methods. The FP-tree method is an efficient association-mining algorithm for finding frequent patterns, whether those patterns are long or short. By using a compact tree structure and a partitioning-based, divide-and-conquer search method, it substantially reduces search costs. Other analyses address the remaining costs by using multiple CPUs or by reducing memory consumption. Such an approach can noticeably decrease the cost of exchanging and combining control information and greatly reduces algorithm complexity, solving the problem efficiently. However, even when a multi-CPU technique is adopted, it mainly raises the hardware requirements, and the performance improvement remains limited. Is there another way to reduce these costs in FP-tree construction?
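    To make the "compact tree plus divide-and-conquer" idea concrete, here is a minimal textbook-style FP-growth sketch. It is not the CP-Tree or K-Map variant named in the title; the function names and the `(items, count)` input shape are illustrative assumptions. Transactions are inserted with their frequent items sorted by descending support, so shared prefixes collapse into one path, and each item's conditional pattern base (the prefix paths above its nodes) is then mined recursively.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

# (items, count) pairs: raw transactions use count 1; conditional pattern bases reuse the same shape.
Weighted = List[Tuple[List[str], int]]


class Node:
    """FP-tree node: an item, its count, a parent link, and children keyed by item."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "Node"] = {}


def build_tree(db: Weighted, min_count: int):
    """Build an FP-tree; return its header table (item -> nodes) and frequent item supports."""
    support: Dict[str, int] = defaultdict(int)
    for items, cnt in db:
        for item in set(items):
            support[item] += cnt
    frequent = {i: c for i, c in support.items() if c >= min_count}
    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)
    for items, cnt in db:
        # Descending-support order lets transactions share common prefixes.
        ordered = sorted((i for i in set(items) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += cnt
            node = child
    return header, frequent


def fp_growth(db: Weighted, min_count: int, suffix: Tuple[str, ...] = ()) -> Dict[FrozenSet[str], int]:
    """Divide and conquer: mine each item's conditional pattern base recursively."""
    header, frequent = build_tree(db, min_count)
    patterns: Dict[FrozenSet[str], int] = {}
    for item in frequent:
        itemset = (item,) + suffix
        patterns[frozenset(itemset)] = frequent[item]
        # Conditional pattern base: the prefix path above every node holding `item`.
        cond: Weighted = []
        for node in header[item]:
            path: List[str] = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond.append((path, node.count))
        patterns.update(fp_growth(cond, min_count, itemset))
    return patterns
```

    Calling `fp_growth([(t, 1) for t in transactions], min_count)` returns every frequent itemset with its support count. Because each conditional base only contains items that precede `item` in the tree order, every itemset is generated exactly once without candidate generation, which is the source of FP-growth's advantage over Apriori.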

    Item-centric mining of frequent patterns from big uncertain data


    Mining High Utility Sequential Patterns from Uncertain Web Access Sequences using the PL-WAP

    In general, web access patterns are retrieved from web access sequence databases using sequential pattern mining algorithms such as GSP, WAP, and the PLWAP tree. However, these algorithms do not consider sequential data with quantity (internal utility) information (e.g., the amount of time a user spends on a web page) or quality (external utility) information (e.g., the rating of a web page on a website). They also do not handle uncertain sequential items (e.g., purchased products) with existential probabilities in the range (0, 1). Factoring in the utility and uncertainty of each sequence item provides more product information, which can help in mining profitable patterns from a company’s website. For example, a customer may purchase bottles of ink more frequently than a printer, yet the purchase of a single printer can yield more profit for the business owner than the purchase of multiple bottles of ink. Most existing uncertain sequential pattern algorithms, such as U-Apriori, UF-Growth, and U-PLWAP, do not include utility measures. In U-PLWAP, web sequences are derived from web log data without the time spent by the user, and web pages are not associated with any rating. When these two utilities are considered, items with a lower existential probability can sometimes be more profitable to the website owner. Among traditional utility-based algorithms, the only one that addresses both uncertainty and high utility is PHUI-UP, which treats probability and utility as separate entities with two different thresholds, so the retrieved patterns do not depend on both jointly; it also does not mine uncertain web access sequence databases. This thesis proposes the HUU-PLWAP miner, an algorithm for mining uncertain sequential patterns with internal and external utility information using the PLWAP tree approach, which reduces the multiple database scans required by level-wise approaches. HUU-PLWAP uses uncertain internal utility values (derived from a sequence uncertainty model) and constant, predefined external utility values to retrieve high-utility sequential patterns from uncertain web access sequence databases, building on the U-PLWAP methodology. Experiments show that HUU-PLWAP is at least 95% faster than U-PLWAP and 75% faster than the PHUI-UP algorithm.
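    The abstract does not give HUU-PLWAP's utility formulas, so the following is only a toy scoring scheme for illustration, not the thesis's method or its PLWAP-tree search. It assumes that each page visit is a hypothetical `(page, seconds_spent, probability)` triple, that the "uncertain internal utility" is time multiplied by probability, that the page rating is the external utility, and that a pattern's utility in a sequence is taken from its leftmost occurrence. Those names and rules are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event: (page, seconds_spent, existential_probability)
Event = Tuple[str, float, float]


def leftmost_match(pattern: Tuple[str, ...], seq: List[Event]) -> Optional[List[int]]:
    """Indices of the leftmost occurrence of `pattern` as a subsequence of `seq`, else None."""
    positions: List[int] = []
    pos = 0
    for page in pattern:
        while pos < len(seq) and seq[pos][0] != page:
            pos += 1
        if pos == len(seq):
            return None
        positions.append(pos)
        pos += 1
    return positions


def pattern_utility(pattern: Tuple[str, ...], db: List[List[Event]], rating: Dict[str, float]) -> float:
    """Sum over containing sequences of (seconds x probability) x rating for each matched page."""
    total = 0.0
    for seq in db:
        positions = leftmost_match(pattern, seq)
        if positions is None:
            continue
        for i in positions:
            page, seconds, prob = seq[i]
            internal = seconds * prob  # assumed "uncertain internal utility"
            total += internal * rating[page]  # rating plays the role of external utility
    return total


def high_utility(patterns: List[Tuple[str, ...]], db: List[List[Event]],
                 rating: Dict[str, float], min_util: float) -> Dict[Tuple[str, ...], float]:
    """Keep only the candidate patterns whose utility reaches the threshold."""
    scores = {p: pattern_utility(p, db, rating) for p in patterns}
    return {p: u for p, u in scores.items() if u >= min_util}
```

    Mirroring the printer-and-ink example above, a database in which `("printer", "cart")` occurs once with a low printer probability (0.4) but a high printer rating, while `("ink", "cart")` occurs twice with high probabilities, can still rank the printer pattern higher. A single threshold on this combined score is what distinguishes the idea from keeping separate probability and utility thresholds. Published high-utility miners usually take the maximum utility over all occurrences rather than the leftmost one; the leftmost match is used here only to keep the sketch short.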