
    Mining frequent biological sequences based on bitmap without candidate sequence generation

    Biological sequences carry a great deal of important genetic information about organisms. Furthermore, there is an inheritance law related to protein function and structure that is useful for applications such as disease prediction. Frequent sequence mining is a core technique for association rule discovery, but existing algorithms suffer from low efficiency or high error rates because biological sequences have more distinctive characteristics than general sequences. In this paper, an algorithm for mining Frequent Biological Sequences based on Bitmap, FBSB, is proposed. FBSB uses bitmaps as a simple data structure and transforms each row into a quicksort list (QS-list) for sequence growth. To meet the continuity and accuracy requirements of biological sequence mining, the sequences tested during the mining process of FBSB are real ones instead of generated candidates, and all the frequent sequences can be mined without any errors. Compared with other algorithms, the experimental results show that FBSB achieves better performance in both run time and scalability.
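
The idea of growing contiguous patterns over position bitmaps can be illustrated with a minimal sketch. This is not FBSB itself: the QS-list structure is not reproduced, and the function name `mine_contiguous`, the absolute `min_support` threshold, and the use of Python integers as bitmaps are all assumptions made for illustration. Each symbol gets one bitmap per sequence marking where it occurs, and a pattern is extended by shifting its end-position bitmap by one and AND-ing it with the next symbol's bitmap.

```python
def mine_contiguous(sequences, min_support):
    """Return {pattern: number of sequences containing it contiguously}."""
    # pos[s][ch] is an int whose bit i is set when sequences[s][i] == ch.
    pos = []
    for seq in sequences:
        bitmaps = {}
        for i, ch in enumerate(seq):
            bitmaps[ch] = bitmaps.get(ch, 0) | (1 << i)
        pos.append(bitmaps)

    alphabet = sorted({ch for seq in sequences for ch in seq})
    result = {}

    def grow(pattern, ends):
        # ends[s] marks the positions in sequence s where `pattern` ends.
        support = sum(1 for e in ends if e)
        if support < min_support:
            return
        result[pattern] = support
        for ch in alphabet:
            # Shift the end positions right by one symbol, then keep only
            # those positions where `ch` really occurs in the data.
            extended = [(e << 1) & pos[s].get(ch, 0) for s, e in enumerate(ends)]
            grow(pattern + ch, extended)

    for ch in alphabet:
        grow(ch, [bitmaps.get(ch, 0) for bitmaps in pos])
    return result


if __name__ == "__main__":
    print(mine_contiguous(["ACGT", "ACGA", "TACG"], min_support=3))
```

Because an extension survives only if its bitmap is non-empty in enough sequences, every reported pattern is a substring that actually occurs in the data.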

    HYBRID: an efficient unifying process to mine frequent itemsets

    Current advances in technology inexorably lead to a flood of data. More data is generated from banking, telecom, scientific experiments, etc. Data mining is the process of extracting useful information from this flood of data, which helps in making profitable future decisions in these fields. Frequent itemset mining is one of the main research areas and an important step in finding association rules. The time and space requirements for generating frequent itemsets are of the utmost importance. Algorithms that mine frequent itemsets effectively help in finding association rules and also help in many other data mining tasks. In this paper, an efficient hybrid algorithm was designed using a unifying process of the algorithms Improved Apriori and FP-Growth. Results indicate that the proposed hybrid algorithm, albeit more complex, consumes less memory and executes faster.

    Enhancing FP-Growth Performance Using Multi-threading based on Comparative Study

    The time required for generating frequent patterns plays an important role in mining association rules, especially when there are a large number of patterns and/or long patterns. Association rule mining has been a major research challenge within the field of data mining for over a decade. Although tremendous progress has been made, algorithms still need improvement since databases are growing larger and larger. In this research we present a performance comparison between two frequent pattern extraction algorithms implemented in Java: Recursive Elimination (RElim) and FP-Growth. These algorithms are used to find frequent itemsets in a transaction database. We found that FP-Growth outperformed RElim in terms of execution time. In this context, multithreading is used to improve the time efficiency of the FP-Growth algorithm. The results showed that multithreaded FP-Growth is more efficient than single-threaded FP-Growth.

    Data Stream Mining: A Review on Windowing Approach

    In the data stream model, data arrive at high speed, so the algorithms used for mining data streams must process them under very strict constraints of space and time. This raises new issues that need to be considered when developing association rule mining algorithms for data streams. It is therefore important to study the existing stream mining algorithms in order to identify the challenges and the research scope for new researchers. In this paper we discuss different types of windowing techniques and the important algorithms available for this mining process.
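
As a minimal sketch of one windowing technique, the example below keeps per-item counts over a fixed-size sliding window of the most recent transactions. The class name `SlidingWindowCounter` and its methods are hypothetical and are not taken from any of the surveyed algorithms. It shows how expired transactions are subtracted so that memory stays bounded by the window size rather than by the length of the stream.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Track item counts over the last `window_size` transactions of a stream."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # transactions currently inside the window
        self.counts = Counter()  # item -> number of window transactions containing it

    def add(self, transaction):
        items = frozenset(transaction)
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.window_size:
            # The oldest transaction slides out: undo its contribution.
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for item in expired:
                if self.counts[item] == 0:
                    del self.counts[item]

    def frequent(self, min_support):
        """Items appearing in at least `min_support` window transactions."""
        return {item for item, n in self.counts.items() if n >= min_support}


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_size=3)
    for t in [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}]:
        counter.add(t)
    print(counter.frequent(min_support=2))
```

Each arriving transaction costs time proportional to its own size plus the size of the one it evicts, which is the kind of per-element bound that stream processing requires.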

    MBA: Market Basket Analysis Using Frequent Pattern Mining Techniques

    Market Basket Analysis (MBA) is a data mining technique that uses frequent pattern mining algorithms to discover patterns of co-occurrence among items that are frequently purchased together. It is commonly used in retail and e-commerce businesses to generate association rules that describe the relationships between different items, and to make recommendations to customers based on their previous purchases. MBA is a powerful tool for identifying patterns of co-occurrence and generating insights that can improve sales and marketing strategies. Although numerous works have addressed the computational cost of discovering frequent itemsets, the problem still needs further exploration and development. In this paper, we introduce an efficient bitwise-based data structure technique for mining frequent patterns in large-scale databases. The algorithm scans the original database once, using bitwise-based data representations as well as a vertical database layout, and is compared with the well-known Apriori and FP-Growth algorithms. The bitwise-based technique avoids multiple passes over the original database and hence minimizes the execution time. Extensive experiments have been carried out to validate our technique, which outperforms Apriori, Eclat, FP-Growth, and H-mine in terms of execution time for Market Basket Analysis.
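
A vertical, bitwise layout of the kind described above can be sketched as follows. This is an illustrative Eclat-style example and not the authors' algorithm: the function names `build_bitmaps` and `frequent_itemsets`, the absolute `min_support` threshold, and the use of Python integers as bit vectors are assumptions. The database is read once to build one bitmap per item, after which the support of any itemset is the population count of the AND of its items' bitmaps.

```python
from itertools import combinations


def popcount(bits):
    """Number of set bits, i.e. number of matching transactions."""
    return bin(bits).count("1")


def build_bitmaps(transactions):
    """Single pass: bit `tid` of bitmaps[item] is set when transaction tid has item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def frequent_itemsets(transactions, min_support):
    """Return {sorted item tuple: support} for all frequent itemsets."""
    bitmaps = build_bitmaps(transactions)
    level = {(item,): bits for item, bits in bitmaps.items()
             if popcount(bits) >= min_support}
    result = {}
    while level:
        for itemset, bits in level.items():
            result[itemset] = popcount(bits)
        next_level = {}
        # Join itemsets that share all but their last item; the support of
        # the union comes from AND-ing bitmaps, with no database rescan.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                bits = level[a] & level[b]
                if popcount(bits) >= min_support:
                    next_level[a + (b[-1],)] = bits
        level = next_level
    return result


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    print(frequent_itemsets(baskets, min_support=2))
```

The only pass over the transactions happens in `build_bitmaps`; every later support count is a bitwise AND followed by a bit count.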