    Review Paper - High Utility Item sets Mining on Incremental Transactions using UP-Growth and UP-Growth+ Algorithm

    One important research area in data mining is high utility pattern mining. Discovering itemsets with high utility, such as profit, from a database is known as high utility itemset mining. A number of existing algorithms address this problem, but some of them generate a large number of candidate itemsets, which degrades mining performance in terms of execution time and space. In this paper we focus on the UP-Growth and UP-Growth+ algorithms, which overcome this limitation. These algorithms use a tree-based data structure, the UP-Tree, to generate candidate itemsets with two scans of the database. We extend the functionality of these algorithms to incremental databases.

    MBiS: an efficient method for mining frequent weighted utility itemsets from quantitative databases

    In recent years, several methods for mining quantitative databases have been proposed. However, their processing time is long, which reduces the productivity of intelligent systems that use quantitative databases. This study proposes the multi-bit segment (MBiS) structure to store and process tidsets in order to increase the efficiency of mining frequent weighted utility itemsets (FWUIs) from quantitative databases. With this structure, computing the intersection of the tidsets of two itemsets becomes more convenient. Based on this structure, the authors define the MBiS-Tree structure and propose an algorithm for mining FWUIs from quantitative databases. Experimental results on a number of databases show that the proposed method outperforms existing methods.
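
    To illustrate why a bit-based tidset representation makes intersection cheap, here is a minimal sketch. It is not the MBiS structure from the abstract above, whose segment layout is not described here; it assumes each tidset is packed into a single Python integer used as a bit vector, and the helper names and sample transaction ids are hypothetical.

```python
def tidset_to_bits(tids):
    """Pack a set of transaction ids into one integer used as a bit vector."""
    bits = 0
    for tid in tids:
        bits |= 1 << tid  # set the bit at position `tid`
    return bits


def bits_to_tidset(bits):
    """Unpack a bit vector back into a sorted list of transaction ids."""
    tids = []
    tid = 0
    while bits:
        if bits & 1:
            tids.append(tid)
        bits >>= 1
        tid += 1
    return tids


# Hypothetical tidsets: transactions containing itemset A and itemset B.
tidset_a = tidset_to_bits({0, 2, 3, 5})
tidset_b = tidset_to_bits({2, 3, 4})

# The tidset of the union itemset A ∪ B is the intersection of the two
# tidsets, which is a single bitwise AND on the packed representation.
tidset_ab = tidset_a & tidset_b

# The support count is the number of set bits in the result.
support_count = bin(tidset_ab).count("1")
```

    In this sketch the intersection costs one bitwise AND regardless of how the ids are distributed, which is the general appeal of bit-vector tidsets.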


    In business, most companies focus on growing their profits. Besides considering the profit from each product, they also consider the relationships among products in order to support effective decision making, gain more profit, and attract customers, e.g. through shelf arrangement, product displays, or product marketing. Some algorithms for mining high utility association rules have been proposed; however, they consume much memory and require long processing times. This paper proposes the LHAR (Lattice-based for mining High utility Association Rules) algorithm, which mines high utility association rules based on a lattice of high utility itemsets. The LHAR algorithm generates high utility association rules while building the lattice of high utility itemsets, and thus it needs less memory and runtime.

    A Novel Approach to Extract High Utility Itemsets from Distributed Databases

    Traditional approaches in data mining focus on support and confidence measures, which are purely statistical. These measures are based on the frequency counts of items and allow us to derive frequent itemsets. However, frequency alone does not represent the interestingness of the items. Several studies have therefore sought to enhance data mining tasks by taking the value of the product into account. This resulted in utility mining, an emerging field of research in data mining. In recent years, various data mining approaches have been implemented to find high utility itemsets. The main objective of utility mining is to identify the itemsets with the highest utilities, considering the subjectively defined utility values set by the user. Existing utility mining methods focus on centralized systems, where the data and the associated processing are confined to a single location. As a further step, we implement the utility mining concept in a distributed environment. In this approach we mine high utility itemsets using a Fast Utility Mining (FUM) algorithm.
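
    The notion of itemset utility used in the abstract above can be sketched as follows. This is a brute-force illustration of the standard definition (purchased quantity times a user-defined unit profit, summed over the transactions that contain the whole itemset), not the FUM algorithm and not a distributed implementation; the sample transactions, unit profits, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical user-defined unit profits (external utility) per item.
unit_profit = {"a": 5, "b": 2, "c": 1}

# Hypothetical transactions: item -> purchased quantity (internal utility).
transactions = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "b": 1, "c": 1},
]


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Enumerate all itemsets and keep those whose utility meets the threshold."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


huis = high_utility_itemsets(transactions, unit_profit, min_utility=15)
```

    With this data, the itemset {a, b} has a higher utility than the single item {b}, so utility is not anti-monotone the way support is. That is why practical algorithms need pruning strategies beyond plain enumeration.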


    High utility itemset (HUI) mining is an important problem in data mining that considers the utilities of items (such as profits and margins) discovered from transactional databases in support of business decisions. HUI mining has been widely studied and published in recent years. Many algorithms mine HUIs by pruning candidates based on estimated utility values and transaction-weighted utilization values; all of them aim to reduce the search space. In this paper, we propose a method for mining HUIs with negative unit profits from vertically distributed databases. The method does not integrate the local databases of the participating parties into a centralized database, and it scans each party's database only once. Experiments show that this method runs faster than mining on a centralized database.
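
    Pruning by transaction-weighted utilization (TWU), as mentioned in the abstract above, can be sketched on a single centralized database as follows. This is not the distributed method the abstract proposes. It assumes one common convention for negative unit profits, in which only the positive item utilities of a transaction count toward its transaction utility, so that TWU remains an upper bound on the utility of any itemset containing the item. The data and names are hypothetical.

```python
# Hypothetical unit profits; item "b" is sold at a loss.
unit_profit = {"a": 5, "b": -2, "c": 1}

# Hypothetical transactions: item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "b": 1, "c": 1},
]


def positive_transaction_utility(transaction, unit_profit):
    """Sum only the positive item utilities of one transaction (assumed convention)."""
    return sum(
        qty * unit_profit[item]
        for item, qty in transaction.items()
        if unit_profit[item] > 0
    )


def compute_twu(transactions, unit_profit):
    """For each item, sum the transaction utilities of the transactions containing it."""
    twu = {}
    for t in transactions:
        tu = positive_transaction_utility(t, unit_profit)
        for item in t:
            twu[item] = twu.get(item, 0) + tu
    return twu


twu = compute_twu(transactions, unit_profit)

# Items whose TWU is below the threshold cannot appear in any high
# utility itemset, so they can be dropped before the search starts.
min_utility = 15
promising = sorted(item for item, value in twu.items() if value >= min_utility)
```

    Here item c is pruned because its TWU of 14 is below the threshold, and no itemset containing c reaches a utility of 15 in this data.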

    Pemanfaatan Algoritma WIT-Tree dan HITS untuk Klasifikasi Tingkat Keberhasilan Pemberdayaan Keluarga Miskin

    The success rate of empowerment programs for poor families can be classified using characteristic patterns extracted from a database of empowerment data. The purpose of this research is to build a classification model that predicts the level of success of poor families who will receive poverty empowerment assistance. The classification model is built with WARM, which combines two methods, HITS and WIT-tree. HITS is used to obtain the weights of the attributes from the database. These weights are then used as the attribute weights in the WIT-tree method. WIT-tree is used to generate the association rules that satisfy a minimum weighted support and a minimum weighted confidence. The data consist of 831 sample records of poor families divided into two classes, namely poor families at the "developing" level and poor families at the "underdeveloped" level. The performance of the classification model shows that weighting attributes using HITS reaches an accuracy of 86.45%, while attribute weights defined by the user reach an accuracy of 66.13%. This study shows that the attribute weights obtained from HITS are better than the attribute weights specified by the user.

    A Novel Algorithm for Mining High Utility Itemsets

