Literature Review on Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases

Author: Ketkee Kailas Gaikwad, Mininath Nighot
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/12/2014

This paper presents a survey of methods for finding itemsets with high utility. Many algorithms exist for this task, but they tend to generate a large number of candidate itemsets, which degrades mining performance in terms of execution time. The survey mainly focuses on two algorithms, utility pattern growth (UP-Growth) and UP-Growth+, which mine high utility itemsets while applying effective strategies to prune candidate itemsets. Information about high utility itemsets is kept in a compact tree structure called the UP-Tree, which improves mining performance and avoids repeatedly scanning the original database; candidate itemsets are generated with only two database scans. UP-Growth+ reduces the number of candidates further and outperforms other algorithms in runtime, especially when databases contain many long transactions. Utility-based data mining is a new research area concerned with all types of utility factors in data mining processes; it aims to integrate utility considerations into both predictive and descriptive data mining tasks. High utility itemset mining is a research area within utility-based descriptive data mining, used to find the itemsets that contribute most to the total utility of a database.
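
The pruning idea behind the two-scan design described above rests on the standard utility definitions from the high utility itemset literature: an item's utility in a transaction is its quantity times its unit profit, and the transaction-weighted utility (TWU) of an item is the sum of the utilities of all transactions containing it, which upper-bounds the utility of every itemset containing that item. The following is a minimal Python sketch under those assumptions; the toy database, `PROFIT`, and all function names are hypothetical. It shows only TWU-based pruning after a first scan; the brute-force enumeration stands in for UP-Tree mining and is not UP-Growth or UP-Growth+, whose additional pruning strategies are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Made-up toy data: each transaction maps item -> purchased quantity (internal
# utility); PROFIT holds each item's unit profit (external utility).
DB = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2, "g": 5},
    {"a": 1, "b": 2, "c": 1, "d": 6, "e": 1, "f": 5},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"b": 2, "c": 2, "e": 1, "g": 2},
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3, "f": 1, "g": 1}


def itemset_utility(itemset, db):
    """u(X): sum of quantity * profit of X's items over transactions containing X."""
    total = 0
    for txn in db:
        if all(item in txn for item in itemset):
            total += sum(txn[item] * PROFIT[item] for item in itemset)
    return total


def twu_per_item(db):
    """One scan: TWU(i) = sum of transaction utilities TU(T) over T containing i."""
    twu = defaultdict(int)
    for txn in db:
        tu = sum(q * PROFIT[item] for item, q in txn.items())
        for item in txn:
            twu[item] += tu
    return dict(twu)


def high_utility_itemsets(db, min_util, use_twu_pruning=True):
    # TWU(i) is an upper bound on u(X) for every itemset X containing i, so an
    # item with TWU < min_util cannot appear in any high utility itemset.
    if use_twu_pruning:
        twu = twu_per_item(db)
        items = sorted(i for i, w in twu.items() if w >= min_util)
    else:
        items = sorted({i for txn in db for i in txn})
    result = {}
    # Brute-force enumeration over the remaining items; UP-Growth instead mines
    # a UP-Tree built during a second database scan.
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, db)
            if u >= min_util:
                result[itemset] = u
    return result


print(high_utility_itemsets(DB, min_util=40))
```

With `min_util=40`, items `f` (TWU 30) and `g` (TWU 38) are discarded before enumeration, yet the result is identical to mining without pruning, because TWU never underestimates.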

International Journal on Recent and Innovation Trends in Computing and Communication

Extraction of High Utility Itemsets using Utility Pattern with Genetic Algorithm from OLTP System

Author: A. Saranya, D. Kerana Hanirex
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/03/2015

Frequent pattern mining plays an important role in analysing vast amounts of data. In practice, however, frequent pattern mining cannot meet the challenges of real-world problems, because items differ in various measures. Hence an emerging technique, utility-based data mining, is used in data mining processes. Utility mining considers not only the frequency of itemsets but also the utility associated with them. Its main objective is to extract itemsets with high utility from OLTP systems, taking into account user preferences such as profit, quantity, and cost. In our proposed approach, we combine UP-Growth with a genetic algorithm: the UP-Growth algorithm generates potentially high utility itemsets, and the genetic algorithm optimizes them to produce the high utility itemsets. Compared with the existing algorithm, the proposed approach performs better in terms of memory utilization. DOI: 10.17762/ijritcc2321-8169.15039
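
The abstract does not specify the chromosome encoding or genetic operators. As an illustration only, here is a hypothetical Python sketch of one common way to apply a genetic algorithm to high utility itemset mining: each chromosome is a bit vector over candidate items, fitness is the itemset's actual utility, and every chromosome reaching `min_util` is recorded. A TWU filter stands in for the UP-Growth candidate phase; all names, parameters, and data are assumptions. A GA is a heuristic search, so unlike an exact miner it may miss some high utility itemsets.

```python
import random
from collections import defaultdict

# Made-up toy data: transactions map item -> quantity; PROFIT is unit profit.
DB = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2, "g": 5},
    {"a": 1, "b": 2, "c": 1, "d": 6, "e": 1, "f": 5},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"b": 2, "c": 2, "e": 1, "g": 2},
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3, "f": 1, "g": 1}


def utility(itemset):
    """Actual utility of an itemset summed over the transactions that contain it."""
    return sum(
        sum(txn[i] * PROFIT[i] for i in itemset)
        for txn in DB
        if itemset and all(i in txn for i in itemset)
    )


def promising_items(min_util):
    """Stand-in for the candidate phase: keep items whose TWU reaches min_util."""
    twu = defaultdict(int)
    for txn in DB:
        tu = sum(q * PROFIT[i] for i, q in txn.items())
        for i in txn:
            twu[i] += tu
    return sorted(i for i, w in twu.items() if w >= min_util)


def ga_high_utility(min_util, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    genes = promising_items(min_util)  # one bit per candidate item
    n = len(genes)
    if n == 0:
        return {}

    def decode(bits):
        return tuple(g for g, b in zip(genes, bits) if b)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    found = {}
    for _ in range(generations):
        scored = [(utility(decode(c)), c) for c in population]  # fitness = utility
        for fitness, chrom in scored:
            if fitness >= min_util:
                found[decode(chrom)] = fitness

        def pick():
            # Binary tournament selection: the fitter of two random chromosomes.
            x, y = rng.sample(scored, 2)
            return x[1] if x[0] >= y[0] else y[1]

        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(0, n)  # single-point crossover; cut may fall at either end
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            children.append([1 - b if rng.random() < mutation_rate else b for b in child])
        population = children
    return found


print(ga_high_utility(min_util=40))
```

The fixed `seed` makes runs reproducible; changing `pop_size`, `generations`, or `mutation_rate` trades runtime against how much of the search space the GA explores.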

International Journal on Recent and Innovation Trends in Computing and Communication

Discovering High Utility Itemsets using Hybrid Approach

Author: Amol J. Gosavi, Dr. K. V. Metre
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/10/2015

Mining high utility itemsets, especially from large transactional databases, is a time-consuming task. Multiple methods are available for mining high utility itemsets from large transactional datasets, but they have significant limitations. In terms of performance, these methods need to be examined on low-memory systems, and further measures need to be addressed. The proposed algorithm combines high utility pattern mining with incremental frequent pattern mining, using the Apriori algorithm and the existing Parallel UP-Growth algorithm to mine high utility itemsets from transactional databases. Information about high utility itemsets is maintained in a data structure called the UP-Tree. The algorithms not only scan the incremental database but also collect the support counts of newly generated frequent itemsets. Execution is fast because new itemsets are inserted into the tree and rare itemsets are removed from the utility pattern tree structure, which reduces cost and time. Based on various experimental analyses and results, this hybrid approach combining the existing Apriori and UP-Growth algorithms is proposed with the aim of improving performance.
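
The incremental idea in this abstract can be illustrated with plain support counting. The following Python sketch is hypothetical and deliberately simplified: it counts frequent itemsets rather than utilities, keeps every count in a `Counter` instead of a UP-Tree, and bounds itemset length with an assumed `max_len` parameter. It shows why, once counts are retained, each new batch needs only a single scan of the new transactions.

```python
from collections import Counter
from itertools import combinations


class IncrementalSupport:
    """Hypothetical sketch: retain support counts of all itemsets with up to
    max_len items, so each new batch requires scanning only the new transactions."""

    def __init__(self, max_len=3):
        self.max_len = max_len
        self.counts = Counter()
        self.n_transactions = 0

    def add_batch(self, transactions):
        for txn in transactions:
            items = sorted(set(txn))
            self.n_transactions += 1
            for k in range(1, min(self.max_len, len(items)) + 1):
                self.counts.update(combinations(items, k))

    def frequent(self, min_support):
        """Itemsets whose relative support (a fraction of all transactions seen
        so far) is at least min_support; rarer itemsets are left out of the result."""
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


miner = IncrementalSupport(max_len=2)
miner.add_batch([["bread", "milk"], ["bread", "butter"]])
print(miner.frequent(0.6))  # {('bread',): 2}
miner.add_batch([["milk", "butter"], ["milk", "bread"]])  # only the new batch is scanned
print(miner.frequent(0.5))  # {('bread',): 3, ('milk',): 3, ('bread', 'milk'): 2, ('butter',): 2}
```

Keeping counts for rare itemsets costs memory but avoids rescanning old data when an itemset later becomes frequent; discarding them, as a tree-pruning approach does, trades that accuracy for space.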

International Journal on Recent and Innovation Trends in Computing and Communication

A Clustering based Prediction Scheme for High Utility Itemsets

Author: Lakhawat Piyush, Somani Arun
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2017

We strongly believe that the current Utility Itemset Mining (UIM) problem model can be extended with a key modeling capability: predicting future itemsets based on prior knowledge of clusters in the dataset. The information in transactions that are fairly representative of a cluster type is characteristic of that cluster type rather than of the entire dataset. Subjecting such transactions to the common threshold of the UIM problem leads to information loss. We find that making implicit use of the cluster structure of the data in the UIM problem model addresses this limitation. We achieve this by introducing a new clustering-based utility into the definition of the UIM problem model and modifying the definitions of absolute utilities accordingly. This adds a predictive aspect to the UIM model, allowing cluster-specific patterns to emerge while inter-cluster patterns are still mined. Experiments on two real datasets verify that our proposed predictive UIM problem model extracts more useful information than the current UIM model, with high accuracy.

Digital Repository @ Iowa State University (ISU)

PHUIMUS: A Potential High Utility Itemsets Mining Algorithm Based on Stream Data with Uncertainty

Author: Chunjie Jin, Fuxian Liu, Ju Wang
Publication venue: Hindawi Limited
Publication date: 01/01/2017

High utility itemset (HUI) mining has recently been a hot topic; it can be used to mine profitable itemsets by considering both quantity and profit factors. Until now, HUI mining over uncertain datasets and over data streams has been studied separately. However, to the best of our knowledge, HUI mining over uncertain data streams has seldom been studied. In this paper, the PHUIMUS (potential high utility itemsets mining over uncertain data stream) algorithm is proposed to mine potential high utility itemsets (PHUIs), that is, itemsets with both high utility and high existential probability, over an uncertain data stream based on sliding windows. To realize the algorithm, a potential utility list over uncertain data stream (PUS-list) is designed to mine PHUIs without rescanning the analyzed uncertain data stream. A transaction weighted probability and utility tree over uncertain data stream (TWPUS-tree) is also designed to decrease the number of candidate itemsets generated by the PHUIMUS algorithm. Extensive experiments are conducted on real-life and synthetic databases in terms of runtime, number of discovered PHUIs, memory consumption, and scalability. The results show that the proposed algorithm is reasonable and acceptable for mining meaningful PHUIs from uncertain data streams.
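
PHUIs are characterized above by two conditions, high utility and high existential probability, evaluated over a sliding window. The following Python sketch assumes a definition common in uncertain HUI mining: each transaction carries one existential probability, an itemset's probability is the sum of the probabilities of the window transactions containing it, and an itemset is a PHUI when its utility reaches `min_util` and its probability reaches `min_prob` times the window size. The class and parameter names are hypothetical, and the brute-force enumeration is exactly the cost that the PUS-list and TWPUS-tree are designed to avoid; this is not PHUIMUS itself.

```python
from collections import deque
from itertools import combinations


class SlidingWindowPHUI:
    """Hypothetical sketch: potential high utility itemsets over the most recent
    `window` transactions of an uncertain stream."""

    def __init__(self, profit, window):
        self.profit = profit                # item -> unit profit
        self.window = deque(maxlen=window)  # the oldest transaction drops out automatically

    def push(self, quantities, probability):
        """Append a transaction (item -> quantity) with its existential probability."""
        self.window.append((quantities, probability))

    def _stats(self, itemset):
        util, prob = 0, 0.0
        for txn, p in self.window:
            if all(i in txn for i in itemset):
                util += sum(txn[i] * self.profit[i] for i in itemset)
                prob += p
        return util, prob

    def mine(self, min_util, min_prob):
        items = sorted({i for txn, _ in self.window for i in txn})
        result = {}
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):
                util, prob = self._stats(itemset)
                if util >= min_util and prob >= min_prob * len(self.window):
                    result[itemset] = (util, prob)
        return result


stream = SlidingWindowPHUI(profit={"a": 5, "b": 2, "c": 1}, window=2)
stream.push({"a": 1, "b": 2}, 0.9)
stream.push({"a": 2, "c": 3}, 0.5)
stream.push({"b": 4, "c": 1}, 0.8)  # evicts the first transaction
print(stream.mine(min_util=8, min_prob=0.3))
```

In this run, item `a` has utility 10 in the window but its probability (0.5) falls below the required 0.6, so it is not reported; the two conditions act independently.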

Crossref

Directory of Open Access Journals

Knowledge discovery techniques for transactional data model

Author: Lakhawat Piyush
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2018

In this work we give solutions to two key knowledge discovery problems for the transactional data model: cluster analysis and itemset mining. By knowledge discovery in the context of these two problems, we specifically mean novel and useful ways of extracting clusters and itemsets from transactional data. The transactional data model is widely used in a variety of applications. In cluster analysis, the goal is to find clusters of similar transactions in the data such that the collective properties of each cluster are unique. We propose the first clustering algorithm for transactional data that uses the latest model definition. None of the previously proposed algorithms used the important utility information in the data; our novel technique effectively solves this problem. We also propose two new cluster validation metrics based on the criterion of high utility patterns. Compared with competing algorithms, our technique misses far fewer important high utility patterns. Itemset mining is the problem of searching for repeating patterns of high importance in the data. We show that the current model for itemset mining leads to information loss because it ignores the presence of clusters in the data. We propose a new itemset mining model that incorporates cluster structure information, which allows the model to make predictions for future itemsets. We show that our model makes accurate predictions, discovering 30-40% of future itemsets in most experiments on two benchmark datasets with negligible inaccuracies. Since no other itemset prediction models currently exist, accurate prediction is itself a contribution of this work. We further improve the model theoretically by making it capable of giving predictions for specific future windows using time series forecasting. We also perform a detailed analysis of various clustering algorithms and study the effect of the Big Data phenomenon on them.
This inspired us to further refine our model based on a classification problem design. This addition allows itemsets to be mined by maximizing a customizable objective function composed of different prediction metrics. Our final framework design is the first of its kind to make itemset predictions using the cluster structure. It can adapt its predictions to a specific future window and customize the mining process to any specified prediction criterion. We implemented the framework on a Web analytics dataset and observed that it successfully makes optimal prediction configuration choices with a high accuracy of 0.895.

Digital Repository @ Iowa State University (ISU)