55 research outputs found

    Discovering High Utility Itemsets using Hybrid Approach

    Mining high utility itemsets, especially from big transactional databases, is a time-consuming task. Multiple methods are available for mining high utility itemsets from large transactional datasets, but they have consequential limitations. Their performance needs to be scrutinized on low-memory systems, and further measures need to be addressed. The proposed algorithm combines High Utility Pattern Mining and Incremental Frequent Pattern Mining. The two algorithms used are Apriori and the existing Parallel UP-Growth, applied to transactional databases. Information about high utility itemsets is maintained in a data structure called the UP-Tree. These algorithms not only scan the incremental database but also collect the support counts of newly generated frequent itemsets. The approach executes quickly because it adds new itemsets to the utility pattern tree and removes rare itemsets from it, which reduces cost and time. Based on experimental analysis and results, this hybrid approach using the existing Apriori and UP-Growth is proposed with the aim of improving performance.

    Extraction of High Utility Itemsets using Utility Pattern with Genetic Algorithm from OLTP System

    Frequent pattern mining plays an important role in data mining for analysing vast amounts of data. In practice, frequent pattern mining cannot meet the challenges of real-world problems because items differ in various measures. Hence an emerging technique called utility-based data mining is used. Utility mining considers not only the frequency of itemsets but also the utility associated with them. Its main objective is to extract itemsets with high utilities from OLTP systems by considering user preferences such as profit, quantity and cost. Our proposed approach uses UP-Growth with a Genetic Algorithm: the UP-Growth algorithm generates Potentially High Utility Itemsets, and the Genetic Algorithm optimizes them and provides the High Utility Itemsets. Compared with the existing algorithm, the proposed approach performs better in terms of memory utilization. DOI: 10.17762/ijritcc2321-8169.15039
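
The contrast between frequency and utility described in this abstract can be illustrated with a small Python sketch. The transactions, the `profit` table, and the function names are hypothetical, and utility is assumed here to be purchased quantity times per-unit profit; this is not the paper's UP-Growth or Genetic Algorithm code.

```python
# Toy transactions: item -> purchased quantity (hypothetical data).
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2},
    {"bread": 1, "milk": 1},
    {"wine": 1, "cheese": 2},
]
# Hypothetical per-unit profit of each item.
profit = {"bread": 1, "milk": 1, "wine": 20, "cheese": 5}


def support_count(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if all(i in t for i in itemset))


def utility(itemset, db, unit_profit):
    """Sum of quantity * unit profit over the transactions containing the itemset."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


for itemset in [("bread", "milk"), ("wine", "cheese")]:
    print(itemset,
          "support:", support_count(itemset, transactions),
          "utility:", utility(itemset, transactions, profit))
```

In this toy data the most frequent itemset is the least profitable one, which is the situation that frequency-only mining cannot detect.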

    Literature Review on Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases

    This paper presents a survey on finding itemsets with high utility. Many algorithms exist for this task, but they tend to produce a large number of candidate itemsets, which degrades mining performance in terms of execution time. Here we mainly focus on two algorithms, utility pattern growth (UP-Growth) and UP-Growth+. Both mine high utility itemsets using effective methods for pruning candidate itemsets. Information about high utility itemsets is kept in a special data structure called the UP-Tree. This compact tree structure improves mining performance and avoids scanning the original database repeatedly: candidate itemsets are generated with only two scans of the database. The other algorithm, UP-Growth+, reduces the number of candidates effectively. It also has better runtime than other algorithms, especially when databases contain a huge number of long transactions. Utility-based data mining is a new research area concerned with all types of utility factors in data mining processes; it aims to integrate utility considerations into both predictive and descriptive data mining tasks. High utility itemset mining is a research area of utility-based descriptive data mining, used for finding the itemsets that contribute most to the total utility in a database.
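
One common way to prune candidates in a first database scan is the transaction-weighted utility (TWU) overestimate: an item whose TWU is below the threshold cannot belong to any high utility itemset. The Python sketch below shows only that pruning step, under the assumption that utility is quantity times per-unit profit; it does not build a UP-Tree, and the data and function names are hypothetical.

```python
def transaction_utility(t, unit_profit):
    """Total utility of one transaction (item -> quantity)."""
    return sum(q * unit_profit[i] for i, q in t.items())


def twu_per_item(db, unit_profit):
    """First scan: for each item, sum the utilities of the transactions containing it."""
    twu = {}
    for t in db:
        tu = transaction_utility(t, unit_profit)
        for item in t:
            twu[item] = twu.get(item, 0) + tu
    return twu


def prune_unpromising(db, unit_profit, min_util):
    """Second scan: drop items whose TWU is below the minimum utility threshold."""
    twu = twu_per_item(db, unit_profit)
    keep = {i for i, v in twu.items() if v >= min_util}
    return [{i: q for i, q in t.items() if i in keep} for t in db]


# Hypothetical data for illustration only.
db = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"d": 1}]
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 3}

print(twu_per_item(db, unit_profit))
print(prune_unpromising(db, unit_profit, min_util=10))
```

Because the TWU of an itemset never increases when items are added to it, pruning on TWU is safe, although it can keep itemsets that later turn out to have low utility.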

    Determination of time-variant weighted, utility-based association rules using a frequent pattern tree

    Introduction: The present research was conducted at Birla Institute of Technology, off-campus in Noida, India, in 2017. Methods: To assess the efficiency of the proposed approach for information mining, a method and an algorithm were proposed for mining time-variant weighted, utility-based association rules using an FP-tree. Results: A method is suggested to find association rules on time-oriented, frequency-weighted, utility-based data, employing a hierarchy to extract itemsets and establish their association. Conclusions: The dimensions adopted while developing the approach compressed a large time-variant dataset into a smaller data structure, while the FP-tree was kept away from the repetitive dataset, which gave a noteworthy advantage in terms of time and memory use. Originality: High utility recurrent-pattern extraction is currently one of the most noteworthy study areas in time-variant information mining because of its capability to account for the frequency of itemsets and the assorted utility rates of every itemset. This research contributes to maintaining the study at a corresponding level, which avoids generating a large number of candidate sets and so reduces execution time and search space. Limitations: The research results demonstrated that the proposed approach was efficient on the tested datasets with pre-defined weight and utility calculations.

    Approximate Parallel High Utility Itemset Mining

    High utility itemset mining discovers itemsets whose utility is above a given threshold, where utilities measure the importance of itemsets. When the dataset is very large, memory and time limitations cause scalability issues. In this thesis, the problem is addressed by proposing a distributed parallel algorithm, PHUI-Miner, and a sampling strategy, which can be used either separately or together. PHUI-Miner parallelizes the state-of-the-art high utility itemset mining algorithm HUI-Miner. The sampling strategy investigates the sample size of a dataset required to achieve a given accuracy. We also propose an approach combining sampling with PHUI-Miner, which provides better time performance. In our experiments, we show that PHUI-Miner has high performance and outperforms the state-of-the-art non-parallel algorithm. The sampling strategy achieves accuracies much higher than the guarantee. Extensive experiments are also conducted to compare the time performance of PHUI-Miner with and without sampling.
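
As a minimal sketch of the problem definition in this abstract (not PHUI-Miner or HUI-Miner, which avoid exhaustive enumeration), the brute-force Python below lists every itemset whose utility reaches a threshold. The data, the `unit_profit` table, and the function name are hypothetical, utility is assumed to be quantity times per-unit profit, and "above the threshold" is read here as "at least the threshold".

```python
from itertools import combinations


def mine_high_utility_itemsets(db, unit_profit, min_util):
    """Enumerate all itemsets and keep those with utility >= min_util.

    Exponential in the number of distinct items; for illustration only.
    """
    items = sorted({i for t in db for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            # Utility: quantity * unit profit, summed over transactions
            # that contain every item of the itemset.
            u = sum(
                sum(t[i] * unit_profit[i] for i in itemset)
                for t in db
                if all(i in t for i in itemset)
            )
            if u >= min_util:
                result[itemset] = u
    return result


# Hypothetical data for illustration only.
db = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3}]
unit_profit = {"a": 5, "b": 2, "c": 1}

print(mine_high_utility_itemsets(db, unit_profit, min_util=10))
```

On this toy data, {a, b} has utility 9 while its subset {b} has only 6, and {a, c} (11) scores lower than its subset {a} (15). Utility therefore neither grows nor shrinks reliably as items are added, which is why frequency-style pruning does not carry over directly.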

    Enhancing the Performance of Mining High Utility Itemsets Based On Pattern Algorithm

    Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. An association in data mining indicates a logical dependency between various attributes of an entity. Association rule mining (ARM) is the process of mining past data for association rules. ARM only finds the frequency of itemsets, which does not by itself yield a large profit. Utility mining focuses on discovering the itemsets with high sales profit; here, utility is a measure of the profitability of items to the users. Utility mining of itemsets is an important task in the decision-making process of many applications, such as website clickstream analysis, cross-marketing in retail stores and biomedical applications. Extracting high utility itemsets from a large database involves creating new candidate itemsets with high utility, which affects the performance of the mining process in terms of execution time and space requirements. This paper intends to develop an efficient algorithm for mining high utility itemsets that reduces the number of candidate itemsets. A data structure named pattern tree is maintained to store the information about the high utility itemsets, so that the number of database scans can be reduced.

    A Novel Approach to Extract High Utility Itemsets from Distributed Databases

    Traditional approaches in data mining focus on support and confidence measures, which are purely statistical. Because they are based on the frequency count of the items, these measures allow the frequent itemsets to be derived. The frequency of the items as a single factor, however, does not represent the interestingness of the items. Several studies were conducted to enhance data mining tasks based on the value of the product. They resulted in utility mining, an emerging field of research in data mining. In recent years, various data mining approaches have been implemented to find high utility itemsets. The main objective of utility mining is to identify the itemsets with the highest utilities by considering the subjectively defined utility values set by the user. Existing methods based on the utility mining concept focus on centralized systems, where the data and the associated processing are confined to a particular location. As a further step, we try to implement the utility mining concept in a distributed environment. In this approach, we mine high utility itemsets using a Fast Utility Mining (FUM) algorithm.
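
The support and confidence measures mentioned in this abstract can be sketched in a few lines of Python. The transactions and function names are hypothetical and the definitions are the standard frequency-based ones; this is not the FUM algorithm.

```python
def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(antecedent, consequent, db):
    """Confidence of the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    return support(antecedent | consequent, db) / support(antecedent, db)


# Hypothetical transactions, each a set of item names.
db = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread", "milk"}, db))
print(confidence({"bread"}, {"milk"}, db))
```

Both measures depend only on how often items occur together, so they carry no information about quantity or profit, which is the gap utility mining is meant to fill.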