
    Mining top-k granular association rules for recommendation

    Recommender systems are important for e-commerce companies as well as researchers. Recently, granular association rules have been proposed for cold-start recommendation. However, existing approaches retain only globally strong rules; therefore, some users may receive no recommendations at all. In this paper, we propose to mine the top-k granular association rules for each user. First, we define three measures of granular association rules: the source coverage, which measures the user granule size; the target coverage, which measures the item granule size; and the confidence, which measures the strength of the association. With the confidence measure, rules can be ranked according to their strength. We then propose algorithms for training the recommender and suggesting items to each user. Experiments are undertaken on the publicly available MovieLens data set. Results indicate that an appropriate granule setting can avoid over-fitting and, at the same time, help obtain high recommendation accuracy.
    Comment: 12 pages, 5 figures, submitted to Advances in Granular Computing and Advances in Rough Sets, 2013. arXiv admin note: substantial text overlap with arXiv:1305.137
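The per-user ranking step can be illustrated with a minimal Python sketch. It assumes single attribute-value descriptors for both granules and takes confidence to be the fraction of (user, item) pairs in the source granule × target granule that are linked in the rating relation; the paper's exact measure definitions and algorithms may differ, and the toy data and the names `mine_rules`, `top_k_rules_for_user`, and `recommend` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Rule:
    source: tuple           # (attribute, value) describing the user granule
    target: tuple           # (attribute, value) describing the item granule
    users: frozenset        # the user granule itself
    items: frozenset        # the item granule itself
    source_coverage: float  # |user granule| / |all users|
    target_coverage: float  # |item granule| / |all items|
    confidence: float       # assumed: share of linked pairs in users x items

def granule(objects, attr, value):
    """Ids of objects whose attribute `attr` equals `value`."""
    return frozenset(oid for oid, attrs in objects.items() if attrs.get(attr) == value)

def descriptors(objects):
    """All single attribute-value pairs occurring in `objects`."""
    return sorted({(a, v) for attrs in objects.values() for a, v in attrs.items()})

def mine_rules(users, items, relation, min_scov=0.0, min_tcov=0.0):
    """Enumerate single-descriptor rules that meet both coverage thresholds."""
    rules = []
    for (ua, uv), (ia, iv) in product(descriptors(users), descriptors(items)):
        src, tgt = granule(users, ua, uv), granule(items, ia, iv)
        scov, tcov = len(src) / len(users), len(tgt) / len(items)
        if scov < min_scov or tcov < min_tcov:
            continue
        linked = sum((u, i) in relation for u in src for i in tgt)
        rules.append(Rule((ua, uv), (ia, iv), src, tgt, scov, tcov,
                          linked / (len(src) * len(tgt))))
    return rules

def top_k_rules_for_user(user, rules, k):
    """The k most confident rules whose source granule contains `user`."""
    applicable = [r for r in rules if user in r.users]
    return sorted(applicable, key=lambda r: r.confidence, reverse=True)[:k]

def recommend(user, rules, k, relation):
    """Items in the target granules of the user's top-k rules not yet chosen by the user."""
    recs = set()
    for rule in top_k_rules_for_user(user, rules, k):
        recs |= {i for i in rule.items if (user, i) not in relation}
    return recs

# Hypothetical toy data (not MovieLens).
USERS = {"u1": {"gender": "F"}, "u2": {"gender": "F"}, "u3": {"gender": "M"}}
ITEMS = {"m1": {"genre": "comedy"}, "m2": {"genre": "comedy"}, "m3": {"genre": "drama"}}
RATED = {("u1", "m1"), ("u2", "m1"), ("u2", "m2"), ("u3", "m3")}

rules = mine_rules(USERS, ITEMS, RATED)
print(recommend("u1", rules, k=1, relation=RATED))  # {'m2'}
```

Because rules are ranked per user rather than filtered by one global confidence threshold, every user who belongs to some source granule that passes the coverage thresholds gets a ranked rule list.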

    Discovering Stock Price Movement Patterns in Indonesia during the COVID-19 Pandemic Using Top-K Association Rules Mining

    This study extracts stock price movement patterns during the COVID-19 pandemic based on association rules and a number k of top patterns, also called top-k patterns. To mine the top-k patterns, this study uses the Top-K Association Rules Mining algorithm. The aim of this study is to provide easy-to-understand stock recommendations by showing the correlation between stocks and COVID-19 movements. Therefore, this study also focuses on defining the return type of stock price movements and the change type of COVID-19 movements based on fuzzy sets. Using COVID-19 movement data and historical stock prices from 2020 to 2021 and the Top-K Association Rules Mining algorithm with k = 200, the mined patterns show that the COVID-19 pandemic affected some stock price movements in 4 companies with low selling losses.
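Top-k association rule mining replaces a minimum support threshold with the number k of rules wanted, plus a minimum confidence. The Python sketch below finds such rules by exhaustive enumeration, which is only a stand-in for the actual Top-K Association Rules Mining search strategy and is feasible for toy data only. The day-level event labels (`covid_cases_up` and so on) are hypothetical placeholders for the fuzzy return and change types the study defines.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def top_k_rules(transactions, k, min_conf):
    """Up to k rules X -> Y with the highest support among rules with confidence >= min_conf.

    Exhaustive enumeration: exponential in the number of distinct items.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup = support_count(itemset, transactions)
            if sup == 0:
                continue
            # Split the itemset into every non-empty antecedent / consequent pair.
            for r in range(1, size):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    conf = sup / support_count(lhs, transactions)
                    if conf >= min_conf:
                        rules.append((lhs, itemset - lhs, sup / n, conf))
    # Highest support first; confidence breaks ties.
    rules.sort(key=lambda rule: (rule[2], rule[3]), reverse=True)
    return rules[:k]

# Hypothetical trading days, each described by discretized event labels.
DAYS = [
    {"covid_cases_up", "bank_stock_down"},
    {"covid_cases_up", "bank_stock_down"},
    {"covid_cases_up", "telco_stock_up"},
    {"bank_stock_down", "telco_stock_up"},
]
for lhs, rhs, support, confidence in top_k_rules(DAYS, k=2, min_conf=0.5):
    print(sorted(lhs), "->", sorted(rhs), support, round(confidence, 3))
```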

    MINING TOP-K HIGH UTILITY ITEM SETS BY USING EFFICIENT DATA STRUCTURE TO IMPROVE THE PERFORMANCE

    Association rules show strong relationships between attribute-value pairs (or items) that occur frequently in a given data set. Association rules are commonly used to determine the purchasing patterns of customers in a store. Such analysis is used in many decision-making processes, such as product placement, catalogue design, and cross-marketing. The discovery of association rules is based on frequent itemset mining. Frequent itemset mining algorithms mainly suffer from generating a large number of candidate itemsets and from requiring many database scans. These issues are addressed by two algorithms, TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in one phase), which were proposed for mining the top-k high utility itemsets in two scans of the entire database. Although the number of scans is reduced to two, processing time remains high because of traversals of the UP-Tree, the data structure used by the TKU and TKO algorithms. The proposed algorithm uses a B+-Tree data structure instead of the UP-Tree to reduce processing time. Experimental analysis clearly shows that processing time is improved, and hence that the proposed B+-Tree-based methodology overcomes the limitations of existing work.
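High utility itemset mining weighs each item by its purchased quantity (internal utility) and its unit profit (external utility). A brute-force Python sketch of the top-k problem follows, assuming the usual definition u(X) = sum over transactions T containing X of sum over i in X of q(i, T) · p(i). It illustrates the problem that TKU, TKO, and the proposed B+-Tree method solve, not any of those algorithms, and the profits and quantities are hypothetical.

```python
from itertools import combinations

def itemset_utility(itemset, db, profit):
    """Sum of quantity * unit profit of the itemset's items over transactions containing all of them."""
    total = 0
    for transaction in db:  # transaction: dict mapping item -> purchased quantity
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * profit[item] for item in itemset)
    return total

def top_k_high_utility_itemsets(db, profit, k):
    """Score every itemset exhaustively and keep the k with the highest utility."""
    items = sorted(profit)
    scored = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            utility = itemset_utility(combo, db, profit)
            if utility > 0:
                scored.append((frozenset(combo), utility))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

PROFIT = {"a": 5, "b": 2, "c": 1}                            # hypothetical unit profits
DB = [{"a": 1, "b": 2}, {"a": 2, "c": 6}, {"b": 4, "c": 3}]  # hypothetical quantities
print(top_k_high_utility_itemsets(DB, PROFIT, 3))
```

In this data {a, c} has utility 16 while its subset {c} has only 9, so utility is not anti-monotone and Apriori-style support pruning does not apply directly. Practical top-k algorithms typically rely instead on overestimates such as transaction-weighted utilization and on a minimum utility threshold that is raised as better itemsets are found.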

    Query-Constraint-Based Mining of Association Rules for Exploratory Analysis of Clinical Datasets in the National Sleep Research Resource

    Background: Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on them yields a large number of rules, many of which may be uninteresting. Especially for imbalanced datasets, performing ARM directly results in uninteresting rules dominated by certain variables that capture general characteristics. Methods: We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on a subset of data items satisfying a query constraint. We first perform a series of data-preprocessing steps, including variable selection, merging semantically similar variables, combining multiple-visit data, and data transformation. We use the Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. We then remove general and subsumed rules so that only unique, non-redundant rules remain for a particular query constraint. Results: Applying QARM to five datasets from NSRR yielded a total of 2517 association rules with a minimum confidence of 60% (using the top 100 rules for each query constraint). The results show that merging similar variables can avoid uninteresting rules. Also, removing general and subsumed rules resulted in a more concise and interesting set of rules. Conclusions: QARM shows the potential to support exploratory analysis of large biomedical datasets. It also proves to be a useful method for reducing the number of uninteresting association rules generated from imbalanced datasets.
A preliminary literature-based analysis showed that some association rules have supporting evidence in the biomedical literature, while others without literature-based evidence may serve as candidates for new hypotheses to explore and investigate. Together with literature-based evidence, the association rules mined over the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems.
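The query-constraint filter and the rule-pruning step can be illustrated with a small Python sketch. Both the constraint predicate and the subsumption test are assumptions: a rule is treated as subsumed when another rule has an antecedent that is a subset of its antecedent, a consequent that is a superset of its consequent, and at least equal confidence. The paper's exact definitions of "general" and "subsumed" rules may differ, and the item names are hypothetical, not actual NSRR variables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    confidence: float

def apply_query_constraint(records, constraint):
    """Keep only the records (sets of items) that satisfy the query-constraint predicate."""
    return [record for record in records if constraint(record)]

def subsumes(general, specific):
    """Assumed redundancy test: `general` makes `specific` redundant."""
    return (
        general != specific
        and general.antecedent <= specific.antecedent
        and general.consequent >= specific.consequent
        and general.confidence >= specific.confidence
    )

def remove_subsumed(rules):
    """Drop duplicates and every rule subsumed by another rule, preserving order."""
    unique = list(dict.fromkeys(rules))
    return [r for r in unique if not any(subsumes(other, r) for other in unique)]

# Hypothetical records and rules.
RECORDS = [
    {"sex=female", "ahi=high", "hypertension=yes"},
    {"sex=male", "ahi=high", "hypertension=no"},
    {"sex=female", "ahi=low", "hypertension=no"},
]
female_only = apply_query_constraint(RECORDS, lambda r: "sex=female" in r)

RULES = [
    Rule(frozenset({"ahi=high"}), frozenset({"hypertension=yes"}), 0.70),
    Rule(frozenset({"ahi=high", "bmi=high"}), frozenset({"hypertension=yes"}), 0.65),
    Rule(frozenset({"bmi=high", "age=old"}), frozenset({"hypertension=yes"}), 0.80),
]
print(len(female_only), len(remove_subsumed(RULES)))  # 2 2
```

The second rule is dropped because the shorter first rule predicts the same consequent with higher confidence; the third survives because no other rule's antecedent is contained in its own.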

    COMPARISON OF FREQUENT ITEMSET GENERATION USING CUT BOTH WAYS ALGORITHM AND APRIORI ALGORITHM

    ABSTRACT: Association rule mining is a data mining process for finding rules and patterns in a large database. These patterns are itemsets that frequently occur together (frequent itemsets) in the transactions of a database. Frequent itemset generation is the most time-consuming step, so an algorithm that reduces this time is needed. The most popular algorithm is Apriori, which uses support-based pruning to discard a vast number of non-candidate itemsets. This algorithm has drawbacks: when the cardinality of the longest frequent itemset is k, Apriori needs k passes over the database. In addition, Apriori is computation-intensive in generating candidate itemsets and counting support values, especially for applications with a very low support threshold and/or a vast number of items. Cut Both Ways (CBW) combines several techniques and uses a cutting level to divide the search space into two parts. A top-down strategy combined with breadth-first search and horizontal counting is used to find frequent itemsets below the cutting level. A bottom-up strategy combined with depth-first search and vertical intersection is used to find frequent itemsets above the cutting level. The cutting level is the average cardinality of the frequent itemsets, on the expectation that most frequent itemsets will appear at this level. This final project implements frequent itemset generation using the Apriori and CBW algorithms and then compares their performance under different minimum support thresholds.
Keywords: mining association rules, itemset, frequent itemset, support, support base pruning, longest frequent itemset, computation-intensive, cutting level, top-down, bottom-up, breadth first search, depth first search, vertical intersection
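The two search styles that CBW combines can be contrasted in a short Python sketch: level-wise Apriori with horizontal counting and support-based pruning, and a depth-first search with vertical tid-set intersection in the style of Eclat. This does not implement CBW's cutting-level split; it only shows that both strategies produce the same frequent itemsets. The function names and data are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Breadth-first, level-wise search with horizontal support counting."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while level:
        # Horizontal counting: one pass over all transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets, then apply support-based pruning:
        # a k-candidate survives only if all of its (k-1)-subsets are frequent.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def eclat(transactions, min_sup):
    """Depth-first search with vertical tid-set intersection."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, prefix_tids, tail):
        for index, (item, tids) in enumerate(tail):
            # Support of prefix + item is the size of the intersected tid-set.
            new_tids = tids if prefix_tids is None else prefix_tids & tids
            if len(new_tids) >= min_sup:
                itemset = prefix | {item}
                frequent[itemset] = len(new_tids)
                extend(itemset, new_tids, tail[index + 1:])

    extend(frozenset(), None, sorted(tidsets.items()))
    return frequent

TRANSACTIONS = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
print(apriori(TRANSACTIONS, 2) == eclat(TRANSACTIONS, 2))  # True
```

Apriori reads the whole dataset roughly once per level, which is the cost the abstract attributes to long frequent itemsets; the vertical variant reads the data once to build tid-sets and afterwards only intersects them.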

    Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees

    The tasks of extracting (top-K) Frequent Itemsets (FIs) and Association Rules (ARs) are fundamental primitives in data mining and database applications. Exact algorithms for these problems exist and are widely used, but their running time is hindered by the need to scan the entire dataset, possibly multiple times. High-quality approximations of FIs and ARs are sufficient for most practical uses, and a number of recent works have explored the application of sampling for fast discovery of approximate solutions to these problems. However, these works do not provide satisfactory performance guarantees on the quality of the approximation, due to the difficulty of bounding the probability of under- or over-sampling any one of an unknown number of frequent itemsets. In this work we circumvent this issue by applying the statistical concept of Vapnik-Chervonenkis (VC) dimension to develop a novel technique for providing tight bounds on the sample size that guarantees approximation within user-specified parameters. Our technique applies to both absolute and relative approximations of (top-K) FIs and ARs. The resulting sample size depends linearly on the VC-dimension of a range space associated with the dataset to be mined. The main theoretical contribution of this work is a proof that the VC-dimension of this range space is upper bounded by an easy-to-compute characteristic quantity of the dataset, which we call the d-index: the maximum integer d such that the dataset contains at least d transactions of length at least d, none of which is a superset of or equal to another. We show that this bound is strict for a large class of datasets.
    Comment: 19 pages, 7 figures. A shorter version of this paper appeared in the proceedings of ECML PKDD 201
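The d-index and the resulting sample size can be sketched in Python. `d_index_exact` checks the definition above by brute force (feasible only for tiny datasets), while `d_index_upper_bound` drops the "no transaction contains another" condition, giving a cheap value that is always greater than or equal to the d-index. `sample_size` uses the standard VC ε-approximation bound m = (c/ε²)(d + ln(1/δ)); treating the universal constant c as 0.5 is an assumption here, not a value taken from the paper, and all function names are hypothetical.

```python
import math
from itertools import combinations

def d_index_upper_bound(transactions):
    """Largest d such that at least d distinct transactions have length >= d.

    Ignores the antichain requirement, so the result is >= the true d-index.
    """
    lengths = sorted((len(t) for t in {frozenset(t) for t in transactions}), reverse=True)
    d = 0
    for position, length in enumerate(lengths, start=1):
        if length < position:
            break
        d = position
    return d

def d_index_exact(transactions):
    """Brute-force d-index: d distinct transactions of length >= d, none contained in another."""
    distinct = list({frozenset(t) for t in transactions})
    for d in range(d_index_upper_bound(transactions), 0, -1):
        long_enough = [t for t in distinct if len(t) >= d]
        for group in combinations(long_enough, d):
            if all(not (a <= b or b <= a) for a, b in combinations(group, 2)):
                return d
    return 0

def sample_size(d, epsilon, delta, c=0.5):
    """VC epsilon-approximation sample size; c = 0.5 is an assumed constant."""
    return math.ceil(c / epsilon**2 * (d + math.log(1 / delta)))

TOY = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}, {"a"}]
NESTED = [{"a", "b"}, {"a", "b", "c"}]
print(d_index_exact(TOY), d_index_exact(NESTED), d_index_upper_bound(NESTED))  # 2 1 2
print(sample_size(d_index_exact(TOY), epsilon=0.1, delta=0.1))  # 216
```

`NESTED` shows why the containment condition matters: both transactions have length at least 2, but one contains the other, so the exact d-index is 1 while the relaxed bound reports 2.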

    Predicting student performance using data mining and learning analysis technique in Libyan Higher Education

    Technology has an increasing impact on all areas of life, including the education sector, and requires developing countries to emulate developed countries and integrate technology into their education systems. Schools in Libya have recently been trying to figure out why students perform poorly in certain subjects and how to predict students' performance in those subjects in coming semesters. Several methods have been proposed to predict student performance using data mining techniques. This paper proposes a Data Mining Techniques in Education (DME) prediction model that combines clustering, classification, and association rule mining, intended for use in universities and schools in order to provide students and teachers with the most advanced platform. Although relatively late, the Libyan government finally responded to this challenge by investing heavily in rebuilding the education system and launching a national plan. The presented method predicts students' performance based on their grades in Math and English. The results are divided into three main sections: clustering analysis using the k-means algorithm; classification analysis, done in two rounds, first using Gain Ratio evaluation to find the top attributes, which are then used by the J48 algorithm in the second round; and association rule analysis using the Apriori algorithm. Association rule analysis is applied to the clusters generated by the clustering analysis to derive the rules associated with each cluster. For each section, a list of inputs is presented with the scale used for the values, followed by the results of the algorithm and an explanation of the findings.
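J48 (Weka's implementation of C4.5) ranks candidate splits by gain ratio: information gain divided by the split information of the attribute. A minimal Python sketch of ranking attributes this way follows; the attribute names and grade bands are hypothetical, and the paper's actual Weka configuration is not reproduced.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(attribute, labels):
    """Information gain of splitting `labels` on `attribute`, divided by the split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(attribute)
    # A constant attribute has zero split information and cannot split the data.
    return (entropy(labels) - remainder) / split_info if split_info > 0 else 0.0

def top_attributes(columns, labels, m):
    """Names of the m attributes with the highest gain ratio."""
    return sorted(columns, key=lambda name: gain_ratio(columns[name], labels), reverse=True)[:m]

# Hypothetical grade bands and outcomes.
COLUMNS = {
    "math_grade": ["A", "A", "B", "B"],
    "english_grade": ["A", "B", "A", "B"],
}
OUTCOME = ["pass", "pass", "fail", "fail"]
print(top_attributes(COLUMNS, OUTCOME, 1))  # ['math_grade']
```

Dividing by split information penalizes attributes with many distinct values, which plain information gain would otherwise favor.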

    Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

    Understanding customer buying patterns is of great interest to the retail industry and has been shown to benefit a wide variety of goals, ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy paté also buy salted butter and sour bread." Unfortunately, sifting through a large number of buying patterns is not useful in practice because popular products predominate in the top rules. As a result, a number of "interestingness" measures (over 30) have been proposed to rank rules. However, there is no agreement on which measures are most appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, an analyst's ability to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that gives analysts the ability to compare the outcomes of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarché, one of the largest food retailers in France.
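Many interestingness measures are functions of the same three quantities: P(X), P(Y), and P(XY), the fractions of transactions containing the antecedent, the consequent, and both. The Python sketch below computes five common measures and shows how the ranking of two rules flips between support and lift; the baskets are hypothetical, and these are only a handful of the 34 measures CAPA compares.

```python
import math

def measures(transactions, lhs, rhs):
    """Support, confidence, lift, leverage and conviction of the rule lhs -> rhs."""
    ts = [frozenset(t) for t in transactions]
    n = len(ts)
    p_x = sum(lhs <= t for t in ts) / n
    p_y = sum(rhs <= t for t in ts) / n
    p_xy = sum((lhs | rhs) <= t for t in ts) / n
    confidence = p_xy / p_x
    return {
        "support": p_xy,
        "confidence": confidence,
        "lift": confidence / p_y,
        "leverage": p_xy - p_x * p_y,
        # Conviction is unbounded when the rule never fails.
        "conviction": math.inf if confidence == 1 else (1 - p_y) / (1 - confidence),
    }

def rank_rules(transactions, rules, measure):
    """Order (lhs, rhs) rules from most to least interesting under one measure."""
    return sorted(rules, key=lambda rule: measures(transactions, *rule)[measure], reverse=True)

# Hypothetical baskets: bread is popular, pate is rare.
BASKETS = [
    {"pate", "butter", "bread"}, {"pate", "butter", "bread"},
    {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
    {"milk"}, {"butter", "bread"}, {"bread"},
]
MILK_BREAD = (frozenset({"milk"}), frozenset({"bread"}))
PATE_BUTTER = (frozenset({"pate"}), frozenset({"butter"}))
RULES = [MILK_BREAD, PATE_BUTTER]

print(rank_rules(BASKETS, RULES, "support")[0] == MILK_BREAD)  # True
print(rank_rules(BASKETS, RULES, "lift")[0] == PATE_BUTTER)    # True
```

Here milk -> bread wins on support because bread appears in most baskets, while pate -> butter wins on lift, a small instance of popular products dominating frequency-based rankings.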