Mining top-k granular association rules for recommendation
Recommender systems are important for e-commerce companies as well as
researchers. Recently, granular association rules have been proposed for
cold-start recommendation. However, existing approaches reserve only globally
strong rules; therefore some users may receive no recommendation at all. In
this paper, we propose to mine the top-k granular association rules for each
user. First we define three measures of granular association rules. These are
the source coverage which measures the user granule size, the target coverage
which measures the item granule size, and the confidence which measures the
strength of the association. With the confidence measure, rules can be ranked
according to their strength. Then we propose algorithms for training the
recommender and suggesting items to each user. Experiments are undertaken on
the publicly available MovieLens data set. Results indicate that an appropriate
granule setting can avoid over-fitting and, at the same time, help obtain
high recommendation accuracy. Comment: 12 pages, 5 figures, submitted to Advances in Granular Computing and
Advances in Rough Sets, 2013. arXiv admin note: substantial text overlap with
arXiv:1305.137
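The contrast drawn in the abstract above, between keeping only globally strong rules and keeping the top-k rules for each user, can be illustrated with a minimal Python sketch. It assumes each user already has a list of candidate rules with a precomputed confidence; the rule labels, confidence values, and function names are hypothetical and do not reproduce the paper's training algorithm or its exact measure definitions.

```python
import heapq

def globally_strong_rules(rules_by_user, min_confidence):
    """Keep only rules whose confidence meets one global threshold."""
    return {
        user: [rule for rule in rules if rule[1] >= min_confidence]
        for user, rules in rules_by_user.items()
    }

def top_k_rules_per_user(rules_by_user, k):
    """Keep the k highest-confidence rules for every user separately."""
    return {
        user: heapq.nlargest(k, rules, key=lambda rule: rule[1])
        for user, rules in rules_by_user.items()
    }

# Hypothetical (rule label, confidence) pairs per user.
candidate_rules = {
    "alice": [("comedy", 0.9), ("drama", 0.7), ("horror", 0.2)],
    "bob": [("comedy", 0.3), ("drama", 0.4)],
}

strong = globally_strong_rules(candidate_rules, min_confidence=0.6)
top1 = top_k_rules_per_user(candidate_rules, k=1)
```

With the global threshold, the hypothetical user "bob" is left with no rule at all, whereas the per-user ranking still returns his strongest rule.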
Discovering Stock Price Movement Patterns in Indonesia During the COVID-19 Pandemic Using Top K Association Rules Mining
This study extracts stock price movement patterns during the COVID-19 pandemic based on association rules and the number of top patterns k, also called top-k patterns. To mine the top-k patterns, the study uses the Top-K Association Rules Mining algorithm. Its aim is to provide stock recommendations that are easy to understand by showing the correlation between stocks and the movement of COVID-19. The study therefore also focuses on defining the return types of stock price movements and the difference types of COVID-19 movements based on fuzzy sets. Using COVID-19 movement data and historical stock prices from 2020 to 2021, and the Top-K Association Rules Mining algorithm with k = 200, the resulting patterns show that the COVID-19 pandemic affected some stock price movements at 4 companies with low-loss selling
MINING TOP-K HIGH UTILITY ITEM SETS BY USING EFFICIENT DATA STRUCTURE TO IMPROVE THE PERFORMANCE
Association rules show strong relationships between attribute-value pairs (or items) that occur frequently in a given data set. They are commonly used to determine the purchasing patterns of customers in a store. Such analysis supports many decision-making processes, such as product placement, catalogue design, and cross-marketing. The discovery of association rules is based on frequent itemset mining. Frequent itemset mining algorithms mainly suffer from generating a large number of candidate itemsets and requiring a large number of database scans. These issues are addressed by two algorithms, TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase), which mine the top-K high utility itemsets in two scans of the entire database. Although the scans are reduced to two, processing time remains high because of traversals of the UP-Tree, the data structure used by the TKU and TKO algorithms. The proposed algorithm uses a B+-Tree data structure instead of the UP-Tree to reduce the time. Experimental analysis shows that processing time improves, so the proposed B+-Tree methodology overcomes the limitations of existing work
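To make the notion of a top-K high utility itemset concrete, here is a brute-force Python sketch. It is not TKU, TKO, or the B+-Tree method from the abstract; it enumerates every itemset, which is feasible only for tiny inputs. It assumes the usual definition of utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset), and the data and names are hypothetical.

```python
import heapq
from itertools import combinations

def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for quantities in transactions:  # each transaction maps item -> quantity
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * unit_profit[item] for item in itemset)
    return total

def top_k_high_utility(transactions, unit_profit, k):
    """Return the k (utility, itemset) pairs with the highest utility."""
    items = sorted({item for quantities in transactions for item in quantities})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility > 0:
                scored.append((utility, itemset))
    return heapq.nlargest(k, scored)

# Hypothetical toy data.
transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3}]
unit_profit = {"a": 5, "b": 2, "c": 1}
best_two = top_k_high_utility(transactions, unit_profit, k=2)
```

The exhaustive enumeration grows exponentially with the number of items, which is the cost that specialised structures such as the UP-Tree are meant to avoid.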
Query-Constraint-Based Mining of Association Rules for Exploratory Analysis of Clinical Datasets in the National Sleep Research Resource
Background: Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on such datasets will yield a large number of rules, many of which may be uninteresting. Especially for imbalanced datasets, performing ARM directly would result in uninteresting rules that are dominated by certain variables that capture general characteristics.
Methods: We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on a subset of data items satisfying a query constraint. We first perform a series of data-preprocessing steps including variable selection, merging semantically similar variables, combining multiple-visit data, and data transformation. We use the Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. We then remove general and subsumed rules so that unique and non-redundant rules result for a particular query constraint.
Results: Applying QARM on five datasets from NSRR obtained a total of 2517 association rules with a minimum confidence of 60% (using top 100 rules for each query constraint). The results show that merging similar variables could avoid uninteresting rules. Also, removing general and subsumed rules resulted in a more concise and interesting set of rules.
Conclusions: QARM shows the potential to support exploratory analysis of large biomedical datasets. It is also shown as a useful method to reduce the number of uninteresting association rules generated from imbalanced datasets. A preliminary literature-based analysis showed that some association rules have supporting evidence from biomedical literature, while others without literature-based evidence may serve as the candidates for new hypotheses to explore and investigate. Together with literature-based evidence, the association rules mined over the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems
COMPARISON OF FREQUENT ITEMSET GENERATION USING CUT BOTH WAYS ALGORITHM AND APRIORI ALGORITHM
ABSTRACT: Mining association rules is a data mining process for finding patterns and rules in a large body of data. These patterns are sets of items (itemsets) that frequently occur together (frequent itemsets) in the transactions of a database. Frequent itemset generation is very time-consuming, so an algorithm that reduces the time required is needed. The most popular algorithm is Apriori, which uses support-based pruning to prune a vast number of non-candidate itemsets. This algorithm has a weakness: when the cardinality of the longest frequent itemset is k, Apriori needs k passes over the database. In addition, Apriori is computation-intensive in generating candidate itemsets and counting support values, especially for applications with a very low support threshold and/or a vast number of items. Cut Both Ways (CBW) combines several techniques and uses a cutting level (?) to divide the search space into two parts. A top-down strategy, combined with breadth-first search and horizontal counting, is used to find frequent itemsets below the cutting level. A bottom-up strategy, combined with depth-first search and vertical intersection, is used to find frequent itemsets above the cutting level. The cutting level is the average cardinality of the frequent itemsets, on the expectation that most frequent itemsets will appear at this level. This final project implements frequent itemset generation using the Apriori and CBW algorithms, then compares their performance under several minimum support values.
Keywords: mining association rules, itemset, frequent itemset, support, support-based pruning, longest frequent itemset, computation-intensive, cutting level, top-down, bottom-up, breadth-first search, depth-first search, vertical intersection
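The level-wise search with support-based pruning that the abstract attributes to Apriori can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: it assumes a non-empty list of transactions, counts support by rescanning the list for each candidate rather than in one pass per level, and uses hypothetical names and data. CBW is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built level by level."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)  # assumed to be non-zero

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([item]) for item in items}
    level = {s for s in level if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical toy transactions.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(baskets, min_support=0.5)
```

The loop runs once per itemset size, which mirrors the abstract's point that a longest frequent itemset of cardinality k forces k levels of work.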
Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees
The tasks of extracting (top-K) Frequent Itemsets (FI's) and Association
Rules (AR's) are fundamental primitives in data mining and database
applications. Exact algorithms for these problems exist and are widely used,
but their running time is hindered by the need of scanning the entire dataset,
possibly multiple times. High quality approximations of FI's and AR's are
sufficient for most practical uses, and a number of recent works explored the
application of sampling for fast discovery of approximate solutions to the
problems. However, these works do not provide satisfactory performance
guarantees on the quality of the approximation, due to the difficulty of
bounding the probability of under- or over-sampling any one of an unknown
number of frequent itemsets. In this work we circumvent this issue by applying
the statistical concept of Vapnik-Chervonenkis (VC) dimension to develop
a novel technique for providing tight bounds on the sample size that guarantees
approximation within user-specified parameters. Our technique applies both to
absolute and to relative approximations of (top-K) FI's and AR's. The
resulting sample size is linearly dependent on the VC-dimension of a range
space associated with the dataset to be mined. The main theoretical
contribution of this work is a proof that the VC-dimension of this range space
is upper bounded by an easy-to-compute characteristic quantity of the dataset
which we call the d-index, and which is the maximum integer d such that the
dataset contains at least d transactions of length at least d such that no
one of them is a superset of or equal to another. We show that this bound is
strict for a large class of datasets. Comment: 19 pages, 7 figures. A shorter version of this paper appeared in the
proceedings of ECML PKDD 201
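The d-index definition in the abstract above can be made concrete with a short Python sketch. The first function ignores the "no transaction is a superset of or equal to another" condition except for removing exact duplicates, so it yields only an upper bound on the d-index; the second checks the condition by brute force and is practical only for very small datasets. Function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def d_index_upper_bound(transactions):
    """Largest d with at least d distinct transactions of length >= d."""
    distinct = {frozenset(t) for t in transactions}
    lengths = sorted((len(t) for t in distinct), reverse=True)
    d = 0
    for rank, length in enumerate(lengths, start=1):
        if length >= rank:
            d = rank
        else:
            break
    return d

def is_antichain(sets):
    """True if no set in the group contains, or equals, another."""
    return all(not (a <= b or b <= a) for a, b in combinations(sets, 2))

def d_index_exact(transactions):
    """Brute-force d-index: exponential time, for tiny examples only."""
    distinct = {frozenset(t) for t in transactions}
    for d in range(d_index_upper_bound(transactions), 0, -1):
        candidates = [t for t in distinct if len(t) >= d]
        if any(is_antichain(group) for group in combinations(candidates, d)):
            return d
    return 0

# Hypothetical toy datasets.
dataset = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"}, {"e"}]
nested = [{"a", "b"}, {"a", "b", "c"}]
```

On the nested example the two functions disagree (bound 2, exact value 1), which shows why the containment condition matters.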
Predicting student performance using data mining and learning analysis technique in Libyan Higher Education
Technology has an increasing impact on all areas of life, including the education sector, and requires developing countries to emulate developed countries and integrate technology into their education systems. Schools in Libya currently struggle to determine why students perform poorly in certain subjects and how to predict their performance in those subjects in coming semesters. Several methods have been proposed to predict student performance using data mining techniques. This paper plans a Data Mining Techniques in Education (DME) prediction model that combines clustering, classification, and association rule mining across many universities and schools, in order to provide students and teachers with the most advanced platform. Although relatively late, the Libyan government finally responded to this challenge by investing heavily in rebuilding the education system and launching a national plan. The presented method predicts students' performance based on their grades in Math and English. The results are divided into three main sections: clustering analysis using the k-means algorithm; classification analysis done in two rounds, first using Gain Ratio evaluation to find the top attributes, which the J48 algorithm then uses in the second round; and association rule analysis using the Apriori algorithm. Association rule analysis is applied to the clusters generated by the clustering analysis to produce the rules associated with each cluster. For each section, a list of inputs is presented with the scale used for the values, followed by the results of the algorithm and an explanation of the findings
Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns
Understanding customer buying patterns is of great interest to the retail
industry and has been shown to benefit a wide variety of goals, ranging from managing
stocks to implementing loyalty programs. Association rule mining is a common
technique for extracting correlations such as "people in the South of France
buy rosé wine" or "customers who buy pâté also buy salted butter and sour
bread." Unfortunately, sifting through a high number of buying patterns is not
useful in practice, because of the predominance of popular products in the top
rules. As a result, a number of "interestingness" measures (over 30) have been
proposed to rank rules. However, there is no agreement on which measures are
more appropriate for retail data. Moreover, since pattern mining algorithms
output thousands of association rules for each product, the ability for an
analyst to rely on ranking measures to identify the most interesting ones is
crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a
framework that provides analysts with the ability to compare the outcome of
interestingness measures applied to buying patterns in the retail industry. We
report on how we used CAPA to compare 34 measures applied to over 1,800 stores
of Intermarché, one of the largest food retailers in France
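As an illustration of how interestingness measures score a rule, the Python sketch below computes three standard ones (support, confidence, and lift) for a rule "antecedent implies consequent". The abstract does not list the 34 measures compared in CAPA, so the choice of these three is an assumption made for illustration, and the baskets are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(antecedent <= set(t) for t in transactions)
    count_c = sum(consequent <= set(t) for t in transactions)
    count_both = sum((antecedent | consequent) <= set(t) for t in transactions)
    support = count_both / n                 # share of baskets with both sides
    confidence = count_both / count_a        # P(consequent | antecedent)
    lift = confidence / (count_c / n)        # confidence relative to baseline
    return support, confidence, lift

# Hypothetical baskets; assumes the antecedent and the consequent
# each occur in at least one basket, so no division by zero happens.
baskets = [
    {"wine", "cheese"},
    {"wine", "bread"},
    {"wine", "cheese", "bread"},
    {"bread"},
]
support, confidence, lift = rule_measures(baskets, {"wine"}, {"cheese"})
```

Lift divides confidence by the consequent's overall frequency, which is one common way to discount rules that rank highly only because the consequent is a popular product.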