The Design and Implementation of Collaborative Filtering in Data Mining
Data mining is the process of discovering explicit knowledge from large amounts of data stored in databases, data warehouses, or other repositories. Many studies have examined data mining models such as association rules and sequential patterns. Collaborative filtering is one such data mining model. In this paper, we propose two approaches to the mining process of collaborative filtering. Finally, collaborative filtering mining is applied to a Knowledge Management system.
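As a rough illustration of what collaborative filtering computes, here is a minimal user-based sketch in Python. It is not either of the two approaches proposed in the paper, which the abstract does not describe. The toy ratings, the helper names `cosine` and `predict`, and the choice of cosine similarity over co-rated items are all assumptions made for illustration.

```python
import math

# Hypothetical toy ratings (user -> {item: rating}); not data from the paper.
RATINGS = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}

def cosine(u, v):
    """Cosine similarity computed only over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue  # skip the target user and neighbours who never rated the item
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: no neighbour rated the item
```

For example, `predict(RATINGS, "alice", "d")` estimates Alice's rating of item `d` from Bob's and Carol's ratings, weighting Bob more heavily because his co-ratings are more similar to hers.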
Optical tomography: Image improvement using mixed projection of parallel and fan beam modes
Mixed parallel and fan beam projection is a technique used to improve image quality. This research focuses on enhancing image quality in optical tomography, quantified by the Peak Signal to Noise Ratio (PSNR) and the Normalized Mean Square Error (NMSE). The findings show that combining parallel and fan beam projection improves image quality by more than 10% in terms of PSNR and by more than 100% in terms of NMSE compared with a single parallel beam projection.
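The two quality metrics can be computed as below. This is a minimal sketch on flat sequences of pixel values; the peak value `max_val=255` and the NMSE normalization (squared error divided by the energy of the reference image) are assumptions, since the abstract does not state which definitions were used. Higher PSNR and lower NMSE indicate a reconstruction closer to the reference.

```python
import math

def _check(ref, test):
    """Reject empty or mismatched images."""
    if len(ref) != len(test) or not ref:
        raise ValueError("images must be non-empty and the same size")

def mse(ref, test):
    """Mean squared error between two equally sized pixel sequences."""
    _check(ref, test)
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)

def psnr(ref, test, max_val=255.0):
    """PSNR in dB: 10 * log10(max_val**2 / MSE); infinite for identical images."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10.0 * math.log10(max_val ** 2 / err)

def nmse(ref, test):
    """Assumed NMSE: sum of squared errors divided by sum of squared reference values."""
    _check(ref, test)
    energy = sum(r ** 2 for r in ref)
    if energy == 0:
        raise ValueError("reference image has zero energy")
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / energy
```

For normalized reconstructions with pixel values in [0, 1], `max_val` would be set to 1.0 instead.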
Specious rules: an efficient and effective unifying method for removing misleading and uninformative patterns in association rule mining
We present theoretical analysis and a suite of tests and procedures for
addressing a broad class of redundant and misleading association rules we call
specious rules. Specious dependencies, also known as spurious,
apparent, or illusory associations, refer to a well-known
phenomenon where marginal dependencies are merely products of interactions with
other variables and disappear when conditioned on those variables.
The most extreme example is Yule-Simpson's paradox, in which two variables
show positive dependence in the marginal contingency table but negative
dependence in all partial tables defined by different levels of a confounding factor. It is
accepted wisdom that in data of any nontrivial dimensionality it is infeasible
to control for all of the exponentially many possible confounds of this nature.
In this paper, we consider the problem of specious dependencies in the context
of statistical association rule mining. We define specious rules and show they
offer a unifying framework which covers many types of previously proposed
redundant or misleading association rules. After theoretical analysis, we
introduce practical algorithms for detecting and pruning out specious
association rules efficiently under many key goodness measures, including
mutual information and exact hypergeometric probabilities. We demonstrate that
the procedure greatly reduces the number of associations discovered, providing
an elegant and effective solution to the problem of association mining
discovering large numbers of misleading and redundant rules.
Comment: This is a corrected version of the paper published in SDM'17.
In the equation on page 4, the range of the sum has been corrected.
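A minimal Python illustration of a Yule-Simpson reversal follows, using hypothetical success counts for two treatments split by a confounder Z (the stratum). Treatment B has the higher success rate overall but the lower rate within each stratum, so a rule such as {B} → {success} looks positive marginally and reverses once Z is conditioned on. The helper names are hypothetical, and the check covers only one known confounder; it is not the paper's specious-rule detection procedure.

```python
from fractions import Fraction

# Hypothetical counts: (successes, trials) per treatment within each stratum of Z.
COUNTS = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def marginal_rates(counts):
    """Success rate per treatment after summing over all strata (Z ignored)."""
    totals = {}
    for stratum in counts.values():
        for t, (s, n) in stratum.items():
            prev_s, prev_n = totals.get(t, (0, 0))
            totals[t] = (prev_s + s, prev_n + n)
    return {t: Fraction(s, n) for t, (s, n) in totals.items()}

def partial_rates(counts):
    """Success rate per treatment within each stratum (conditioned on Z)."""
    return {z: {t: Fraction(s, n) for t, (s, n) in st.items()}
            for z, st in counts.items()}

def is_simpson_reversal(counts, x="B", y="A"):
    """True if x beats y marginally but loses to y in every stratum."""
    marginal = marginal_rates(counts)
    partial = partial_rates(counts)
    return marginal[x] > marginal[y] and all(r[x] < r[y] for r in partial.values())
```

Exact `Fraction` arithmetic is used so that the rate comparisons are not affected by floating-point rounding.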
FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking
Maximal frequent pattern superset checking plays an important role in the
efficient mining of complete Maximal Frequent Itemsets (MFI) and in pruning the
search space for maximal itemsets. In this paper we present FastLMFI, a new indexing
approach for local maximal frequent pattern (itemset) propagation and maximal
pattern superset checking. Experimental results on several sparse and dense datasets
show that our approach outperforms the well-known progressive focusing
technique. We have also integrated our superset checking approach with Mafia, an
existing state-of-the-art maximal itemset algorithm, and compared our
results with the current best maximal itemset algorithms, afopt-max and FP
(zhu)-max. Our method outperforms afopt-max and FP (zhu)-max on the dense chess
and mushroom datasets at almost all support thresholds, which shows the
effectiveness of our approach.
Comment: 8 pages. In the Proceedings of the 4th ACS/IEEE International Conference
on Computer Systems and Applications 2006, March 8, 2006, Dubai/Sharjah, UAE,
2006, Page(s) 452-45
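For comparison, a naive baseline for maximal pattern superset checking is sketched below in Python: known maximal itemsets are stored as integer bitmasks, and a candidate is subsumed if its mask is contained in any stored mask. This linear scan over all stored itemsets is the kind of cost that indexing schemes such as FastLMFI aim to reduce. The class `MaximalSet` and its methods are hypothetical and do not reproduce FastLMFI's local propagation.

```python
def to_mask(itemset, index):
    """Encode an itemset as an integer bitmask using a fixed item -> bit index."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    return mask

class MaximalSet:
    """Hypothetical helper: keeps a list of known maximal itemsets as bitmasks."""

    def __init__(self, items):
        self.index = {item: i for i, item in enumerate(items)}
        self.masks = []

    def has_superset(self, itemset):
        """True if some stored maximal itemset contains `itemset` (linear scan)."""
        c = to_mask(itemset, self.index)
        return any((c & m) == c for m in self.masks)

    def add(self, itemset):
        """Store `itemset` unless already subsumed; drop stored sets it subsumes."""
        c = to_mask(itemset, self.index)
        if any((c & m) == c for m in self.masks):
            return False
        self.masks = [m for m in self.masks if (m & c) != m]
        self.masks.append(c)
        return True
```

Bitwise AND makes each subset test a single integer operation, but the number of tests still grows with the number of stored maximal itemsets.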
Towards scalable algorithm for closed itemset mining in high-dimensional data
Mining frequent itemsets from large datasets has a major drawback: the explosive number of itemsets requires an additional mining process to filter out the interesting ones. As a solution, the concept of closed frequent itemsets was introduced, a lossless and condensed representation of all frequent itemsets and their corresponding supports. Unfortunately, many algorithms are not memory-efficient because they require storing closed itemsets in main memory for duplicate checks. This paper presents BFF, a scalable algorithm for discovering closed frequent itemsets from high-dimensional data. Unlike many well-known algorithms, BFF traverses the search tree in a breadth-first manner, resulting in minimal memory use and shorter running time. Tests conducted on a number of microarray datasets show that the algorithm's performance improves significantly as the support threshold decreases, which is crucial for generating more interesting rules.
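The closed frequent itemset definition can be illustrated with a brute-force Python sketch: an itemset is closed if no proper superset has the same support. The enumeration is exponential and meant only for toy data; it is not BFF's breadth-first traversal, and the function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force support counting over item combinations (toy sizes only)."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = frozenset(combo)
            support = sum(1 for t in transactions if s <= t)
            if support >= min_support:
                freq[s] = support
                found = True
        if not found:
            break  # no frequent k-itemset implies no frequent (k+1)-itemset
    return freq

def closed_itemsets(freq):
    """Keep frequent itemsets that have no proper superset with equal support."""
    return {s: sup for s, sup in freq.items()
            if not any(s < t and sup == freq[t] for t in freq)}
```

The representation is lossless because the support of any frequent itemset equals the largest support among its closed supersets. With transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{a}` and a minimum support of 2, `{b}` is not closed, but its support of 2 is recovered from its closed superset `{a,b}`.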
- …