HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach
In this paper we present a novel hybrid (array-based layout and vertical
bitmap layout) database representation approach for mining the complete set of
Maximal Frequent Itemsets (MFI) on sparse and large datasets. Our work is novel in terms
of scalability, item search order and two horizontal and vertical projection
techniques. We also present a maximal algorithm using this hybrid database
representation approach. Different experimental results on real and sparse
benchmark datasets show that our approach is better than previous state-of-the-art
maximal algorithms. Comment: 8 pages. In the proceedings of 9th IEEE-INMIC 2005, Karachi, Pakistan,
200
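As a rough illustration of the vertical bitmap half of such a representation, the sketch below stores one integer bitmask per item and counts support with a bitwise AND. It is not the HybridMiner algorithm: the array-based layout, the projection techniques and the item search order from the paper are not reproduced, the maximality check is a naive filter, and all function names are hypothetical.

```python
def build_vertical_bitmaps(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Support is the number of set bits in the AND of the items' bitmaps."""
    mask = (1 << n_transactions) - 1
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def maximal_frequent_itemsets(transactions, min_support):
    """Naive depth-first search (assumption: not the paper's search order).

    Collects every frequent itemset, then keeps those with no frequent
    proper superset.
    """
    bitmaps = build_vertical_bitmaps(transactions)
    n = len(transactions)
    items = sorted(i for i in bitmaps if support([i], bitmaps, n) >= min_support)
    frequent = []

    def extend(prefix, start):
        for k in range(start, len(items)):
            candidate = prefix + [items[k]]
            if support(candidate, bitmaps, n) >= min_support:
                frequent.append(frozenset(candidate))
                extend(candidate, k + 1)

    extend([], 0)
    return [s for s in frequent if not any(s < other for other in frequent)]
```

Python integers are used as bitmaps only for brevity; a real implementation on large datasets would more likely use packed machine words.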
DiffNodesets: An Efficient Structure for Fast Mining Frequent Itemsets
Mining frequent itemsets is an essential problem in data mining and plays an
important role in many data mining applications. In recent years, some itemset
representations based on node sets have been proposed, which have been shown to be
very efficient for mining frequent itemsets. In this paper, we propose
DiffNodeset, a novel and more efficient itemset representation, for mining
frequent itemsets. Based on the DiffNodeset structure, we present an efficient
algorithm, named dFIN, for mining frequent itemsets. To achieve high efficiency,
dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search
strategy and directly enumerates frequent itemsets without candidate generation
in some cases. To evaluate the performance of dFIN, we have conducted
extensive experiments to compare it against existing leading algorithms on
a variety of real and synthetic datasets. The experimental results show that
dFIN is significantly faster than these leading algorithms. Comment: 22 pages, 13 figures
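To make the idea of a set-enumeration tree concrete, here is a minimal sketch that walks such a tree depth-first. It substitutes plain transaction-id sets (tidsets) for the DiffNodeset structure, so it is not dFIN and does not include its hybrid search strategy or its candidate-free enumeration. The function name and the item ordering are assumptions made for illustration.

```python
def mine_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Each tree node extends its parent itemset with one later item; the
    node's tidset is the intersection of the tidsets along the path.
    """
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Frequent single items, ordered by ascending support (an assumed order).
    roots = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: (len(pair[1]), pair[0]),
    )
    results = {}

    def expand(prefix, extensions):
        for idx, (item, tids) in enumerate(extensions):
            itemset = prefix + (item,)
            results[frozenset(itemset)] = len(tids)
            # Children combine this node only with items that come later.
            children = []
            for other, other_tids in extensions[idx + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    children.append((other, shared))
            expand(itemset, children)

    expand((), roots)
    return results
```

Because each node only combines with later siblings, every itemset is generated at most once, which is the property a set-enumeration tree provides.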
Privacy Preserving Utility Mining: A Survey
In the big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve a better trade-off between utility maximization and privacy preservation,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages
- …