389 research outputs found
Efficient chain structure for high-utility sequential pattern mining
High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, which considers both utility and sequence factors to derive the set of high-utility sequential patterns (HUSPs) from the quantitative databases. Several works have been presented to reduce the computational cost by variants of pruning strategies. In this paper, we present an efficient sequence-utility (SU)-chain structure, which can be used to store more relevant information to improve mining performance. Based on the SU-Chain structure, the existing pruning strategies can also be utilized here to early prune the unpromising candidates and obtain the satisfied HUSPs. Experiments are then compared with the state-of-the-art HUSPM algorithms and the results showed that the SU-Chain-based model can efficiently improve the efficiency performance than the existing HUSPM algorithms in terms of runtime and number of the determined candidates
HUPSMT: AN EFFICIENT ALGORITHM FOR MINING HIGH UTILITY-PROBABILITY SEQUENCES IN UNCERTAIN DATABASES WITH MULTIPLE MINIMUM UTILITY THRESHOLDS
The problem of high utility sequence mining (HUSM) in quantitative se-quence databases (QSDBs) is more general than that of frequent sequence mining in se-quence databases. An important limitation of HUSM is that a user-predened minimum tility threshold is used commonly to decide if a sequence is high utility. However, this is not convincing in many real-life applications as sequences may have diferent importance. Another limitation of HUSM is that data in QSDBs are assumed to be precise. But in the real world, collected data such as by sensor maybe uncertain. Thus, this paper proposes a framework for mining high utility-probability sequences (HUPSs) in uncertain QSDBs (UQS-DBs) with multiple minimum utility thresholds using a minimum utility. Two new width and depth pruning strategies are also introduced to early eliminate low utility or low probability sequences as well as their extensions, and to reduce sets of candidate items for extensions during the mining process. Based on these strategies, a novel ecient algorithm named HUPSMT is designed for discovering HUPSs. Finally, an experimental study conducted in both real-life and synthetic UQSDBs shows the performance of HUPSMT in terms of time and memory consumption
Privacy Preserving Utility Mining: A Survey
In big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve better trade-off between utility maximizing and privacy preserving,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page
- …