78 research outputs found
HUPSMT: AN EFFICIENT ALGORITHM FOR MINING HIGH UTILITY-PROBABILITY SEQUENCES IN UNCERTAIN DATABASES WITH MULTIPLE MINIMUM UTILITY THRESHOLDS
The problem of high utility sequence mining (HUSM) in quantitative se-quence databases (QSDBs) is more general than that of frequent sequence mining in se-quence databases. An important limitation of HUSM is that a user-predened minimum tility threshold is used commonly to decide if a sequence is high utility. However, this is not convincing in many real-life applications as sequences may have diferent importance. Another limitation of HUSM is that data in QSDBs are assumed to be precise. But in the real world, collected data such as by sensor maybe uncertain. Thus, this paper proposes a framework for mining high utility-probability sequences (HUPSs) in uncertain QSDBs (UQS-DBs) with multiple minimum utility thresholds using a minimum utility. Two new width and depth pruning strategies are also introduced to early eliminate low utility or low probability sequences as well as their extensions, and to reduce sets of candidate items for extensions during the mining process. Based on these strategies, a novel ecient algorithm named HUPSMT is designed for discovering HUPSs. Finally, an experimental study conducted in both real-life and synthetic UQSDBs shows the performance of HUPSMT in terms of time and memory consumption
Efficient chain structure for high-utility sequential pattern mining
High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, which considers both utility and sequence factors to derive the set of high-utility sequential patterns (HUSPs) from the quantitative databases. Several works have been presented to reduce the computational cost by variants of pruning strategies. In this paper, we present an efficient sequence-utility (SU)-chain structure, which can be used to store more relevant information to improve mining performance. Based on the SU-Chain structure, the existing pruning strategies can also be utilized here to early prune the unpromising candidates and obtain the satisfied HUSPs. Experiments are then compared with the state-of-the-art HUSPM algorithms and the results showed that the SU-Chain-based model can efficiently improve the efficiency performance than the existing HUSPM algorithms in terms of runtime and number of the determined candidates
Flexible constrained sampling with guarantees for pattern mining
Pattern sampling has been proposed as a potential solution to the infamous
pattern explosion. Instead of enumerating all patterns that satisfy the
constraints, individual patterns are sampled proportional to a given quality
measure. Several sampling algorithms have been proposed, but each of them has
its limitations when it comes to 1) flexibility in terms of quality measures
and constraints that can be used, and/or 2) guarantees with respect to sampling
accuracy. We therefore present Flexics, the first flexible pattern sampler that
supports a broad class of quality measures and constraints, while providing
strong guarantees regarding sampling accuracy. To achieve this, we leverage the
perspective on pattern mining as a constraint satisfaction problem and build
upon the latest advances in sampling solutions in SAT as well as existing
pattern mining algorithms. Furthermore, the proposed algorithm is applicable to
a variety of pattern languages, which allows us to introduce and tackle the
novel task of sampling sets of patterns. We introduce and empirically evaluate
two variants of Flexics: 1) a generic variant that addresses the well-known
itemset sampling task and the novel pattern set sampling task as well as a wide
range of expressive constraints within these tasks, and 2) a specialized
variant that exploits existing frequent itemset techniques to achieve
substantial speed-ups. Experiments show that Flexics is both accurate and
efficient, making it a useful tool for pattern-based data exploration.Comment: Accepted for publication in Data Mining & Knowledge Discovery journal
(ECML/PKDD 2017 journal track
Incrementally updating the high average-utility patterns with pre-large concept
High-utility itemset mining (HUIM) is considered as an emerging approach to detect the high-utility patterns from databases.
Most existing algorithms of HUIM only consider the itemset utility regardless of the length. This limitation raises the utility
as a result of a growing itemset size. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus
providing a more balanced scale to measure the average-utility for decision-making. Several algorithms were presented to
efficiently mine the set of high average-utility itemsets (HAUIs) but most of them focus on handling static databases. In the
past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem but it still has to
re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset
(HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction
insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply
the pre-large concept on HAUIM. A pre-large concept is used to speed up the mining performance, which can ensure that if
the total utility in the newly inserted transaction is within the safety bound, the small itemsets in the original database could
not be the large ones after the database is updated. This, in turn, reduces the recurring database scans and obtains the correct
HAUIs. Experiments demonstrate that the PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the
state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns
and scalability.publishedVersio
- …