Beyond Frequency: Utility Mining with Varied Item-Specific Minimum Utility
Utility-oriented mining, which integrates utility theory and data mining, is a
useful tool for understanding economic consumer behavior. Traditional
algorithms for mining high-utility patterns (HUPs) apply a single, uniform
minimum high-utility threshold (minutil) to obtain the set of HUPs. In some
real-life circumstances, however, specific products may bring lower utilities
than others while their profit still offers vital information.
If minutil is set high, the low-utility patterns are missed; if
minutil is set low, the number of patterns becomes unmanageable. In this paper,
an efficient one-phase utility-oriented pattern mining algorithm, called HIMU,
is proposed for mining HUPs with varied item-specific minimum utility. HIMU
introduces a novel tree structure, called a multiple item utility
set-enumeration tree (MIU-tree), together with the global sorted and the
conditional downward closure properties. In addition, we extended the compact utility-list structure
to keep the necessary information, and thus this one-phase HIMU model greatly
reduces the computational costs and memory requirements. Moreover, two pruning
strategies are then extended to enhance the performance. We conducted extensive
experiments on several synthetic and real-world datasets; the results indicate
that the designed one-phase HIMU algorithm can address the "rare item problem"
and has better performance than the state-of-the-art algorithms in terms of
runtime, memory usage, and scalability. Furthermore, the enhanced algorithms
outperform the non-optimized HIMU approach.
Comment: Under review in ACM Trans. on Data Science, 31 pages
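To make the idea of item-specific minimum utility concrete, here is a minimal Python sketch. It is not the HIMU algorithm: it uses no MIU-tree, utility-list, or pruning strategy, and simply enumerates every itemset by brute force. The toy transactions, unit profits, threshold values, and all function names are hypothetical. The rule that an itemset's threshold is the smallest item-specific minimum utility among its items is an assumption made for illustration; the abstract does not state how the per-item thresholds are combined.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "diamond": 1},
    {"milk": 3},
    {"bread": 4, "milk": 2},
]
unit_profit = {"bread": 1, "milk": 2, "diamond": 50}
# Assumed item-specific minimum utilities (illustrative values only).
item_minutil = {"bread": 10, "milk": 10, "diamond": 60}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def itemset_threshold(itemset, item_minutil):
    """Assumed rule: the least item-specific minimum utility among the items."""
    return min(item_minutil[i] for i in itemset)


def mine_brute_force(transactions, unit_profit, item_minutil):
    """Enumerate all itemsets and keep those whose utility reaches their threshold."""
    items = sorted(unit_profit)
    found = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            # Skip itemsets that never occur together (utility 0).
            if u > 0 and u >= itemset_threshold(itemset, item_minutil):
                found[itemset] = u
    return found


# Item-specific thresholds versus one uniform threshold of 50 for every item.
specific = mine_brute_force(transactions, unit_profit, item_minutil)
uniform = mine_brute_force(transactions, unit_profit, dict.fromkeys(unit_profit, 50))
```

In this toy data, the uniform threshold of 50 drops every pattern made up only of low-profit items, such as `("milk",)`, whereas the item-specific thresholds keep it. Brute-force enumeration is exponential in the number of items, which is why practical algorithms rely on the kind of pruning the abstract describes.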
A Survey of Utility-Oriented Pattern Mining
The main purpose of data mining and analytics is to find novel, potentially
useful patterns that can be utilized in real-world applications to derive
beneficial knowledge. For identifying and evaluating the usefulness of
different kinds of patterns, many techniques and constraints have been
proposed, such as support, confidence, sequence order, and utility parameters
(e.g., weight, price, profit, quantity, satisfaction, etc.). In recent years,
there has been an increasing demand for utility-oriented pattern mining (UPM,
also called utility mining). UPM is a vital task, with numerous high-impact
applications, including cross-marketing, e-commerce, finance, medical, and
biomedical applications. This survey aims to provide a general, comprehensive,
and structured overview of the state-of-the-art methods of UPM. First, we
provide an in-depth introduction to UPM, including concepts, examples, and
comparisons with related concepts. A taxonomy of the most common and
state-of-the-art approaches for mining different kinds of high-utility patterns
is presented in detail, including Apriori-based, tree-based, projection-based,
vertical-/horizontal-data-format-based, and other hybrid approaches. A
comprehensive review of advanced topics of existing high-utility pattern mining
techniques is offered, with a discussion of their pros and cons. Finally, we
present several well-known open-source software packages for UPM. We conclude
our survey with a discussion on open and practical challenges in this field.
Comment: Survey paper, accepted by IEEE TKDE, 20 pages