Utility-driven Data Analytics on Uncertain Data
Modern Internet of Things (IoT) applications generate massive amounts of
data, much of it in the form of objects/items of readings, events, and log
entries. Specifically, most of the objects in these IoT data contain rich
embedded information (e.g., frequency and uncertainty) and different levels of
importance (e.g., unit utility of items, interestingness, cost, risk, or
weight). Many existing approaches in data mining and analytics are limited in
that they consider only a binary attribute for each item within a transaction
and treat all objects/items as having equal weight or importance. To overcome these
drawbacks, a novel utility-driven data analytics algorithm named HUPNU is
presented, to extract High-Utility patterns by considering both Positive and
Negative unit utilities from Uncertain data. The qualified high-utility
patterns can be effectively discovered for risk prediction, manufacturing
management, and decision-making, among others. HUPNU relies on the developed
vertical Probability-Utility list with the Positive-and-Negative utilities
structure, as well as several effective pruning strategies. Experiments showed
that the developed HUPNU approach mines the qualified patterns efficiently and
effectively.
Comment: Under review in IEEE Internet of Things Journal since 2018, 11 pages
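The two quantities that the abstract combines can be illustrated with a small sketch. It is not the HUPNU algorithm (there is no Probability-Utility list and no pruning), and the uncertainty model is an assumption: each transaction carries an existence probability, a pattern's probability is the sum over the transactions that contain it, and its utility is quantity times unit utility summed over the same transactions. The data, the names, and both thresholds are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: (existence probability, {item: quantity}).
Transaction = Tuple[float, Dict[str, int]]

DATABASE: List[Transaction] = [
    (0.9, {"a": 2, "b": 1}),
    (0.6, {"a": 1, "c": 3}),
    (0.8, {"a": 1, "b": 2, "c": 1}),
]

# Unit utilities may be positive (profit) or negative (loss).
UNIT_UTILITY: Dict[str, int] = {"a": 5, "b": -2, "c": 3}


def pattern_stats(
    pattern: Sequence[str],
    database: List[Transaction],
    unit_utility: Dict[str, int],
) -> Tuple[float, int]:
    """Return (summed probability, summed utility) over supporting transactions."""
    probability = 0.0
    utility = 0
    for prob, items in database:
        if all(item in items for item in pattern):
            probability += prob
            utility += sum(items[i] * unit_utility[i] for i in pattern)
    return probability, utility


def is_qualified(
    pattern: Sequence[str],
    database: List[Transaction],
    unit_utility: Dict[str, int],
    min_utility: int,
    min_probability: float,
) -> bool:
    """Assumed criterion: both thresholds must be met."""
    probability, utility = pattern_stats(pattern, database, unit_utility)
    return utility >= min_utility and probability >= min_probability


if __name__ == "__main__":
    print(pattern_stats(("a", "b"), DATABASE, UNIT_UTILITY))
    print(is_qualified(("a", "b"), DATABASE, UNIT_UTILITY, 8, 1.5))
```

In this toy database the negative unit utility of item b lowers the utility of the pattern (a, b) to 9, while item a alone reaches 20, which shows why negative utilities cannot be ignored when ranking patterns.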
Correlated Utility-based Pattern Mining
In the field of data mining and analytics, utility theory from economics
can bring benefits to many real-life applications. In the recent decade, a new
research field called utility-oriented mining has attracted great
attention. Previous studies, however, rarely
consider the inherent correlation of items within patterns. Consider the
purchase behavior of consumers: a high-utility group of products (with respect
to multiple products) may contain several very high-utility products together
with some low-utility products. Such a group is nevertheless considered a
valuable pattern even if it is not highly correlated, or even happens by chance.
In this paper, in light of these challenges, we propose an efficient utility
mining approach named non-redundant Correlated high-Utility Pattern Miner
(CoUPM), which takes positive correlation and profitable value into account. The
derived patterns with high utility and strong positive correlation can be
more insightful than patterns that only have high profitable
values. The utility-list structure is revised and applied to store necessary
information of both correlation and utility. Several pruning strategies are
further developed to improve the efficiency for discovering the desired
patterns. Experimental results show that the non-redundant correlated
high-utility patterns are more effective than some other kinds of
interesting patterns. Moreover, the proposed CoUPM algorithm is
significantly more efficient than the state-of-the-art algorithm.
Comment: Elsevier Information Science, 15 pages
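The motivating example in this abstract (a few very profitable products bundled with weakly related ones) can be made concrete with a short sketch. The abstract does not name its correlation measure, so the Kulczynski measure used here is an assumption, as are the toy data, the function names, and both thresholds. The sketch is not the CoUPM algorithm and uses no utility-list.

```python
from typing import Dict, List, Sequence

# Hypothetical purchase data: {item: quantity} per transaction.
TRANSACTIONS: List[Dict[str, int]] = [
    {"bread": 1, "butter": 1},
    {"bread": 2, "butter": 1, "tv": 1},
    {"bread": 1},
    {"tv": 1},
]
UNIT_PROFIT: Dict[str, int] = {"bread": 2, "butter": 3, "tv": 300}


def support(pattern: Sequence[str], transactions: List[Dict[str, int]]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(all(i in t for i in pattern) for t in transactions)


def utility(
    pattern: Sequence[str],
    transactions: List[Dict[str, int]],
    unit_profit: Dict[str, int],
) -> int:
    """Total profit of the pattern's items in the transactions containing it."""
    return sum(
        sum(t[i] * unit_profit[i] for i in pattern)
        for t in transactions
        if all(i in t for i in pattern)
    )


def kulc(pattern: Sequence[str], transactions: List[Dict[str, int]]) -> float:
    """Kulczynski measure: mean of support(pattern) / support(item)."""
    sup_pattern = support(pattern, transactions)
    if sup_pattern == 0:
        return 0.0
    ratios = [sup_pattern / support((i,), transactions) for i in pattern]
    return sum(ratios) / len(ratios)


def is_correlated_high_utility(
    pattern: Sequence[str], min_utility: int, min_kulc: float
) -> bool:
    """Assumed criterion: high utility and strong positive correlation."""
    return (
        utility(pattern, TRANSACTIONS, UNIT_PROFIT) >= min_utility
        and kulc(pattern, TRANSACTIONS) >= min_kulc
    )
```

With these toy numbers, (bread, tv) has a utility of 304 but a Kulczynski value of about 0.42, whereas (bread, butter) has a utility of only 12 but a value of about 0.83. A utility-only miner would rank the first pattern far higher even though its items rarely occur together.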
HUOPM: High Utility Occupancy Pattern Mining
Mining useful patterns from varied types of databases is an important
research topic, which has many real-life applications. Most studies have
considered frequency as the sole interestingness measure for identifying
high-quality patterns. However, each object is different in nature. The relative
importance of objects is not equal, in terms of criteria such as the utility,
risk, or interest. Besides, another limitation of frequent patterns is that
they generally have a low occupancy, i.e., they often represent small sets of
items in transactions containing many items, and thus may not be truly
representative of these transactions. To extract high quality patterns in real
life applications, this paper extends the occupancy measure to also assess the
utility of patterns in transaction databases. We propose an efficient algorithm
named High Utility Occupancy Pattern Mining (HUOPM). It considers user
preferences in terms of frequency, utility, and occupancy. A novel
Frequency-Utility tree (FU-tree) and two compact data structures, called the
utility-occupancy list and FU-table, are designed to provide global and partial
downward closure properties for pruning the search space. The proposed method
can efficiently discover the complete set of high quality patterns without
candidate generation. Extensive experiments have been conducted on several
datasets to evaluate the effectiveness and efficiency of the proposed
algorithm. Results show that the derived patterns are intelligible, reasonable,
and acceptable, and that HUOPM with its pruning strategies outperforms the
state-of-the-art algorithm in terms of both runtime and search space.
Comment: Accepted by IEEE Transactions on Cybernetics, 14 pages
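A small sketch can make the occupancy idea from this abstract concrete. The exact definition is not given here, so the sketch assumes that the utility occupancy of a pattern in a transaction is the share of that transaction's total utility contributed by the pattern, averaged over the supporting transactions. It does not build the FU-tree, the utility-occupancy list, or the FU-table, and all names, data, and thresholds are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical toy data: {item: quantity} per transaction.
TRANSACTIONS: List[Dict[str, int]] = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 1, "c": 2},
    {"a": 2, "c": 1},
]
UNIT_UTILITY: Dict[str, int] = {"a": 3, "b": 1, "c": 2}


def transaction_utility(transaction: Dict[str, int]) -> int:
    """Total utility of all items in one transaction."""
    return sum(q * UNIT_UTILITY[i] for i, q in transaction.items())


def support(pattern: Sequence[str]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(all(i in t for i in pattern) for t in TRANSACTIONS)


def utility_occupancy(pattern: Sequence[str]) -> float:
    """Average share of transaction utility covered by the pattern (assumed definition)."""
    shares = []
    for t in TRANSACTIONS:
        if all(i in t for i in pattern):
            pattern_utility = sum(t[i] * UNIT_UTILITY[i] for i in pattern)
            shares.append(pattern_utility / transaction_utility(t))
    return sum(shares) / len(shares) if shares else 0.0


def is_high_utility_occupancy(
    pattern: Sequence[str], min_support: int, min_occupancy: float
) -> bool:
    """Assumed criterion: frequent enough and representative enough."""
    return (
        support(pattern) >= min_support
        and utility_occupancy(pattern) >= min_occupancy
    )
```

Here (a, b) covers all of the first transaction's utility and half of the second's, giving an occupancy of 0.75, while the single item c appears in two transactions but covers only 0.375 of their utility on average.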
A Survey of Utility-Oriented Pattern Mining
The main purpose of data mining and analytics is to find novel, potentially
useful patterns that can be utilized in real-world applications to derive
beneficial knowledge. For identifying and evaluating the usefulness of
different kinds of patterns, many techniques and constraints have been
proposed, such as support, confidence, sequence order, and utility parameters
(e.g., weight, price, profit, quantity, satisfaction, etc.). In recent years,
there has been an increasing demand for utility-oriented pattern mining (UPM,
also called utility mining). UPM is a vital task, with numerous high-impact
applications, including cross-marketing, e-commerce, finance, medical, and
biomedical applications. This survey aims to provide a general, comprehensive,
and structured overview of the state-of-the-art methods of UPM. First, we
provide an in-depth introduction to UPM, including concepts, examples, and
comparisons with related concepts. A taxonomy of the most common and
state-of-the-art approaches for mining different kinds of high-utility patterns
is presented in detail, including Apriori-based, tree-based, projection-based,
vertical-/horizontal-data-format-based, and other hybrid approaches. A
comprehensive review of advanced topics of existing high-utility pattern mining
techniques is offered, with a discussion of their pros and cons. Finally, we
present several well-known open-source software packages for UPM. We conclude
our survey with a discussion on open and practical challenges in this field.
Comment: Survey paper, accepted by IEEE TKDE, 20 pages
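The basic task that these approaches address can be sketched in a few lines. The example below is a brute-force enumeration with a transaction-weighted utilization (TWU) filter, a widely used upper bound in utility mining. It is not any specific algorithm from the survey, it assumes positive unit utilities only, and the data, names, and threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: {item: quantity} per transaction.
DB: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def utility(pattern: Sequence[str]) -> int:
    """Utility of the pattern's own items in the transactions containing it."""
    return sum(
        sum(t[i] * PROFIT[i] for i in pattern)
        for t in DB
        if all(i in t for i in pattern)
    )


def twu(pattern: Sequence[str]) -> int:
    """Sum of whole-transaction utilities over the transactions containing the pattern."""
    return sum(
        sum(q * PROFIT[i] for i, q in t.items())
        for t in DB
        if all(i in t for i in pattern)
    )


def mine(min_utility: int) -> Dict[Tuple[str, ...], int]:
    """Enumerate itemsets and skip any whose TWU is below the threshold."""
    items = sorted({i for t in DB for i in t})
    promising = [i for i in items if twu((i,)) >= min_utility]
    result: Dict[Tuple[str, ...], int] = {}
    for size in range(1, len(promising) + 1):
        for pattern in combinations(promising, size):
            if twu(pattern) < min_utility:
                continue  # the TWU bound rules this pattern out
            u = utility(pattern)
            if u >= min_utility:
                result[pattern] = u
    return result


if __name__ == "__main__":
    print(mine(15))
```

With a threshold of 15, item b alone has a utility of 8 and is not a high-utility pattern, yet (a, b) reaches 16. Utility is therefore not downward-closed, which is why an upper bound such as TWU (22 for b in this example) is used for pruning instead of utility itself.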