Incrementally updating the high average-utility patterns with pre-large concept
High-utility itemset mining (HUIM) is considered an emerging approach for detecting high-utility patterns in databases. Most existing HUIM algorithms consider only the utility of an itemset, regardless of its length; as a result, the utility measure inflates as the itemset grows. High average-utility itemset mining (HAUIM) takes the size of the itemset into account, providing a more balanced measure of average utility for decision-making. Several algorithms have been proposed to efficiently mine the set of high average-utility itemsets (HAUIs), but most of them handle only static databases. Previously, a fast-updated (FUP)-based algorithm was developed to handle the incremental problem efficiently, but it must still re-scan the database when an itemset is small in the original database yet is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI is developed for transaction insertion in dynamic databases; it relies on average-utility-list (AUL) structures. Moreover, we apply the pre-large concept to HAUIM. The pre-large concept speeds up mining by ensuring that, if the total utility of the newly inserted transactions is within a safety bound, itemsets that are small in the original database cannot become large after the database is updated. This, in turn, reduces recurring database scans while still obtaining the correct HAUIs. Experiments demonstrate that PRE-HAUIMI outperforms the state-of-the-art batch-mode HAUI-Miner and the state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns, and scalability.
Privacy Preserving Utility Mining: A Survey
In the big data era, collected data usually contain rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, the analysis of such data with sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies, in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages
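The utility-versus-privacy trade-off surveyed here can be made concrete with one common sanitization idea: hiding a sensitive itemset by deleting item occurrences until its utility falls below the mining threshold. The sketch below is a hypothetical, greedy illustration of that idea only, not an algorithm from this survey; all data and the victim-item choice are assumptions:

```python
# Toy database and per-unit utilities (hypothetical values).
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "b": 2},
    {"b": 3},
]
unit_utility = {"a": 4, "b": 2}
sensitive = {"a", "b"}   # itemset the data owner wants to conceal
min_utility = 10         # mining threshold of a would-be adversary

def utility(itemset, db):
    """Total utility of an itemset over all transactions containing it."""
    return sum(
        sum(tx[i] * unit_utility[i] for i in itemset)
        for tx in db
        if all(i in tx for i in itemset)
    )

def sanitize(db, itemset, threshold):
    """Greedily delete the highest-utility 'victim' item from supporting
    transactions until the sensitive itemset drops below the threshold."""
    victim = max(itemset, key=lambda i: unit_utility[i])
    for tx in db:
        if utility(itemset, db) < threshold:
            break  # itemset is already hidden; stop distorting the data
        if all(i in tx for i in itemset):
            del tx[victim]
    return db

sanitize(transactions, sensitive, min_utility)
print(utility(sensitive, transactions))
```

Stopping as soon as the itemset is hidden limits the side effects on non-sensitive patterns; how to minimize those side effects is exactly the trade-off the surveyed evaluation criteria measure.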
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph summarization methods, with particular attention to the most recent approaches and novel research trends on this topic not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologies
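One family of summarization methods mentioned above transforms a graph into a more compact representation while preserving structural patterns. A minimal sketch of that idea, assuming the simplest possible grouping rule (merge nodes with identical neighbor sets into supernodes) rather than any specific method from the chapter:

```python
from collections import defaultdict

# Toy undirected graph as an edge list.
edges = [("u1", "v"), ("u2", "v"), ("u1", "w"), ("u2", "w"), ("x", "v")]

# Build an adjacency map.
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Nodes that share exactly the same neighborhood are structurally
# interchangeable, so they collapse into one supernode of the summary.
groups = defaultdict(list)
for node, neighbors in adj.items():
    groups[frozenset(neighbors)].append(node)

supernodes = [sorted(g) for g in groups.values()]
print(sorted(supernodes))  # u1 and u2 merge; v, w, x stay singletons
```

Real summarization algorithms relax this exact-match rule (e.g. merging approximately similar neighborhoods under an error budget), which is where the compression-versus-fidelity trade-offs discussed in the chapter arise.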
Toward autonomic distributed data mining using intelligent web services.
This study defines a new approach for building a Web Services-based infrastructure for distributed data mining applications. The proposed architecture provides a roadmap for autonomic functionality of the infrastructure, hiding the complexity of implementation details and offering the user a new level of usability in the data mining process. The Web Services-based infrastructure delivers all required data mining activities in a utility-like fashion, enabling heterogeneous components to be incorporated in a unified manner. Moreover, this structure allows data mining algorithms to process data from more than one source in a distributed manner. The purpose of this study is to present a simple but efficient methodology for determining when data distributed across several sites can be centralized and analyzed as data from the same theoretical distribution. This analysis also answers when and how the semantics of the sites are influenced by the distribution of the data. This hierarchical framework, with advanced and core Web Services, improves current data mining capability significantly in terms of performance, scalability, efficiency, transparency of resources, and incremental extensibility.
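The core decision in the abstract above, whether data from several sites can be treated as draws from the same distribution, can be sketched with a standard two-sample test. This is an illustrative stand-in, not the paper's methodology: it computes the two-sample Kolmogorov-Smirnov statistic by hand on hypothetical per-site samples, and the 0.5 cut-off is an arbitrary illustration rather than a proper significance test.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum absolute difference
    between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))

    def ecdf(sorted_sample, x):
        # Fraction of observations less than or equal to x.
        return sum(v <= x for v in sorted_sample) / len(sorted_sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Hypothetical numeric samples held at three sites.
site_1 = [1.0, 2.0, 2.5, 3.0, 4.0]
site_2 = [1.1, 2.1, 2.4, 3.2, 3.9]
site_3 = [10.0, 11.0, 12.0, 13.0, 14.0]

# A small statistic suggests the sites could be centralized and mined
# as one dataset; a large one suggests they should stay distributed.
print(ks_statistic(site_1, site_2) < 0.5)  # similar sites
print(ks_statistic(site_1, site_3) < 0.5)  # dissimilar sites
```

In a Web Services setting, each site would expose such a summary test as a service, so the decision to centralize can be made without first shipping all raw data to one location.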