Efficient Management of Short-Lived Data
Motivated by the increasing prominence of loosely-coupled systems, such as
mobile and sensor networks, which are characterised by intermittent
connectivity and volatile data, we study the tagging of data with so-called
expiration times. More specifically, when data are inserted into a database,
they may be tagged with time values indicating when they expire, i.e., when
they are regarded as stale or invalid and thus are no longer considered part of
the database. In a number of applications, expiration times are known and can
be assigned at insertion time. We present data structures and algorithms for
online management of data tagged with expiration times. The algorithms are
based on fully functional, persistent treaps, which are a combination of binary
search trees with respect to a primary attribute and heaps with respect to a
secondary attribute. The primary attribute implements primary keys, and the
secondary attribute stores expiration times in a minimum heap, thus keeping a
priority queue of tuples to expire. A detailed and comprehensive experimental
study demonstrates the well-behavedness and scalability of the approach as well
as its efficiency with respect to a number of competitors.
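The treap idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `insert`, `merge`, `expire`, and `keys` are hypothetical, keys are assumed to be distinct, and a tuple is assumed to expire once its expiration time is less than or equal to the current time. Nodes are immutable and updates copy the affected path, so older versions of the tree stay usable, which is one simple way to obtain persistence.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int                        # primary attribute (BST order)
    exp: float                      # expiration time (min-heap order)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(t, key, exp):
    """Return a new treap that also holds (key, exp); t itself is unchanged.
    Assumption: key is not already present."""
    if t is None:
        return Node(key, exp)
    if key < t.key:
        l = insert(t.left, key, exp)
        t = t._replace(left=l)
        if l.exp < t.exp:           # heap violated: rotate right
            t = l._replace(right=t._replace(left=l.right))
        return t
    r = insert(t.right, key, exp)
    t = t._replace(right=r)
    if r.exp < t.exp:               # heap violated: rotate left
        t = r._replace(left=t._replace(right=r.left))
    return t


def merge(a, b):
    """Join two treaps where every key in a is smaller than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.exp <= b.exp:
        return a._replace(right=merge(a.right, b))
    return b._replace(left=merge(a, b.left))


def expire(t, now):
    """Drop every tuple with exp <= now. The root always has the smallest
    expiration time, so only the root needs to be inspected."""
    while t is not None and t.exp <= now:
        t = merge(t.left, t.right)
    return t


def keys(t):
    """In-order traversal: keys in ascending order."""
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)


t1 = None
for k, e in [(5, 30), (2, 10), (8, 20), (1, 40)]:
    t1 = insert(t1, k, e)
t2 = expire(t1, now=20)             # removes keys 2 and 8
```

Because the minimum expiration time sits at the root, `expire` does work proportional to the number of expired tuples times the tree height, and `t1` remains a valid snapshot after `t2` is built.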
Improving Efficiency of Incremental Mining by Trie Structure and Pre-Large Itemsets
Incremental data mining has been widely discussed in recent years because of its many practical applications, and various incremental mining algorithms have been proposed. Hong et al. proposed an efficient incremental mining algorithm that handles newly inserted transactions using the concept of pre-large itemsets. The algorithm aims to reduce the need to rescan the original database and to cut maintenance costs. More recently, Lin et al. proposed the Pre-FUFP algorithm to handle new transactions more efficiently and to make the FP-tree easier to update. However, frequent itemsets must still be mined from the FP-tree with the FP-growth algorithm. In this paper, we propose the Pre-FUT algorithm (Fast-Update algorithm using the Trie data structure and the concept of pre-large itemsets), which not only builds and updates the trie structure when new transactions are inserted, but also allows all frequent itemsets to be mined easily from the trie. Experimental results show the good performance of the proposed algorithm.
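To make the combination of a trie and pre-large itemsets concrete, here is a rough sketch. It is not the Pre-FUT algorithm itself: it naively counts every itemset up to a fixed length on each insertion, and the names `ItemsetTrie`, `add_transaction`, `classify`, and `max_len` are hypothetical. It assumes the usual two-threshold reading of the pre-large concept, where an itemset is large if its support reaches an upper threshold and pre-large if its support lies between a lower and the upper threshold.

```python
from itertools import combinations


class TrieNode:
    def __init__(self):
        self.children = {}   # item -> TrieNode
        self.count = 0       # number of transactions containing this itemset


class ItemsetTrie:
    """Each root-to-node path spells one itemset (items in sorted order)."""

    def __init__(self, max_len=2):
        self.root = TrieNode()
        self.max_len = max_len       # assumption: cap itemset length
        self.n_transactions = 0

    def add_transaction(self, items):
        """Update counts in place for one newly inserted transaction."""
        self.n_transactions += 1
        items = sorted(set(items))
        for k in range(1, self.max_len + 1):
            for combo in combinations(items, k):
                node = self.root
                for item in combo:
                    node = node.children.setdefault(item, TrieNode())
                node.count += 1

    def itemsets(self):
        """Yield (itemset, count) for every itemset stored in the trie."""
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            for item, child in node.children.items():
                path = prefix + (item,)
                yield path, child.count
                stack.append((path, child))

    def classify(self, lower, upper):
        """Split itemsets into large and pre-large by relative support."""
        result = {"large": [], "pre-large": []}
        for itemset, count in self.itemsets():
            support = count / self.n_transactions
            if support >= upper:
                result["large"].append(itemset)
            elif support >= lower:
                result["pre-large"].append(itemset)
        return {name: sorted(sets) for name, sets in result.items()}


trie = ItemsetTrie(max_len=2)
for tx in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]):
    trie.add_transaction(tx)
before = trie.classify(lower=0.4, upper=0.6)

trie.add_transaction(["b", "c"])     # new transaction, no rescan of old ones
after = trie.classify(lower=0.4, upper=0.6)
```

Reading frequent itemsets is a plain traversal of the trie, which is the property the abstract contrasts with running FP-growth over an FP-tree.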
A GA-Based Approach to Hide Sensitive High Utility Itemsets
A GA-based privacy-preserving utility mining method is proposed to find appropriate transactions to insert into the database in order to hide sensitive high utility itemsets. It keeps information loss low while still providing information to data demanders, and it protects the high-risk information in the database. A flexible evaluation function with three factors is designed to evaluate whether the processed transactions need to be inserted. Three different weights are assigned to the three factors, respectively, according to user preferences. Moreover, the downward closure property and the pre-large concept are adopted to reduce the cost of rescanning the database, thus speeding up the evaluation of chromosomes.
Enhanced PL-WAP tree method for incremental mining of sequential patterns.
Sequential pattern mining, applied as web usage mining, has been used to improve web site design, increase the volume of e-business, and provide marketing decision support. This thesis proposes the PL4UP and EPL4UP algorithms, which use the PLWAP tree structure to incrementally update sequential patterns. PL4UP does not scan the old database except when previously small 1-itemsets become large in the updated database, in which case it scans only those transactions in the old database that contain any small itemsets. EPL4UP rebuilds the old PLWAP tree once, using only the list of previously small itemsets, rather than scanning the entire old database twice as the original PLWAP does. PL4UP and EPL4UP first update the old frequent patterns on the small PLWAP tree built for only the incremented part of the database; they then compare the newly added patterns generated from the small tree with the old frequent patterns to reduce the number of patterns to be checked on the old PLWAP tree. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .C47. Source: Masters Abstracts International, Volume: 42-03, page: 0959. Adviser: Christie Ezeife. Thesis (M.Sc.)--University of Windsor (Canada), 2003.