
484 research outputs found

Mining very long sequences with PLWAPLong algorithms

Author: Saeed Kashif
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2009

Sequential pattern mining is the process of finding inter-transaction frequent sequential patterns in a sequential database, where records consist of ordered sets of events (or items). Discovering sequential patterns in web server logs is an example application, useful for predicting the visiting patterns of web users for purposes such as targeted advertisements. The Position Coded Pre-order Linked Web Access Pattern (PLWAP) mining algorithm is one of the existing efficient web sequential pattern mining algorithms; it stores the frequent sequences of the entire sequential database in a compressed tree form with position coded nodes. However, for very long sequences exceeding thirty-two nodes (the number of bits an integer position code can hold), the PLWAP algorithm's performance begins to degrade because it employs linked lists to store conjunctions of long position codes, and the linked list traversals slow down the algorithm during both tree construction and mining. The PLWAP algorithm also uses each and every node in the frequent 1-item event queue to test for that event's inclusion in the suffix tree root set during mining. This is a very expensive operation since, except for one node, all other nodes that are its ancestors and descendants are not included in the root set. This thesis proposes two new algorithms, PLWAPLong1 and PLWAPLong2. Both use a new position code numbering scheme in which each node is assigned two numeric variables (startPosition, endPosition) instead of one. Using this scheme, the ancestor relationship between two nodes can be determined in an O(1) operation by comparing their startPosition and endPosition values. PLWAPLong1 also proposes transforming the linked list based tree into an equivalent array representation and using binary search to find the immediate descendant in a suffix tree. PLWAPLong2 uses the existing linked list based tree. Both PLWAPLong1 and PLWAPLong2 introduce a new technique called Last Descendant to eliminate unwanted nodes from the ancestor/descendant test when creating the suffix tree root set. Keywords: Data mining, Web Mining, Association Rule Mining, Long Sequences, PLWAP Mining
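
The (startPosition, endPosition) idea in the abstract above can be illustrated with a minimal sketch. It assumes a common interval-numbering convention (pre-order counter for the start, largest start in the subtree for the end); the thesis may number nodes differently, and the names `Node`, `assign_positions`, and `is_ancestor` are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    start: int = 0  # assumed meaning: pre-order visit number
    end: int = 0    # assumed meaning: largest start number in this subtree


def assign_positions(root: Node) -> None:
    """Number every node in pre-order using an explicit stack (no recursion)."""
    counter = 0
    stack = [(root, False)]
    while stack:
        node, subtree_done = stack.pop()
        if subtree_done:
            node.end = counter  # all descendants have been numbered by now
            continue
        counter += 1
        node.start = counter
        stack.append((node, True))
        for child in reversed(node.children):
            stack.append((child, False))


def is_ancestor(a: Node, b: Node) -> bool:
    """Two integer comparisons, independent of tree depth."""
    return a.start < b.start <= a.end


# Tiny illustrative tree: root -> a -> (b, c), root -> d
b, c, d = Node("b"), Node("c"), Node("d")
a = Node("a", [b, c])
root = Node("root", [a, d])
assign_positions(root)
```

With this numbering, `is_ancestor(a, c)` holds while `is_ancestor(a, d)` and `is_ancestor(b, c)` do not, and no linked list of position codes has to be traversed to decide it.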

Scholarship at UWindsor

iWAP: A Single Pass Approach for Web Access Sequential Pattern Mining

Author: Byeong-Soo Jeong
Chowdhury Farhan Ahmed
Nafisah Islam
Tarannum Shaila Zaman
Publication venue: GSTF Journal on Computing (JoC)
Publication date: 28/08/2014

With the explosive growth of data available on the World Wide Web, web usage mining has become essential for improving website design, analyzing system performance and network communications, understanding user reaction and motivation, and building adaptive websites. Web Access Pattern mining (WAP-mine) is a sequential pattern mining technique for discovering frequent web log access sequences. It first stores the frequent part of the original web access sequence database in a prefix tree called the WAP-tree and then mines the frequent sequences from that tree according to a user-given minimum support threshold. Because the tree depends on that threshold, this method is not applicable to incremental and interactive mining. In this paper, we propose an algorithm, improved Web Access Pattern (iWAP) mining, to find web access patterns in web logs more efficiently than the WAP-mine algorithm. Our proposed approach can discover all web access sequential patterns with a single pass over the web log database. Moreover, it supports interactive and incremental mining, which WAP-mine does not. The experimental and performance studies show that the proposed algorithm is in general an order of magnitude faster than the existing WAP-mine algorithm.

GSTF Digital Library (GSTF-DL): Open Journal Systems (Global Science and Technology Forum)

Mining of uncertain Web log sequences with access history probabilities

Author: Kadri Olalekan Habeeb
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2010

An uncertain data sequence is a sequence of data that exists with some level of doubt or probability. Each data item in the uncertain sequence is represented with a label and a probability value, referred to as its existential probability, ranging from 0 to 1. Existing algorithms are either unsuitable or inefficient for discovering frequent sequences in uncertain data. This thesis presents mining of uncertain Web sequences with a method that combines access history probabilities from several Web log sessions with features of the PLWAP web sequential miner. The method is the Uncertain Position Coded Pre-order Linked Web Access Pattern (U-PLWAP) algorithm for mining frequent sequential patterns in uncertain web logs. While PLWAP considers only a single session of web logs, U-PLWAP takes several sessions of web logs, from which existential probabilities are generated. Experiments show that U-PLWAP is at least 100% faster than U-Apriori, and 33% faster than UF-growth. The UF-growth algorithm also fails to take into consideration the order of the items, thereby making U-PLWAP a richer algorithm in terms of the information its result contains.

Scholarship at UWindsor

Mining High Utility Sequential Patterns from Uncertain Web Access Sequences using the PL-WAP

Author: Vangala Sravya
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2017

In general, web access patterns are retrieved from web access sequence databases using sequential pattern algorithms such as GSP, WAP, and the PLWAP tree. However, these algorithms do not consider sequential data with quantity (internal utility; e.g., the amount of time spent by the user on a web page) and quality (external utility; e.g., the rating of a web page in a website) information. These algorithms also do not work on uncertain sequential items (e.g., purchased products) having a probability in (0, 1). Factoring in the utility and uncertainty of each sequence item provides more product information that can be beneficial in mining profitable patterns from a company's websites. For example, a customer can purchase a bottle of ink more frequently than a printer, but the purchase of a single printer can yield more profit to the business owner than the purchase of multiple bottles of ink. Most existing traditional uncertain sequential pattern algorithms, such as U-Apriori, UF-Growth, and U-PLWAP, do not include utility measures. In U-PLWAP, the web sequences are derived from web log data without including the time spent by the user, and the web pages are not associated with any rating. When these two utilities are considered, items with lower existential probability can sometimes be more profitable to the website owner. Among utility-based traditional algorithms, the only one that addresses both uncertainty and high utility is the PHUI-UP algorithm, which treats probability and utility as separate entities with two different thresholds, so the retrieved patterns do not depend on both jointly; it also does not mine uncertain web access sequence databases. This thesis proposes the HUU-PLWAP miner for mining uncertain sequential patterns with internal and external utility information, using the PLWAP tree approach to cut down on the multiple database scans of level-wise approaches. HUU-PLWAP uses uncertain internal utility values (derived from a sequence uncertainty model) and constant external utility values (predefined) to retrieve the high utility sequential patterns from uncertain web access sequence databases with the help of the U-PLWAP methodology. Experiments show that HUU-PLWAP is at least 95% faster than U-PLWAP, and 75% faster than the PHUI-UP algorithm.
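
The ink-versus-printer example in the abstract above can be made concrete with a small worked sketch. It assumes the usual high-utility convention that an item's utility in a purchase is quantity (internal utility) times unit profit (external utility); all quantities and profits below are made-up illustrative numbers, not data from the thesis, and the sketch does not model existential probabilities.

```python
from collections import Counter, defaultdict

# Hypothetical purchase log: (item, quantity bought). Illustrative values only.
purchases = [("ink", 1), ("ink", 2), ("ink", 1), ("printer", 1)]

# Hypothetical unit profit per item (external utility). Illustrative values only.
unit_profit = {"ink": 3.0, "printer": 40.0}

# Frequency counts how many purchases contain the item.
frequency = Counter(item for item, _ in purchases)

# Utility sums quantity * unit profit over the purchases of each item.
utility = defaultdict(float)
for item, quantity in purchases:
    utility[item] += quantity * unit_profit[item]

for item in unit_profit:
    print(item, "frequency:", frequency[item], "utility:", utility[item])
```

Under these assumed numbers ink is the more frequent item, yet the printer carries the higher total utility, which is the gap between frequency-based and utility-based mining that the abstract describes.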

Scholarship at UWindsor

Mining frequent sequential patterns in data streams using SSM-algorithm.

Author: Monwar Mostafa
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2005

Frequent sequential mining is the process of discovering frequent sequential patterns in data sequences, as found in applications like web log access sequences. In data stream applications, data arrive at high rates in a continuous flow. Data stream mining is an online process different from traditional mining. Traditional mining algorithms work on an entire static dataset in order to obtain results, while data stream mining algorithms work with continuously arriving data streams. With rapid change in technology, there are many applications that take data as continuous streams. Examples include stock tickers, network traffic measurements, click stream data, data feeds from sensor networks, and telecom call records. Mining frequent sequential patterns in data stream applications must contend with many challenges, such as limited memory for unlimited data, the inability of algorithms to scan an infinitely flowing original dataset more than once, and the need to deliver current and accurate results on demand. This thesis proposes the SSM-Algorithm (sequential stream mining-algorithm), which delivers frequent sequential patterns in data streams. The concept of this work came from the FP-Stream algorithm, which delivers time sensitive frequent patterns. The proposed SSM-Algorithm outperforms the FP-Stream algorithm through the use of one hash based and two efficient tree based data structures. All incoming streams are handled dynamically to improve memory usage. SSM-Algorithm maintains frequent sequences incrementally and delivers the most current result on demand. The introduced algorithm can be deployed to analyze e-commerce data where the primary source of the data is click stream data. (Abstract shortened by UMI.) Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .M668. Source: Masters Abstracts International, Volume: 44-03, page: 1409. Thesis (M.Sc.)--University of Windsor (Canada), 2005.

Scholarship at UWindsor

Enhanced PL-WAP tree method for incremental mining of sequential patterns.

Author: Chen Min
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2003

Sequential mining, as a form of web usage mining, has been used in improving web site design, increasing the volume of e-business, and providing marketing decision support. This thesis proposes the PL4UP and EPL4UP algorithms, which use the PLWAP tree structure to incrementally update sequential patterns. PL4UP does not scan the old database except when previously small 1-itemsets become large in the updated database, at which time it scans only those transactions in the old database that contain any small itemsets. EPL4UP rebuilds the old PLWAP tree once, using only the list of previously small itemsets, rather than scanning the entire old database twice like the original PLWAP. PL4UP and EPL4UP first update old frequent patterns on the small PLWAP tree built for only the incremented part of the database; then they compare newly added patterns generated from the small tree with the old frequent patterns to reduce the number of patterns to be checked on the old PLWAP tree. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .C47. Source: Masters Abstracts International, Volume: 42-03, page: 0959. Adviser: Christie Ezeife. Thesis (M.Sc.)--University of Windsor (Canada), 2003.

Scholarship at UWindsor

Mining Web Log Sequential Patterns with Position Coded Pre-Order Linked WAP-Tree

Author: C.I. Ezeife
Yi Lu
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

Mining Multiple Related Tables Using Object-Oriented Model

Author: Zhang Dan
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2011

An object-oriented database is represented by a set of classes connected by their class inheritance hierarchy through superclass and subclass relationships. An object-oriented database is suitable for capturing more of the detail and complexity of real world data. Existing algorithms for mining multiple databases are either Apriori-based or machine learning techniques, but are not suitable for mining multiple object-oriented databases. This thesis proposes an object-oriented class model and database schema, and a series of class methods, including one for object-oriented join (OOJoin), which joins superclass and subclass tables by matching their type and super type relationships, and one for mining Hierarchical Frequent Patterns (MineHFPs) from multiple integrated databases by applying an extended TidFP technique, which specifies the class hierarchy by traversing the multiple database inheritance hierarchy. This thesis also extends the map-gen join method used in the TidFP algorithm to oomap-gen join for generating k-itemset candidate patterns; it reduces candidate itemset generation by indexing the (k-1)-itemset candidate patterns using two position codes, a start position and an end position, tied to the inheritance hierarchy level. Experiments show that the proposed MineHFPs algorithm for mining hierarchical frequent patterns is more effective and efficient for complex queries.

Scholarship at UWindsor

Transaction-filtering data mining and a predictive model for intelligent data management

Author: Liao ChenHan
Wang Frank ZhiGang
Publication venue
Publication date: 01/01/2008

This thesis first proposes a new data mining paradigm (transaction-filtering association rule mining) that addresses the time consumption caused by repeated scans of the original transaction database in conventional association rule mining algorithms. An in-memory transaction filter is designed to discard infrequent items in the pruning steps. This filter is a data structure that is updated at the end of each iteration. Results based on an IBM benchmark show that an execution time reduction of 10% - 19% is achieved compared with the base case. Next, a data mining-based predictive model is established, contributing to intelligent data management within the context of the Centre for Grid Computing. The capability of discovering unseen rules, patterns and correlations makes data mining techniques favourable in areas where massive amounts of data are generated. The past behaviours of two typical scenarios (network file systems and Data Grids) have been analyzed to build the model. The future popularity of files can be forecast with an accuracy of 90% by deploying the above predictor on the given real system traces. A further step towards intelligent policy design is achieved by analyzing the prediction results of files' future popularity. Real system trace-based simulations have shown improvements of 2-4 times in data response time in the network file system scenario and a 24% mean job time reduction in Data Grids compared with conventional cases. EThOS - Electronic Theses Online Service. GB. United Kingdom
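
The transaction-filtering idea in the abstract above can be sketched roughly as follows. This is an assumption-laden illustration rather than the thesis's design: it keeps an in-memory copy of the transactions, counts k-itemsets level by level, and at the end of each iteration drops items that occur in no frequent k-itemset, so later passes never rescan the original database. The function name `frequent_itemsets` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise itemset counting over a shrinking in-memory filter."""
    filtered = [frozenset(t) for t in transactions]  # the in-memory filter
    result = {}
    k = 1
    while filtered:
        counts = Counter()
        for t in filtered:
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        if not frequent:
            break
        result.update(frequent)
        # End of iteration: update the filter. An item that appears in no
        # frequent k-itemset cannot appear in any frequent (k+1)-itemset.
        keep = {item for combo in frequent for item in combo}
        filtered = [t & keep for t in filtered]
        # Transactions with k or fewer items cannot contain a (k+1)-itemset.
        filtered = [t for t in filtered if len(t) > k]
        k += 1
    return result


# Hypothetical sample data, for illustration only.
sample = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "e"]]
found = frequent_itemsets(sample, min_support=2)
```

On the sample data the filter shrinks from four transactions to three after the first pass and to one after the second, which is the kind of saving the abstract attributes to filtering.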

OpenGrey Repository