2 research outputs found

    Mining High Utility Sequential Patterns from Uncertain Web Access Sequences using the PL-WAP

    In general, web access patterns are retrieved from web access sequence databases using sequential pattern algorithms such as GSP, WAP, and PLWAP tree. However, these algorithms do not consider sequential data with quantity (internal utility, e.g., the amount of time a user spends on a web page) and quality (external utility, e.g., the rating of a web page within a website) information. Nor do they work on uncertain sequential items (e.g., purchased products) that carry a probability in (0, 1). Factoring in the utility and uncertainty of each sequence item provides more product information, which can be beneficial in mining profitable patterns from a company’s websites. For example, a customer may purchase a bottle of ink more frequently than a printer, but the purchase of a single printer can yield more profit to the business owner than the purchase of multiple bottles of ink. Most existing uncertain sequential pattern algorithms, such as U-Apriori, UF-Growth, and U-PLWAP, do not include utility measures. In U-PLWAP, the web sequences are derived from web log data without including the time spent by the user, and the web pages are not associated with any rating. When these two utilities are considered, items with lower existential probability can sometimes be more profitable to the website owner. Among utility-based algorithms, the only one that addresses both uncertainty and high utility is PHUI-UP, which treats probability and utility as separate entities with two different thresholds, so the retrieved patterns do not depend on both jointly; it also does not mine uncertain web access sequence databases. This thesis proposes the HUU-PLWAP miner algorithm for mining uncertain sequential patterns with internal and external utility information using the PLWAP tree approach, which cuts down on the repeated database scans of level-wise approaches.
HUU-PLWAP uses uncertain internal utility values (derived from the sequence uncertainty model) and constant, predefined external utility values to retrieve high utility sequential patterns from uncertain web access sequence databases with the help of the U-PLWAP methodology. Experiments show that HUU-PLWAP is at least 95% faster than U-PLWAP and 75% faster than the PHUI-UP algorithm.
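The ink-versus-printer example can be made concrete with a minimal sketch. It assumes that an item's utility is its internal utility times its external utility, and that uncertainty is folded in by weighting that utility with the item's existential probability. These formulas, and the names `Item`, `utility`, and `expected_utility`, are illustrative assumptions; the abstract does not state how HUU-PLWAP actually combines the three values, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical record for one uncertain sequence item."""
    name: str
    internal_utility: float  # quantity, e.g. units bought or time spent
    external_utility: float  # quality, e.g. unit profit or page rating
    probability: float       # existential probability of the item


def utility(item: Item) -> float:
    # Assumed definition: quantity times per-unit value.
    return item.internal_utility * item.external_utility


def expected_utility(item: Item) -> float:
    # Assumed way of combining uncertainty with utility.
    return item.probability * utility(item)


# Made-up numbers: ink is bought often and is very likely present,
# the printer is bought once and is less certain.
ink = Item("ink", internal_utility=5, external_utility=2.0, probability=0.9)
printer = Item("printer", internal_utility=1, external_utility=60.0, probability=0.4)

# Rank by expected utility instead of by frequency or probability alone.
ranked = sorted([ink, printer], key=expected_utility, reverse=True)
print([item.name for item in ranked])
```

Under these assumed numbers the printer ranks first even though it is both rarer and less certain, which is the situation the abstract describes for items with lower existential probability.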

    Discovering High-Profit Product Feature Groups by mining High Utility Sequential Patterns from Feature-Based Opinions

    Extracting a group of features together instead of a single feature from the mined opinions, such as “{battery, camera, design} of a smartphone,” may yield higher profit to the manufacturers and higher customer satisfaction; such groups can be called High Profit Feature Groups (HPFG). The accuracy of Opinion-Feature Extraction can be improved if more complex sequential patterns of customer reviews are learned and included in the user-behavior analysis to obtain relevant frequent feature groups. Existing Opinion-Feature Extraction systems that use Data Mining techniques with some sequences include those referred to in this thesis as Rashid13OFExt, Rana18OFExt, and HPFG19_HU. The Rashid13OFExt and Rana18OFExt systems use Sequential Pattern Mining, Association Rule Mining, and Class Sequential Rules to obtain frequent product features and opinion words from reviews. However, these systems do not discover frequent high profit features by considering utility values (internal and external) such as cost, profit, quantity, or other user preferences. The HPFG19_HU system uses High Utility Itemset Mining and Aspect-Based Sentiment Analysis to extract High Utility Aspect groups based on feature-opinion sets. It works on transaction databases of itemsets formed from aspects, considering the high utility values (e.g., which patterns are more profitable to the seller) of the frequent patterns extracted from a set of opinion sentences. However, the HPFG19_HU system does not consider the order of occurrence (sequence) of product features in customer opinion sentences, which helps distinguish similar users and identify more relevant and related high profit product features. This thesis proposes a system called High Profit Sequential Feature Group based on High Utility Sequences (HPSFG_HUS), which is an extension of the HPFG19_HU system.
The proposed system combines Feature-Based Opinion Mining and High Utility Sequential Pattern Mining to extract High Profit Feature Groups from product reviews. The input to the proposed system is a corpus of product reviews. The output is the set of High Profit Sequential Feature Groups in sequence databases, which identify sequential patterns in the features extracted from opinions by considering the order in which features occur in a review. This method improves on existing systems' accuracy in extracting relevant frequent feature groups. The results on retailer’s graphs of extracted High Profit Sequential Feature Groups show that the proposed HPSFG_HUS system provides more accurate high profit feature groups, along with higher sales profit and user satisfaction. Experimental results evaluating execution time, accuracy, and precision show higher revenue than the existing systems tested in the comparison.
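The difference between itemset-based and sequence-based matching can be illustrated with a short sketch. It is not the HPSFG_HUS implementation: the helper names, the toy reviews, and the per-feature profit values are all hypothetical, and the utility of a pattern is assumed here to be the summed profit of its features over every review that contains the pattern in order.

```python
from typing import Dict, List, Sequence


def contains_itemset(review: Sequence[str], pattern: Sequence[str]) -> bool:
    # Itemset view: order is ignored, only membership matters.
    return set(pattern) <= set(review)


def contains_sequence(review: Sequence[str], pattern: Sequence[str]) -> bool:
    # Sequence view: pattern features must appear in the same order,
    # though not necessarily next to each other.
    remaining = iter(review)
    return all(feature in remaining for feature in pattern)


def pattern_utility(reviews: List[List[str]], pattern: Sequence[str],
                    profit: Dict[str, float]) -> float:
    # Assumed utility: profit of the pattern's features, counted once
    # for each review that contains the pattern as a subsequence.
    per_match = sum(profit[feature] for feature in pattern)
    return per_match * sum(contains_sequence(r, pattern) for r in reviews)


# Hypothetical feature sequences, one per review, and made-up profits.
reviews = [
    ["battery", "camera", "design"],
    ["camera", "battery", "design"],
    ["battery", "design", "camera"],
]
profit = {"battery": 3.0, "camera": 5.0, "design": 1.0}
pattern = ["battery", "camera"]

itemset_matches = sum(contains_itemset(r, pattern) for r in reviews)
sequence_matches = sum(contains_sequence(r, pattern) for r in reviews)
print(itemset_matches, sequence_matches, pattern_utility(reviews, pattern, profit))
```

In this toy data all three reviews contain the itemset {battery, camera}, but only the first and third mention battery before camera, so the sequential pattern has a lower support and a correspondingly different utility than the itemset.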